Patchwork [C++0x] contiguous bitfields race implementation

Submitter Aldy Hernandez
Date July 25, 2011, 5:07 p.m.
Message ID <4E2DA2BA.1010003@redhat.com>
Permalink /patch/106720/
State New

Comments

Aldy Hernandez - July 25, 2011, 5:07 p.m.
On 07/22/11 13:44, Jason Merrill wrote:
> On 07/18/2011 08:02 AM, Aldy Hernandez wrote:
>> + /* If other threads can't see this value, no need to restrict
>> stores. */
>> + if (ALLOW_STORE_DATA_RACES
>> + || !DECL_THREAD_VISIBLE_P (innerdecl))
>> + {
>> + *bitstart = *bitend = 0;
>> + return;
>> + }
>
> What if get_inner_reference returns something that isn't a DECL, such as
> an INDIRECT_REF?

I had changed this already to take into account aliasing, so if we get 
an INDIRECT_REF, ptr_deref_may_alias_global_p() returns true, and we 
proceed with the restriction:

+  /* If other threads can't see this value, no need to restrict stores.  */
+  if (ALLOW_STORE_DATA_RACES
+      || (!ptr_deref_may_alias_global_p (innerdecl)
+         && (DECL_THREAD_LOCAL_P (innerdecl)
+             || !TREE_STATIC (innerdecl))))


>> + if (fld)
>> + {
>> + /* We found the end of the bit field sequence. Include the
>> + padding up to the next field and be done. */
>> + *bitend = bitpos - 1;
>> + }
>
> bitpos is the position of "field", and it seems to me we want the
> position of "fld" here.

Notice that bitpos gets recalculated at each iteration by 
get_inner_reference, so bitpos is actually the position of fld.

>> + /* If unset, no restriction. */
>> + if (!bitregion_end)
>> + maxbits = 0;
>> + else
>> + maxbits = (bitregion_end - bitregion_start) % align;
>
> Maybe use MAX_FIXED_MODE_SIZE so you don't have to test it against 0?

Fixed everywhere.

>> + if (!bitregion_end)
>> + maxbits = 0;
>> + else if (1||bitpos + offset * BITS_PER_UNIT < bitregion_start)
>> + maxbits = bitregion_end - bitregion_start;
>> + else
>> + maxbits = bitregion_end - (bitpos + offset * BITS_PER_UNIT) + 1;
>
> I assume the 1|| was there for debugging?

Fixed, plus I adjusted the calculation of maxbits everywhere because I 
found an off-by-one error.

I have also overhauled store_bit_field() to adjust the address so that 
it points to the beginning of the bit region.  This fixed a myriad of 
corner cases pointed out by a test Hans Boehm was kind enough to 
provide.

I have added more tests.

How does this look?  (Pending tests.)

	* params.h (ALLOW_STORE_DATA_RACES): New.
	* params.def (PARAM_ALLOW_STORE_DATA_RACES): New.
	* Makefile.in (expr.o): Depend on PARAMS_H.
	* machmode.h (get_best_mode): Add argument.
	* fold-const.c (optimize_bit_field_compare): Add argument to
	get_best_mode.
	(fold_truthop): Same.
	* ifcvt.c (noce_emit_move_insn): Add argument to store_bit_field.
	* expr.c (emit_group_store): Same.
	(copy_blkmode_from_reg): Same.
	(write_complex_part): Same.
	(optimize_bitfield_assignment_op): Add argument.
	Add argument to get_best_mode.
	(get_bit_range): New.
	(expand_assignment): Calculate maxbits and pass it down
	accordingly.
	(store_field): New argument.
	(expand_expr_real_2): New argument to store_field.
	Include params.h.
	* expr.h (store_bit_field): New argument.
	* stor-layout.c (get_best_mode): Restrict mode expansion by taking
	into account maxbits.
	* calls.c (store_unaligned_arguments_into_pseudos): New argument
	to store_bit_field.
	* expmed.c (store_bit_field_1): New argument.  Use it.
	(store_bit_field): Same.
	(store_fixed_bit_field): Same.
	(store_split_bit_field): Same.
	(extract_bit_field_1): Pass new argument to get_best_mode.
	(extract_bit_field): Same.
	* stmt.c (store_bit_field): Pass new argument to store_bit_field.
	* doc/invoke.texi: Document parameter allow-store-data-races.
Jason Merrill - July 25, 2011, 11:55 p.m.
On 07/25/2011 10:07 AM, Aldy Hernandez wrote:
> I had changed this already to take into account aliasing, so if we get
> an INDIRECT_REF, ptr_deref_may_alias_global_p() returns true, and we
> proceed with the restriction:

Sounds good.  "global" includes malloc'd memory, right?  There don't 
seem to be any tests for that.

Speaking of tests, please put them in c-c++-common.

> +      bitnum -= bitregion_start;
> +      bitregion_end -= bitregion_start;
> +      bitregion_start = 0;

Why is this necessary/useful?

Jason
Aldy Hernandez - July 26, 2011, 4:36 p.m.
>> + bitnum -= bitregion_start;
>> + bitregion_end -= bitregion_start;
>> + bitregion_start = 0;
>
> Why is this necessary/useful?

You mean, why am I resetting these values?  (Because the call to 
get_best_mode() following it needs the adjusted values.)  Or why am I 
adjusting the address to point to the beginning of the region?

A
Jason Merrill - July 26, 2011, 4:57 p.m.
On 07/26/2011 09:36 AM, Aldy Hernandez wrote:
>
>>> + bitnum -= bitregion_start;
>>> + bitregion_end -= bitregion_start;
>>> + bitregion_start = 0;
>>
>> Why is this necessary/useful?
>
> You mean, why am I resetting these values?  (Because the call to
> get_best_mode() following it needs the adjusted values.)  Or why am I
> adjusting the address to point to the beginning of the region?

I think the adjustment above is intended to match the adjustment of the 
address by bitregion_start/BITS_PER_UNIT, but the above seems to assume 
that bitregion_start%BITS_PER_UNIT == 0.

Jason
Aldy Hernandez - July 26, 2011, 5:32 p.m.
> I think the adjustment above is intended to match the adjustment of the
> address by bitregion_start/BITS_PER_UNIT, but the above seems to assume
> that bitregion_start%BITS_PER_UNIT == 0.

That was intentional.  bitregion_start always falls on a byte boundary, 
does it not?

struct {
	stuff;
	unsigned int b:3;
	unsigned int other_bits:22;
	other_stuff;
}

Does not "b" always start at a byte boundary?
Jason Merrill - July 26, 2011, 5:38 p.m.
On 07/26/2011 10:32 AM, Aldy Hernandez wrote:
>
>> I think the adjustment above is intended to match the adjustment of the
>> address by bitregion_start/BITS_PER_UNIT, but the above seems to assume
>> that bitregion_start%BITS_PER_UNIT == 0.
>
> That was intentional. bitregion_start always falls on a byte boundary,
> does it not?

Ah, yes, of course, it's bitnum that might not.  The code changes look 
good, then.

Jason
Aldy Hernandez - July 26, 2011, 7:21 p.m.
On 07/25/11 18:55, Jason Merrill wrote:
> On 07/25/2011 10:07 AM, Aldy Hernandez wrote:
>> I had changed this already to take into account aliasing, so if we get
>> an INDIRECT_REF, ptr_deref_may_alias_global_p() returns true, and we
>> proceed with the restriction:
>
> Sounds good. "global" includes malloc'd memory, right? There don't seem
> to be any tests for that.

Is the attached test appropriate?
/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
/* { dg-options "-O2 --param allow-store-data-races=0" } */

#include <stdlib.h>

struct bits
{
  char a;
  int b:7;
  int c:9;
  unsigned char d;
} x;

struct bits *p;

static void allocit()
{
  p = (struct bits *) malloc (sizeof (struct bits));
}

void foo()
{
  allocit();
  p -> c = 55;
}

/* { dg-final { scan-assembler-not "movl\t\\(" } } */
Richard Guenther - July 27, 2011, 2:52 p.m.
On Tue, Jul 26, 2011 at 7:38 PM, Jason Merrill <jason@redhat.com> wrote:
> On 07/26/2011 10:32 AM, Aldy Hernandez wrote:
>>
>>> I think the adjustment above is intended to match the adjustment of the
>>> address by bitregion_start/BITS_PER_UNIT, but the above seems to assume
>>> that bitregion_start%BITS_PER_UNIT == 0.
>>
>> That was intentional. bitregion_start always falls on a byte boundary,
>> does it not?
>
> Ah, yes, of course, it's bitnum that might not.  The code changes look good,
> then.

Looks like this was an approval ...

Anyway, I don't think a --param is appropriate to control a flag whether
to allow store data-races to be created.  Why not use a regular option instead?

I believe that any after-the-fact attempt to recover bitfield boundaries is
going to fail unless you preserve more information during bitfield layout.

Consider

struct {
  char : 8;
  char : 0;
  char : 8;
};

where the : 0 isn't preserved in any way and you can't distinguish
it from struct { char : 8; char : 8; }.

Richard.

> Jason
>
Richard Guenther - July 27, 2011, 2:56 p.m.
On Wed, Jul 27, 2011 at 4:52 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> [...]
> where the : 0 isn't preserved in any way and you can't distinguish
> it from struct { char : 8; char : 8; }.

Oh, and

   INNERDECL is the actual object being referenced.

      || (!ptr_deref_may_alias_global_p (innerdecl)

is surely not what you want.  That asks if *innerdecl is global memory.
I suppose you want is_global_var (innerdecl)?  But with

          && (DECL_THREAD_LOCAL_P (innerdecl)
              || !TREE_STATIC (innerdecl))))

you can simply skip this test.  Or what was it supposed to do?

Richard.
Richard Guenther - July 27, 2011, 3:03 p.m.
On Wed, Jul 27, 2011 at 4:56 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> [...]
> you can simply skip this test.  Or what was it supposed to do?

And

      t = build3 (COMPONENT_REF, TREE_TYPE (exp),
                  unshare_expr (TREE_OPERAND (exp, 0)),
                  fld, NULL_TREE);
      get_inner_reference (t, &bitsize, &bitpos, &offset,
                           &mode, &unsignedp, &volatilep, true);

for each field of a struct type is of course ... gross!  In fact you already
have the FIELD_DECL in the single caller!  Yes I know there is not
enough information preserved by bitfield layout - see my previous reply.

      if (TREE_CODE (to) == COMPONENT_REF
          && DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))
        get_bit_range (&bitregion_start, &bitregion_end,
                       to, tem, bitpos, bitsize);

and shouldn't this test DECL_BIT_FIELD instead of DECL_BIT_FIELD_TYPE?

Richard.
Aldy Hernandez - July 27, 2011, 5:08 p.m.
> Anyway, I don't think a --param is appropriate to control a flag whether
> to allow store data-races to be created.  Why not use a regular option instead?

I don't care either way.  What -foption-name do you suggest?
Andrew MacLeod - July 27, 2011, 5:19 p.m.
On 07/27/2011 01:08 PM, Aldy Hernandez wrote:
>
>> Anyway, I don't think a --param is appropriate to control a flag whether
>> to allow store data-races to be created.  Why not use a regular 
>> option instead?
>
> I don't care either way.  What -foption-name do you suggest?
Well, I suggested a -f option set last year when this was laid out, and 
Ian suggested that it should be a --param

http://gcc.gnu.org/ml/gcc/2010-05/msg00118.html

"I don't agree with your proposed command line options.  They seem fine
for internal use, but I think very very few users would know when or
whether they should use -fno-data-race-stores.  I think you should
downgrade those options to a --param value, and think about a
multi-layered -fmemory-model option. "

Andrew
Aldy Hernandez - July 27, 2011, 5:36 p.m.
> Oh, and
>
>     INNERDECL is the actual object being referenced.
>
>        || (!ptr_deref_may_alias_global_p (innerdecl)
>
> is surely not what you want.  That asks if *innerdecl is global memory.
> I suppose you want is_global_var (innerdecl)?  But with
>
>            &&  (DECL_THREAD_LOCAL_P (innerdecl)
>                || !TREE_STATIC (innerdecl))))
>
> you can simply skip this test.  Or what was it supposed to do?

The test was there because neither DECL_THREAD_LOCAL_P nor is_global_var 
can handle MEM_REF's.

Would you prefer an explicit check for a *_DECL?

    if (ALLOW_STORE_DATA_RACES
-      || (!ptr_deref_may_alias_global_p (innerdecl)
+      || (DECL_P (innerdecl)
           && (DECL_THREAD_LOCAL_P (innerdecl)
               || !TREE_STATIC (innerdecl))))
H.J. Lu - July 27, 2011, 5:56 p.m.
On Mon, Jul 25, 2011 at 10:07 AM, Aldy Hernandez <aldyh@redhat.com> wrote:
> On 07/22/11 13:44, Jason Merrill wrote:
> [...]
> How does this look?  (Pending tests.)
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49875
Joseph S. Myers - July 27, 2011, 9:34 p.m.
On Wed, 27 Jul 2011, Andrew MacLeod wrote:

> On 07/27/2011 01:08 PM, Aldy Hernandez wrote:
> > 
> > > Anyway, I don't think a --param is appropriate to control a flag whether
> > > to allow store data-races to be created.  Why not use a regular option
> > > instead?
> > 
> > I don't care either way.  What -foption-name do you suggest?
> Well, I suggested a -f option set last year when this was laid out, and Ian
> suggested that it should be a --param
> 
> http://gcc.gnu.org/ml/gcc/2010-05/msg00118.html
> 
> "I don't agree with your proposed command line options.  They seem fine
> for internal use, but I think very very few users would know when or
> whether they should use -fno-data-race-stores.  I think you should
> downgrade those options to a --param value, and think about a
> multi-layered -fmemory-model option. "

The documentation says --param is for "various constants to control the 
amount of optimization that is done".  I don't think it should be used for 
anything that affects the semantics of the program; I think -f options are 
what's appropriate here (with appropriate warnings in the documentation if 
most of the options should not generally be used directly by users).
Richard Guenther - July 28, 2011, 8:04 a.m.
On Wed, Jul 27, 2011 at 7:36 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>> Oh, and
>>
>>    INNERDECL is the actual object being referenced.
>>
>>       || (!ptr_deref_may_alias_global_p (innerdecl)
>>
>> is surely not what you want.  That asks if *innerdecl is global memory.
>> I suppose you want is_global_var (innerdecl)?  But with
>>
>>           &&  (DECL_THREAD_LOCAL_P (innerdecl)
>>               || !TREE_STATIC (innerdecl))))
>>
>> you can simply skip this test.  Or what was it supposed to do?
>
> The test was there because neither DECL_THREAD_LOCAL_P nor is_global_var can
> handle MEM_REF's.

Ok, in that case you want

  (TREE_CODE (innerdecl) == MEM_REF || TREE_CODE (innerdecl) == TARGET_MEM_REF)
  && !ptr_deref_may_alias_global_p (TREE_OPERAND (innerdecl, 0)))

which gets you at the actual pointer.

> Would you prefer an explicit check for a *_DECL?
>
>   if (ALLOW_STORE_DATA_RACES
> -      || (!ptr_deref_may_alias_global_p (innerdecl)
> +      || (DECL_P (innerdecl)
>          && (DECL_THREAD_LOCAL_P (innerdecl)
>              || !TREE_STATIC (innerdecl))))

Yes.  Together with the above it looks then optimal.

Richard.
Richard Guenther - July 28, 2011, 8:05 a.m.
On Wed, Jul 27, 2011 at 7:19 PM, Andrew MacLeod <amacleod@redhat.com> wrote:
> On 07/27/2011 01:08 PM, Aldy Hernandez wrote:
>>
>>> Anyway, I don't think a --param is appropriate to control a flag whether
>>> to allow store data-races to be created.  Why not use a regular option
>>> instead?
>>
>> I don't care either way.  What -foption-name do you suggest?
>
> Well, I suggested a -f option set last year when this was laid out, and Ian
> suggested that it should be a --param
>
> http://gcc.gnu.org/ml/gcc/2010-05/msg00118.html
>
> "I don't agree with your proposed command line options.  They seem fine
> for internal use, but I think very very few users would know when or
> whether they should use -fno-data-race-stores.  I think you should
> downgrade those options to a --param value, and think about a
> multi-layered -fmemory-model option. "

Hm, ok.  I suppose we can revisit this when implementing such -fmemory-model
option then.  --params we can at least freely remove between releases.

Richard.

> Andrew
>
Richard Guenther - July 28, 2011, 11:40 a.m.
On Wed, Jul 27, 2011 at 5:03 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> [...]
> And
>
>      t = build3 (COMPONENT_REF, TREE_TYPE (exp),
>                  unshare_expr (TREE_OPERAND (exp, 0)),
>                  fld, NULL_TREE);
>      get_inner_reference (t, &bitsize, &bitpos, &offset,
>                           &mode, &unsignedp, &volatilep, true);
>
> for each field of a struct type is of course ... gross!  In fact you already
> have the FIELD_DECL in the single caller!  Yes I know there is not
> enough information preserved by bitfield layout - see my previous reply.

Looking at the C++ memory model what you need is indeed simple enough
to recover here.  Still this loop does quadratic work for a struct with
N bitfield members and a function which stores into all of them.
And that with a big constant factor as you build a component-ref
and even unshare trees (which isn't necessary here anyway).  In fact
you could easily manually keep track of bitpos when walking adjacent
bitfield members.  An initial call to get_inner_reference on
TREE_OPERAND (exp, 0) would give you the starting position of the record.

That would still be quadratic of course.

For bitfield lowering I'd like to preserve a way to get from a field-decl to
the first field-decl of a group of bitfield members that occupy an aligned
amount of storage (as place_field assigns it).  That wouldn't necessarily
match the first bitfield field in the C++ bitfield group sense but would
probably be sensible enough for conforming accesses (and you'd only
need to search forward from that first field looking for a zero-size
field).  Now, the question is of course what to do for DECL_PACKED
fields (I suppose, simply ignore the C++ memory model as C++ doesn't
have a notion of packed or specially (mis-)aligned structs or bitfields).

Richard.
Aldy Hernandez - July 28, 2011, 7:35 p.m.
>        if (TREE_CODE (to) == COMPONENT_REF
>            &&  DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))
>          get_bit_range (&bitregion_start,&bitregion_end,
>                         to, tem, bitpos, bitsize);
>
> and shouldn't this test DECL_BIT_FIELD instead of DECL_BIT_FIELD_TYPE?

As I mentioned here:

http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01416.html

I am using DECL_BIT_FIELD_TYPE instead of DECL_BIT_FIELD to determine if 
a DECL is a bit field, because DECL_BIT_FIELD is not set for bit fields 
whose width matches a machine mode (32 bits, 16 bits, etc.).
Richard Guenther - July 28, 2011, 7:42 p.m.
On Thu, Jul 28, 2011 at 9:12 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>> Yes.  Together with the above it looks then optimal.
>
> Attached patch tested on x86-64 Linux.
>
> OK for mainline?

Ok with the || moved to the next line as per coding-standards.

Thanks,
Richard.
Aldy Hernandez - July 28, 2011, 9:32 p.m.
> I believe that any after-the-fact attempt to recover bitfield boundaries is
> going to fail unless you preserve more information during bitfield layout.
>
> Consider
>
> struct {
>    char : 8;
>    char : 0;
>    char : 8;
> };
>
> where the : 0 isn't preserved in any way and you can't distinguish
> it from struct { char : 8; char : 8; }.

Huh?  In my tests the :0 is preserved, it just doesn't have a DECL_NAME.

(gdb) p fld
$41 = (tree) 0x7ffff7778130
(gdb) pt
  <field_decl 0x7ffff7778130 D.1593
...

I have tried the following scenario, and we calculate the beginning of 
the bit region correctly (bit 32).

struct bits
{
   char a;
   int b:7;
   int :0;		<-- bitregion start
   int c:9;		<-- bitregion start
   unsigned char d;
} *p;

void foo() { p -> c = 55; }

Am I misunderstanding?  Why do you suggest we need to preserve more 
information during bitfield layout?

FWIW, I should add a zero-length bit-field test.
Jason Merrill - July 29, 2011, 12:04 a.m.
On 07/28/2011 04:40 AM, Richard Guenther wrote:
> field).  Now, the question is of course what to do for DECL_PACKED
> fields (I suppose, simply ignore the C++ memory model as C++ doesn't
> have a notion of packed or specially (mis-)aligned structs or bitfields).

I think treat them as bitfields for this purpose.

Jason
Richard Guenther - July 29, 2011, 9:37 a.m.
On Fri, Jul 29, 2011 at 4:12 AM, Aldy Hernandez <aldyh@redhat.com> wrote:
> On 07/28/11 06:40, Richard Guenther wrote:
>
>> Looking at the C++ memory model what you need is indeed simple enough
>> to recover here.  Still this loop does quadratic work for a struct with
>> N bitfield members and a function which stores into all of them.
>> And that with a big constant factor as you build a component-ref
>> and even unshare trees (which isn't necessary here anyway).  In fact
>> you could easily manually keep track of bitpos when walking adjacent
>> bitfield members.  An initial call to get_inner_reference on
>> TREE_OPERAND (exp, 0) would give you the starting position of the record.
>>
>> That would still be quadratic of course.
>
> Actually, we don't need to call get_inner_reference at all.  It seems
> DECL_FIELD_BIT_OFFSET has all the information we need.
>
> How about we simplify things further as in the attached patch?
>
> Tested on x86-64 Linux.
>
> OK for mainline?

Well ... byte pieces of the offset can be in the tree offset
(DECL_FIELD_OFFSET).  Only up to DECL_OFFSET_ALIGN bits
are tracked in DECL_FIELD_BIT_OFFSET (and DECL_FIELD_OFFSET
can be a non-constant - at least for Ada, not sure about C++).

But - can you please expand a bit on the desired semantics of
get_bit_range?  Especially, relative to what is *bitstart / *bitend
supposed to be?  Why do you pass in bitpos and bitsize - they
seem to be used as local variables only.  Why is the check for
thread-local storage in this function and not in the caller (and
what's the magic [0,0] bit-range relative to?)?

The existing get_inner_reference calls give you a bitpos relative
to the start of the containing object - but

      /* If this is the last element in the structure, include the padding
         at the end of structure.  */
      *bitend = TREE_INT_CST_LOW (TYPE_SIZE (record_type)) - 1;

will set *bitend to the size of the direct parent structure size, not the
size of the underlying object.  Your proposed patch changes
bitpos to be relative to the direct parent structure.

So - I guess you need to play with some testcases like

struct {
   int some_padding;
   struct {
      int bitfield :1;
   } x;
};

and split / clarify some of get_bit_range comments.

Thanks,
Richard.

Richard Guenther - Aug. 1, 2011, 1:50 p.m.
On Fri, Jul 29, 2011 at 11:37 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> [...]
> The existing get_inner_reference calls give you a bitpos relative
> to the start of the containing object - but
>
>      /* If this is the last element in the structure, include the padding
>         at the end of structure.  */
>      *bitend = TREE_INT_CST_LOW (TYPE_SIZE (record_type)) - 1;
>
> will set *bitend to the size of the direct parent structure size, not the
> size of the underlying object.  Your proposed patch changes
> bitpos to be relative to the direct parent structure.

Using TYPE_SIZE can also run into issues with C++ tail packing,
you need to use DECL_SIZE of the respective field instead.  Consider

struct A {
  int : 17;
};
struct B : public A {
  char c;
};

where I'm not sure we are not allowed to pack c into the tail padding
in A.  Also neither TYPE_SIZE nor DECL_SIZE have to be constant,
at least in Ada you can have a variable-sized array before, and in
C you can have a trailing one.

Richard.


Patch

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 176280)
+++ doc/invoke.texi	(working copy)
@@ -9027,6 +9027,11 @@  The maximum number of conditional stores
 if either vectorization (@option{-ftree-vectorize}) or if-conversion
 (@option{-ftree-loop-if-convert}) is disabled.  The default is 2.
 
+@item allow-store-data-races
+Allow optimizers to introduce new data races on stores.
+Set to 1 to allow, otherwise to 0.  This option is enabled by default
+unless implicitly set by the @option{-fmemory-model=} option.
+
 @item case-values-threshold
 The smallest number of different values for which it is best to use a
 jump-table instead of a tree of conditional branches.  If the value is
Index: machmode.h
===================================================================
--- machmode.h	(revision 176280)
+++ machmode.h	(working copy)
@@ -248,7 +248,10 @@  extern enum machine_mode mode_for_vector
 
 /* Find the best mode to use to access a bit field.  */
 
-extern enum machine_mode get_best_mode (int, int, unsigned int,
+extern enum machine_mode get_best_mode (int, int,
+					unsigned HOST_WIDE_INT,
+					unsigned HOST_WIDE_INT,
+					unsigned int,
 					enum machine_mode, int);
 
 /* Determine alignment, 1<=result<=BIGGEST_ALIGNMENT.  */
Index: fold-const.c
===================================================================
--- fold-const.c	(revision 176280)
+++ fold-const.c	(working copy)
@@ -3394,7 +3394,7 @@  optimize_bit_field_compare (location_t l
       && flag_strict_volatile_bitfields > 0)
     nmode = lmode;
   else
-    nmode = get_best_mode (lbitsize, lbitpos,
+    nmode = get_best_mode (lbitsize, lbitpos, 0, 0,
 			   const_p ? TYPE_ALIGN (TREE_TYPE (linner))
 			   : MIN (TYPE_ALIGN (TREE_TYPE (linner)),
 				  TYPE_ALIGN (TREE_TYPE (rinner))),
@@ -5222,7 +5222,7 @@  fold_truthop (location_t loc, enum tree_
      to be relative to a field of that size.  */
   first_bit = MIN (ll_bitpos, rl_bitpos);
   end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
-  lnmode = get_best_mode (end_bit - first_bit, first_bit,
+  lnmode = get_best_mode (end_bit - first_bit, first_bit, 0, 0,
 			  TYPE_ALIGN (TREE_TYPE (ll_inner)), word_mode,
 			  volatilep);
   if (lnmode == VOIDmode)
@@ -5287,7 +5287,7 @@  fold_truthop (location_t loc, enum tree_
 
       first_bit = MIN (lr_bitpos, rr_bitpos);
       end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
-      rnmode = get_best_mode (end_bit - first_bit, first_bit,
+      rnmode = get_best_mode (end_bit - first_bit, first_bit, 0, 0,
 			      TYPE_ALIGN (TREE_TYPE (lr_inner)), word_mode,
 			      volatilep);
       if (rnmode == VOIDmode)
Index: params.h
===================================================================
--- params.h	(revision 176280)
+++ params.h	(working copy)
@@ -211,4 +211,6 @@  extern void init_param_values (int *para
   PARAM_VALUE (PARAM_MIN_NONDEBUG_INSN_UID)
 #define MAX_STORES_TO_SINK \
   PARAM_VALUE (PARAM_MAX_STORES_TO_SINK)
+#define ALLOW_STORE_DATA_RACES \
+  PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES)
 #endif /* ! GCC_PARAMS_H */
Index: testsuite/gcc.dg/20110509-4.c
===================================================================
--- testsuite/gcc.dg/20110509-4.c	(revision 0)
+++ testsuite/gcc.dg/20110509-4.c	(revision 0)
@@ -0,0 +1,18 @@ 
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+struct bits
+{
+  char a;
+  int b:7;
+  int c:9;
+  unsigned char d;
+} x;
+
+/* Store into <c> should not clobber <d>.  */
+void update_c(struct bits *p, int val)
+{
+  p->c = val;
+}
+
+/* { dg-final { scan-assembler-not "movl" } } */
Index: testsuite/gcc.dg/20110509.c
===================================================================
--- testsuite/gcc.dg/20110509.c	(revision 0)
+++ testsuite/gcc.dg/20110509.c	(revision 0)
@@ -0,0 +1,18 @@ 
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+/* Test that we don't store past VAR.A.  */
+
+struct S
+{
+  volatile unsigned int a : 4;
+  unsigned char b;
+  unsigned int c : 6;
+} var;
+
+void set_a()
+{
+  var.a = 12;
+}
+
+/* { dg-final { scan-assembler-not "movl.*, var" } } */
Index: testsuite/gcc.dg/20110509-2.c
===================================================================
--- testsuite/gcc.dg/20110509-2.c	(revision 0)
+++ testsuite/gcc.dg/20110509-2.c	(revision 0)
@@ -0,0 +1,19 @@ 
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+/* Test that we don't store past VAR.K.  */
+
+struct S
+{
+  volatile int i;
+  volatile int j: 32;
+  volatile int k: 15;
+  volatile char c[2];
+} var;
+
+void setit()
+{
+  var.k = 13;
+}
+
+/* { dg-final { scan-assembler-not "movl.*, var" } } */
Index: testsuite/gcc.dg/20110509-3.c
===================================================================
--- testsuite/gcc.dg/20110509-3.c	(revision 0)
+++ testsuite/gcc.dg/20110509-3.c	(revision 0)
@@ -0,0 +1,21 @@ 
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+/* Make sure we don't narrow down to a QI or HI to store into VAR.J,
+   but instead use an SI.  */
+
+struct S
+{ 
+  volatile int i: 4;
+  volatile int j: 4;
+  volatile int k: 8;
+  volatile int l: 8;
+  volatile int m: 8;
+} var;
+
+void setit()
+{ 
+  var.j = 5;
+}
+
+/* { dg-final { scan-assembler "movl.*, var" } } */
Index: ifcvt.c
===================================================================
--- ifcvt.c	(revision 176280)
+++ ifcvt.c	(working copy)
@@ -885,7 +885,7 @@  noce_emit_move_insn (rtx x, rtx y)
 		}
 
 	      gcc_assert (start < (MEM_P (op) ? BITS_PER_UNIT : BITS_PER_WORD));
-	      store_bit_field (op, size, start, GET_MODE (x), y);
+	      store_bit_field (op, size, start, 0, 0, GET_MODE (x), y);
 	      return;
 	    }
 
@@ -939,7 +939,8 @@  noce_emit_move_insn (rtx x, rtx y)
   inner = XEXP (outer, 0);
   outmode = GET_MODE (outer);
   bitpos = SUBREG_BYTE (outer) * BITS_PER_UNIT;
-  store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos, outmode, y);
+  store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos,
+		   0, 0, outmode, y);
 }
 
 /* Return sequence of instructions generated by if conversion.  This
Index: expr.c
===================================================================
--- expr.c	(revision 176280)
+++ expr.c	(working copy)
@@ -55,6 +55,7 @@  along with GCC; see the file COPYING3.  
 #include "diagnostic.h"
 #include "ssaexpand.h"
 #include "target-globals.h"
+#include "params.h"
 
 /* Decide whether a function's arguments should be processed
    from first to last or from last to first.
@@ -143,7 +144,9 @@  static void store_constructor_field (rtx
 				     HOST_WIDE_INT, enum machine_mode,
 				     tree, tree, int, alias_set_type);
 static void store_constructor (tree, rtx, int, HOST_WIDE_INT);
-static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT, enum machine_mode,
+static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT,
+			unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
+			enum machine_mode,
 			tree, tree, alias_set_type, bool);
 
 static unsigned HOST_WIDE_INT highest_pow2_factor_for_target (const_tree, const_tree);
@@ -2074,7 +2077,7 @@  emit_group_store (rtx orig_dst, rtx src,
 	emit_move_insn (adjust_address (dest, mode, bytepos), tmps[i]);
       else
 	store_bit_field (dest, bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
-			 mode, tmps[i]);
+			 0, 0, mode, tmps[i]);
     }
 
   /* Copy from the pseudo into the (probable) hard reg.  */
@@ -2168,7 +2171,7 @@  copy_blkmode_from_reg (rtx tgtblk, rtx s
 
       /* Use xbitpos for the source extraction (right justified) and
 	 bitpos for the destination store (left justified).  */
-      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, copy_mode,
+      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, 0, 0, copy_mode,
 		       extract_bit_field (src, bitsize,
 					  xbitpos % BITS_PER_WORD, 1, false,
 					  NULL_RTX, copy_mode, copy_mode));
@@ -2805,7 +2808,7 @@  write_complex_part (rtx cplx, rtx val, b
 	gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
     }
 
-  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, imode, val);
+  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, 0, imode, val);
 }
 
 /* Extract one of the components of the complex value CPLX.  Extract the
@@ -3940,6 +3943,8 @@  get_subtarget (rtx x)
 static bool
 optimize_bitfield_assignment_op (unsigned HOST_WIDE_INT bitsize,
 				 unsigned HOST_WIDE_INT bitpos,
+				 unsigned HOST_WIDE_INT bitregion_start,
+				 unsigned HOST_WIDE_INT bitregion_end,
 				 enum machine_mode mode1, rtx str_rtx,
 				 tree to, tree src)
 {
@@ -4001,6 +4006,7 @@  optimize_bitfield_assignment_op (unsigne
       if (str_bitsize == 0 || str_bitsize > BITS_PER_WORD)
 	str_mode = word_mode;
       str_mode = get_best_mode (bitsize, bitpos,
+				bitregion_start, bitregion_end,
 				MEM_ALIGN (str_rtx), str_mode, 0);
       if (str_mode == VOIDmode)
 	return false;
@@ -4109,6 +4115,113 @@  optimize_bitfield_assignment_op (unsigne
   return false;
 }
 
+/* In the C++ memory model, consecutive bit fields in a structure are
+   considered one memory location.
+
+   Given a COMPONENT_REF, this function returns the bit range of
+   consecutive bits in which this COMPONENT_REF belongs.  The
+   values are returned in *BITSTART and *BITEND.  If either the C++
+   memory model is not activated, or this memory access is not thread
+   visible, 0 is returned in *BITSTART and *BITEND.
+
+   EXP is the COMPONENT_REF.
+   INNERDECL is the actual object being referenced.
+   BITPOS is the position in bits where the bit starts within the structure.
+   BITSIZE is size in bits of the field being referenced in EXP.
+
+   For example, while storing into FOO.A here...
+
+      struct {
+        BIT 0:
+          unsigned int a : 4;
+	  unsigned int b : 1;
+	BIT 8:
+	  unsigned char c;
+	  unsigned int d : 6;
+      } foo;
+
+   ...we are not allowed to store past <b>, so for the layout above
+   the returned range is 0..7 (because no one cares if we store into
+   padding).  */
+
+static void
+get_bit_range (unsigned HOST_WIDE_INT *bitstart,
+	       unsigned HOST_WIDE_INT *bitend,
+	       tree exp, tree innerdecl,
+	       HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize)
+{
+  tree field, record_type, fld;
+  bool found_field = false;
+  bool prev_field_is_bitfield;
+
+  gcc_assert (TREE_CODE (exp) == COMPONENT_REF);
+
+  /* If other threads can't see this value, no need to restrict stores.  */
+  if (ALLOW_STORE_DATA_RACES
+      || (!ptr_deref_may_alias_global_p (innerdecl)
+	  && (DECL_THREAD_LOCAL_P (innerdecl)
+	      || !TREE_STATIC (innerdecl))))
+    {
+      *bitstart = *bitend = 0;
+      return;
+    }
+
+  /* Bit field we're storing into.  */
+  field = TREE_OPERAND (exp, 1);
+  record_type = DECL_FIELD_CONTEXT (field);
+
+  /* Count the contiguous bitfields for the memory location that
+     contains FIELD.  */
+  *bitstart = 0;
+  prev_field_is_bitfield = true;
+  for (fld = TYPE_FIELDS (record_type); fld; fld = DECL_CHAIN (fld))
+    {
+      tree t, offset;
+      enum machine_mode mode;
+      int unsignedp, volatilep;
+
+      if (TREE_CODE (fld) != FIELD_DECL)
+	continue;
+
+      t = build3 (COMPONENT_REF, TREE_TYPE (exp),
+		  unshare_expr (TREE_OPERAND (exp, 0)),
+		  fld, NULL_TREE);
+      get_inner_reference (t, &bitsize, &bitpos, &offset,
+			   &mode, &unsignedp, &volatilep, true);
+
+      if (field == fld)
+	found_field = true;
+
+      if (DECL_BIT_FIELD_TYPE (fld) && bitsize > 0)
+	{
+	  if (prev_field_is_bitfield == false)
+	    {
+	      *bitstart = bitpos;
+	      prev_field_is_bitfield = true;
+	    }
+	}
+      else
+	{
+	  prev_field_is_bitfield = false;
+	  if (found_field)
+	    break;
+	}
+    }
+  gcc_assert (found_field);
+
+  if (fld)
+    {
+      /* We found the end of the bit field sequence.  Include the
+	 padding up to the next field and be done.  */
+      *bitend = bitpos - 1;
+    }
+  else
+    {
+      /* If this is the last element in the structure, include the padding
+	 at the end of structure.  */
+      *bitend = TREE_INT_CST_LOW (TYPE_SIZE (record_type)) - 1;
+    }
+}
 
 /* Expand an assignment that stores the value of FROM into TO.  If NONTEMPORAL
    is true, try generating a nontemporal store.  */
@@ -4208,6 +4321,8 @@  expand_assignment (tree to, tree from, b
     {
       enum machine_mode mode1;
       HOST_WIDE_INT bitsize, bitpos;
+      unsigned HOST_WIDE_INT bitregion_start = 0;
+      unsigned HOST_WIDE_INT bitregion_end = 0;
       tree offset;
       int unsignedp;
       int volatilep = 0;
@@ -4217,6 +4332,11 @@  expand_assignment (tree to, tree from, b
       tem = get_inner_reference (to, &bitsize, &bitpos, &offset, &mode1,
 				 &unsignedp, &volatilep, true);
 
+      if (TREE_CODE (to) == COMPONENT_REF
+	  && DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))
+	get_bit_range (&bitregion_start, &bitregion_end,
+		       to, tem, bitpos, bitsize);
+
       /* If we are going to use store_bit_field and extract_bit_field,
 	 make sure to_rtx will be safe for multiple use.  */
 
@@ -4298,11 +4418,14 @@  expand_assignment (tree to, tree from, b
 				 nontemporal);
 	  else if (bitpos + bitsize <= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 0), bitsize, bitpos,
+				  bitregion_start, bitregion_end,
 				  mode1, from, TREE_TYPE (tem),
 				  get_alias_set (to), nontemporal);
 	  else if (bitpos >= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 1), bitsize,
-				  bitpos - mode_bitsize / 2, mode1, from,
+				  bitpos - mode_bitsize / 2,
+				  bitregion_start, bitregion_end,
+				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
 	  else if (bitpos == 0 && bitsize == mode_bitsize)
@@ -4323,7 +4446,9 @@  expand_assignment (tree to, tree from, b
 					    0);
 	      write_complex_part (temp, XEXP (to_rtx, 0), false);
 	      write_complex_part (temp, XEXP (to_rtx, 1), true);
-	      result = store_field (temp, bitsize, bitpos, mode1, from,
+	      result = store_field (temp, bitsize, bitpos,
+				    bitregion_start, bitregion_end,
+				    mode1, from,
 				    TREE_TYPE (tem), get_alias_set (to),
 				    nontemporal);
 	      emit_move_insn (XEXP (to_rtx, 0), read_complex_part (temp, false));
@@ -4348,11 +4473,15 @@  expand_assignment (tree to, tree from, b
 		MEM_KEEP_ALIAS_SET_P (to_rtx) = 1;
 	    }
 
-	  if (optimize_bitfield_assignment_op (bitsize, bitpos, mode1,
+	  if (optimize_bitfield_assignment_op (bitsize, bitpos,
+					       bitregion_start, bitregion_end,
+					       mode1,
 					       to_rtx, to, from))
 	    result = NULL;
 	  else
-	    result = store_field (to_rtx, bitsize, bitpos, mode1, from,
+	    result = store_field (to_rtx, bitsize, bitpos,
+				  bitregion_start, bitregion_end,
+				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
 	}
@@ -4745,7 +4874,7 @@  store_expr (tree exp, rtx target, int ca
 			      : BLOCK_OP_NORMAL));
 	  else if (GET_MODE (target) == BLKmode)
 	    store_bit_field (target, INTVAL (expr_size (exp)) * BITS_PER_UNIT,
-			     0, GET_MODE (temp), temp);
+			     0, 0, 0, GET_MODE (temp), temp);
 	  else
 	    convert_move (target, temp, unsignedp);
 	}
@@ -5210,7 +5339,8 @@  store_constructor_field (rtx target, uns
       store_constructor (exp, target, cleared, bitsize / BITS_PER_UNIT);
     }
   else
-    store_field (target, bitsize, bitpos, mode, exp, type, alias_set, false);
+    store_field (target, bitsize, bitpos, 0, 0, mode, exp, type, alias_set,
+		 false);
 }
 
 /* Store the value of constructor EXP into the rtx TARGET.
@@ -5784,6 +5914,11 @@  store_constructor (tree exp, rtx target,
    BITSIZE bits, starting BITPOS bits from the start of TARGET.
    If MODE is VOIDmode, it means that we are storing into a bit-field.
 
+   BITREGION_START is the bit position of the first bit in this region.
+   BITREGION_END is the bit position of the last bit in this region.
+   These two fields are 0 if the C++ memory model does not apply, or
+   if we are not interested in keeping track of bitfield regions.
+
    Always return const0_rtx unless we have something particular to
    return.
 
@@ -5797,6 +5932,8 @@  store_constructor (tree exp, rtx target,
 
 static rtx
 store_field (rtx target, HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
+	     unsigned HOST_WIDE_INT bitregion_start,
+	     unsigned HOST_WIDE_INT bitregion_end,
 	     enum machine_mode mode, tree exp, tree type,
 	     alias_set_type alias_set, bool nontemporal)
 {
@@ -5829,8 +5966,9 @@  store_field (rtx target, HOST_WIDE_INT b
       if (bitsize != (HOST_WIDE_INT) GET_MODE_BITSIZE (GET_MODE (target)))
 	emit_move_insn (object, target);
 
-      store_field (blk_object, bitsize, bitpos, mode, exp, type, alias_set,
-		   nontemporal);
+      store_field (blk_object, bitsize, bitpos,
+		   bitregion_start, bitregion_end,
+		   mode, exp, type, alias_set, nontemporal);
 
       emit_move_insn (target, object);
 
@@ -5944,7 +6082,9 @@  store_field (rtx target, HOST_WIDE_INT b
 	}
 
       /* Store the value in the bitfield.  */
-      store_bit_field (target, bitsize, bitpos, mode, temp);
+      store_bit_field (target, bitsize, bitpos,
+		       bitregion_start, bitregion_end,
+		       mode, temp);
 
       return const0_rtx;
     }
@@ -7354,7 +7494,7 @@  expand_expr_real_2 (sepops ops, rtx targ
 						    (treeop0))
 				 * BITS_PER_UNIT),
 				(HOST_WIDE_INT) GET_MODE_BITSIZE (mode)),
-			   0, TYPE_MODE (valtype), treeop0,
+			   0, 0, 0, TYPE_MODE (valtype), treeop0,
 			   type, 0, false);
 	    }
 
Index: expr.h
===================================================================
--- expr.h	(revision 176280)
+++ expr.h	(working copy)
@@ -665,7 +665,10 @@  extern enum machine_mode
 mode_for_extraction (enum extraction_pattern, int);
 
 extern void store_bit_field (rtx, unsigned HOST_WIDE_INT,
-			     unsigned HOST_WIDE_INT, enum machine_mode, rtx);
+			     unsigned HOST_WIDE_INT,
+			     unsigned HOST_WIDE_INT,
+			     unsigned HOST_WIDE_INT,
+			     enum machine_mode, rtx);
 extern rtx extract_bit_field (rtx, unsigned HOST_WIDE_INT,
 			      unsigned HOST_WIDE_INT, int, bool, rtx,
 			      enum machine_mode, enum machine_mode);
Index: stor-layout.c
===================================================================
--- stor-layout.c	(revision 176280)
+++ stor-layout.c	(working copy)
@@ -2361,6 +2361,13 @@  fixup_unsigned_type (tree type)
 /* Find the best machine mode to use when referencing a bit field of length
    BITSIZE bits starting at BITPOS.
 
+   BITREGION_START is the bit position of the first bit in this
+   sequence of bit fields.  BITREGION_END is the last bit in this
+   sequence.  If these two fields are non-zero, we should restrict the
+   memory access to a chunk of at most
+   BITREGION_END - BITREGION_START + 1 bits.  Otherwise, we are allowed
+   to touch any adjacent non bit-fields.
+
    The underlying object is known to be aligned to a boundary of ALIGN bits.
    If LARGEST_MODE is not VOIDmode, it means that we should not use a mode
    larger than LARGEST_MODE (usually SImode).
@@ -2378,11 +2385,21 @@  fixup_unsigned_type (tree type)
    decide which of the above modes should be used.  */
 
 enum machine_mode
-get_best_mode (int bitsize, int bitpos, unsigned int align,
+get_best_mode (int bitsize, int bitpos,
+	       unsigned HOST_WIDE_INT bitregion_start,
+	       unsigned HOST_WIDE_INT bitregion_end,
+	       unsigned int align,
 	       enum machine_mode largest_mode, int volatilep)
 {
   enum machine_mode mode;
   unsigned int unit = 0;
+  unsigned HOST_WIDE_INT maxbits;
+
+  /* If unset, no restriction.  */
+  if (!bitregion_end)
+    maxbits = MAX_FIXED_MODE_SIZE;
+  else
+    maxbits = (bitregion_end - bitregion_start) % align + 1;
 
   /* Find the narrowest integer mode that contains the bit field.  */
   for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
@@ -2419,6 +2436,7 @@  get_best_mode (int bitsize, int bitpos, 
 	      && bitpos / unit == (bitpos + bitsize - 1) / unit
 	      && unit <= BITS_PER_WORD
 	      && unit <= MIN (align, BIGGEST_ALIGNMENT)
+	      && unit <= maxbits
 	      && (largest_mode == VOIDmode
 		  || unit <= GET_MODE_BITSIZE (largest_mode)))
 	    wide_mode = tmode;
Index: calls.c
===================================================================
--- calls.c	(revision 176280)
+++ calls.c	(working copy)
@@ -924,8 +924,8 @@  store_unaligned_arguments_into_pseudos (
 	    emit_move_insn (reg, const0_rtx);
 
 	    bytes -= bitsize / BITS_PER_UNIT;
-	    store_bit_field (reg, bitsize, endian_correction, word_mode,
-			     word);
+	    store_bit_field (reg, bitsize, endian_correction, 0, 0,
+			     word_mode, word);
 	  }
       }
 }
Index: expmed.c
===================================================================
--- expmed.c	(revision 176280)
+++ expmed.c	(working copy)
@@ -47,9 +47,15 @@  struct target_expmed *this_target_expmed
 
 static void store_fixed_bit_field (rtx, unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT, rtx);
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   rtx);
 static void store_split_bit_field (rtx, unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT, rtx);
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   rtx);
 static rtx extract_fixed_bit_field (enum machine_mode, rtx,
 				    unsigned HOST_WIDE_INT,
 				    unsigned HOST_WIDE_INT,
@@ -333,7 +339,10 @@  mode_for_extraction (enum extraction_pat
 
 static bool
 store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
-		   unsigned HOST_WIDE_INT bitnum, enum machine_mode fieldmode,
+		   unsigned HOST_WIDE_INT bitnum,
+		   unsigned HOST_WIDE_INT bitregion_start,
+		   unsigned HOST_WIDE_INT bitregion_end,
+		   enum machine_mode fieldmode,
 		   rtx value, bool fallback_p)
 {
   unsigned int unit
@@ -455,6 +464,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
 
   /* We may be accessing data outside the field, which means
      we can alias adjacent data.  */
+  /* ?? not always for C++0x memory model ?? */
   if (MEM_P (op0))
     {
       op0 = shallow_copy_rtx (op0);
@@ -547,7 +557,9 @@  store_bit_field_1 (rtx str_rtx, unsigned
 
 	  if (!store_bit_field_1 (op0, MIN (BITS_PER_WORD,
 					    bitsize - i * BITS_PER_WORD),
-				  bitnum + bit_offset, word_mode,
+				  bitnum + bit_offset,
+				  bitregion_start, bitregion_end,
+				  word_mode,
 				  value_word, fallback_p))
 	    {
 	      delete_insns_since (last);
@@ -710,6 +722,10 @@  store_bit_field_1 (rtx str_rtx, unsigned
   if (HAVE_insv && MEM_P (op0))
     {
       enum machine_mode bestmode;
+      unsigned HOST_WIDE_INT maxbits = MAX_FIXED_MODE_SIZE;
+
+      if (bitregion_end)
+	maxbits = bitregion_end - bitregion_start + 1;
 
       /* Get the mode to use for inserting into this field.  If OP0 is
 	 BLKmode, get the smallest mode consistent with the alignment. If
@@ -717,9 +733,12 @@  store_bit_field_1 (rtx str_rtx, unsigned
 	 mode. Otherwise, use the smallest mode containing the field.  */
 
       if (GET_MODE (op0) == BLKmode
+	  || GET_MODE_BITSIZE (GET_MODE (op0)) > maxbits
 	  || (op_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (op_mode)))
-	bestmode = get_best_mode (bitsize, bitnum, MEM_ALIGN (op0),
+	bestmode = get_best_mode  (bitsize, bitnum,
+				  bitregion_start, bitregion_end,
+				  MEM_ALIGN (op0),
 				  (op_mode == MAX_MACHINE_MODE
 				   ? VOIDmode : op_mode),
 				  MEM_VOLATILE_P (op0));
@@ -748,6 +767,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
 	     the unit.  */
 	  tempreg = copy_to_reg (xop0);
 	  if (store_bit_field_1 (tempreg, bitsize, xbitpos,
+				 bitregion_start, bitregion_end,
 				 fieldmode, orig_value, false))
 	    {
 	      emit_move_insn (xop0, tempreg);
@@ -760,21 +780,59 @@  store_bit_field_1 (rtx str_rtx, unsigned
   if (!fallback_p)
     return false;
 
-  store_fixed_bit_field (op0, offset, bitsize, bitpos, value);
+  store_fixed_bit_field (op0, offset, bitsize, bitpos,
+			 bitregion_start, bitregion_end, value);
   return true;
 }
 
 /* Generate code to store value from rtx VALUE
    into a bit-field within structure STR_RTX
    containing BITSIZE bits starting at bit BITNUM.
+
+   BITREGION_START is the bit position of the first bit in this region.
+   BITREGION_END is the bit position of the last bit in this region.
+   These two fields are 0 if the C++ memory model does not apply, or
+   if we are not interested in keeping track of bitfield regions.
+
    FIELDMODE is the machine-mode of the FIELD_DECL node for this field.  */
 
 void
 store_bit_field (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
-		 unsigned HOST_WIDE_INT bitnum, enum machine_mode fieldmode,
+		 unsigned HOST_WIDE_INT bitnum,
+		 unsigned HOST_WIDE_INT bitregion_start,
+		 unsigned HOST_WIDE_INT bitregion_end,
+		 enum machine_mode fieldmode,
 		 rtx value)
 {
-  if (!store_bit_field_1 (str_rtx, bitsize, bitnum, fieldmode, value, true))
+  /* Under the C++0x memory model, we must not touch bits outside the
+     bit region.  Adjust the address to start at the beginning of the
+     bit region.  */
+  if (MEM_P (str_rtx)
+      && bitregion_start > 0)
+    {
+      enum machine_mode bestmode;
+      enum machine_mode op_mode;
+      unsigned HOST_WIDE_INT offset;
+
+      op_mode = mode_for_extraction (EP_insv, 3);
+      if (op_mode == MAX_MACHINE_MODE)
+	op_mode = VOIDmode;
+
+      offset = bitregion_start / BITS_PER_UNIT;
+      bitnum -= bitregion_start;
+      bitregion_end -= bitregion_start;
+      bitregion_start = 0;
+      bestmode = get_best_mode (bitsize, bitnum,
+				bitregion_start, bitregion_end,
+				MEM_ALIGN (str_rtx),
+				op_mode,
+				MEM_VOLATILE_P (str_rtx));
+      str_rtx = adjust_address (str_rtx, bestmode, offset);
+    }
+
+  if (!store_bit_field_1 (str_rtx, bitsize, bitnum,
+			  bitregion_start, bitregion_end,
+			  fieldmode, value, true))
     gcc_unreachable ();
 }
 
@@ -790,7 +848,10 @@  store_bit_field (rtx str_rtx, unsigned H
 static void
 store_fixed_bit_field (rtx op0, unsigned HOST_WIDE_INT offset,
 		       unsigned HOST_WIDE_INT bitsize,
-		       unsigned HOST_WIDE_INT bitpos, rtx value)
+		       unsigned HOST_WIDE_INT bitpos,
+		       unsigned HOST_WIDE_INT bitregion_start,
+		       unsigned HOST_WIDE_INT bitregion_end,
+		       rtx value)
 {
   enum machine_mode mode;
   unsigned int total_bits = BITS_PER_WORD;
@@ -811,12 +872,19 @@  store_fixed_bit_field (rtx op0, unsigned
       /* Special treatment for a bit field split across two registers.  */
       if (bitsize + bitpos > BITS_PER_WORD)
 	{
-	  store_split_bit_field (op0, bitsize, bitpos, value);
+	  store_split_bit_field (op0, bitsize, bitpos,
+				 bitregion_start, bitregion_end,
+				 value);
 	  return;
 	}
     }
   else
     {
+      unsigned HOST_WIDE_INT maxbits = MAX_FIXED_MODE_SIZE;
+
+      if (bitregion_end)
+	maxbits = bitregion_end - bitregion_start + 1;
+
       /* Get the proper mode to use for this field.  We want a mode that
 	 includes the entire field.  If such a mode would be larger than
 	 a word, we won't be doing the extraction the normal way.
@@ -829,10 +897,12 @@  store_fixed_bit_field (rtx op0, unsigned
 
       if (MEM_VOLATILE_P (op0)
           && GET_MODE_BITSIZE (GET_MODE (op0)) > 0
+	  && GET_MODE_BITSIZE (GET_MODE (op0)) <= maxbits
 	  && flag_strict_volatile_bitfields > 0)
 	mode = GET_MODE (op0);
       else
 	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
+			      bitregion_start, bitregion_end,
 			      MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));
 
       if (mode == VOIDmode)
@@ -840,7 +910,7 @@  store_fixed_bit_field (rtx op0, unsigned
 	  /* The only way this should occur is if the field spans word
 	     boundaries.  */
 	  store_split_bit_field (op0, bitsize, bitpos + offset * BITS_PER_UNIT,
-				 value);
+				 bitregion_start, bitregion_end, value);
 	  return;
 	}
 
@@ -960,7 +1030,10 @@  store_fixed_bit_field (rtx op0, unsigned
 
 static void
 store_split_bit_field (rtx op0, unsigned HOST_WIDE_INT bitsize,
-		       unsigned HOST_WIDE_INT bitpos, rtx value)
+		       unsigned HOST_WIDE_INT bitpos,
+		       unsigned HOST_WIDE_INT bitregion_start,
+		       unsigned HOST_WIDE_INT bitregion_end,
+		       rtx value)
 {
   unsigned int unit;
   unsigned int bitsdone = 0;
@@ -1075,7 +1148,7 @@  store_split_bit_field (rtx op0, unsigned
 	 it is just an out-of-bounds access.  Ignore it.  */
       if (word != const0_rtx)
 	store_fixed_bit_field (word, offset * unit / BITS_PER_UNIT, thissize,
-			       thispos, part);
+			       thispos, bitregion_start, bitregion_end, part);
       bitsdone += thissize;
     }
 }
@@ -1515,7 +1588,7 @@  extract_bit_field_1 (rtx str_rtx, unsign
       if (GET_MODE (op0) == BLKmode
 	  || (ext_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (ext_mode)))
-	bestmode = get_best_mode (bitsize, bitnum, MEM_ALIGN (op0),
+	bestmode = get_best_mode (bitsize, bitnum, 0, 0, MEM_ALIGN (op0),
 				  (ext_mode == MAX_MACHINE_MODE
 				   ? VOIDmode : ext_mode),
 				  MEM_VOLATILE_P (op0));
@@ -1641,7 +1714,7 @@  extract_fixed_bit_field (enum machine_mo
 	    mode = tmode;
 	}
       else
-	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
+	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT, 0, 0,
 			      MEM_ALIGN (op0), word_mode, MEM_VOLATILE_P (op0));
 
       if (mode == VOIDmode)
Index: Makefile.in
===================================================================
--- Makefile.in	(revision 176280)
+++ Makefile.in	(working copy)
@@ -2908,7 +2908,7 @@  expr.o : expr.c $(CONFIG_H) $(SYSTEM_H) 
    reload.h langhooks.h intl.h $(TM_P_H) $(TARGET_H) \
    tree-iterator.h gt-expr.h $(MACHMODE_H) $(TIMEVAR_H) $(TREE_FLOW_H) \
    $(TREE_PASS_H) $(DF_H) $(DIAGNOSTIC_H) vecprim.h $(SSAEXPAND_H) \
-   $(COMMON_TARGET_H)
+   $(PARAMS_H) $(COMMON_TARGET_H)
 dojump.o : dojump.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(TREE_H) \
    $(FLAGS_H) $(FUNCTION_H) $(EXPR_H) $(OPTABS_H) $(INSN_ATTR_H) insn-config.h \
    langhooks.h $(GGC_H) gt-dojump.h vecprim.h $(BASIC_BLOCK_H) output.h
Index: stmt.c
===================================================================
--- stmt.c	(revision 176280)
+++ stmt.c	(working copy)
@@ -1759,7 +1759,8 @@  expand_return (tree retval)
 
 	  /* Use bitpos for the source extraction (left justified) and
 	     xbitpos for the destination store (right justified).  */
-	  store_bit_field (dst, bitsize, xbitpos % BITS_PER_WORD, word_mode,
+	  store_bit_field (dst, bitsize, xbitpos % BITS_PER_WORD,
+			   0, 0, word_mode,
 			   extract_bit_field (src, bitsize,
 					      bitpos % BITS_PER_WORD, 1, false,
 					      NULL_RTX, word_mode, word_mode));
Index: params.def
===================================================================
--- params.def	(revision 176280)
+++ params.def	(working copy)
@@ -902,6 +902,12 @@  DEFPARAM (PARAM_CASE_VALUES_THRESHOLD,
 	  "if 0, use the default for the machine",
           0, 0, 0)
 
+/* Data race flags for C++0x memory model compliance.  */
+DEFPARAM (PARAM_ALLOW_STORE_DATA_RACES,
+	  "allow-store-data-races",
+	  "Allow new data races on stores to be introduced",
+	  1, 0, 1)
+
 
 /*
 Local variables: