Patchwork [C++0x] contiguous bitfields race implementation

Submitter Aldy Hernandez
Date May 9, 2011, 4:24 p.m.
Message ID <4DC8155A.3040401@redhat.com>
Permalink /patch/94792/
State New

Comments

Aldy Hernandez - May 9, 2011, 4:24 p.m.
Seeing that the current C++ draft has been approved, I'd like to submit 
this for mainline, and get the proper review everyone has been quietly 
avoiding :).

To refresh everyone's memory, here is the problem:

struct
{
     unsigned int a : 4;
     unsigned char b;
     unsigned int c: 6;
} var;


void seta(){
       var.a = 12;
}


In the new C++ standard, stores into <a> cannot touch <b>, so we can't 
store with anything wider (e.g., a 32-bit store) that would touch <b>. 
This problem can be seen on strictly aligned targets such as ARM, where 
we store the above sequence with a 32-bit store, or on x86-64 with <a> 
being volatile (PR48124).

This patch fixes both problems, but only for the C++ memory model. This 
is NOT a generic fix for PR48124, only a fix when using "--param 
allow-store-data-races=0".  I will gladly change the parameter name if 
another is preferred.

The gist of this patch is in max_field_size(), where we calculate the 
maximum number of bits we can store into. In doing this calculation I 
assume we can store into the padding without causing any races, so 
padding between fields and at the end of the structure is included.

Tested on x86-64 both with and without "--param 
allow-store-data-races=0", and visually inspecting the assembly on 
arm-linux and ia64-linux.

OK for trunk?
Aldy
	* params.h (ALLOW_STORE_DATA_RACES): New.
	* params.def (PARAM_ALLOW_STORE_DATA_RACES): New.
	* Makefile.in (expr.o): Depend on PARAMS_H.
	* machmode.h (get_best_mode): Add argument.
	* fold-const.c (optimize_bit_field_compare): Add argument to
	get_best_mode.
	(fold_truthop): Same.
	* ifcvt.c (noce_emit_move_insn): Add argument to store_bit_field.
	* expr.c (emit_group_store): Same.
	(copy_blkmode_from_reg): Same.
	(write_complex_part): Same.
	(optimize_bitfield_assignment_op): Add argument.
	Add argument to get_best_mode.
	(max_field_size): New.
	(expand_assignment): Calculate maxbits and pass it down
	accordingly.
	(store_field): New argument.
	(expand_expr_real_2): New argument to store_field.
	Include params.h.
	* expr.h (store_bit_field): New argument.
	* stor-layout.c (get_best_mode): Restrict mode expansion by taking
	into account maxbits.
	* calls.c (store_unaligned_arguments_into_pseudos): New argument
	to store_bit_field.
	* expmed.c (store_bit_field_1): New argument.  Use it.
	(store_bit_field): Same.
	(store_fixed_bit_field): Same.
	(store_split_bit_field): Same.
	(extract_bit_field_1): Pass new argument to get_best_mode.
	(extract_fixed_bit_field): Same.
	* stmt.c (expand_return): Pass new argument to store_bit_field.
	* tree.h (DECL_THREAD_VISIBLE_P): New.
	* doc/invoke.texi: Document parameter allow-store-data-races.
Jeff Law - May 9, 2011, 5:15 p.m.
On 05/09/11 10:24, Aldy Hernandez wrote:
> Seeing that the current C++ draft has been approved, I'd like to submit
> this for mainline, and get the proper review everyone's being quietly
> avoiding :).
> 
> To refresh everyone's memory, here is the problem:
> 
> struct
> {
>     unsigned int a : 4;
>     unsigned char b;
>     unsigned int c: 6;
> } var;
> 
> 
> void seta(){
>       var.a = 12;
> }
> 
> 
> In the new C++ standard, stores into <a> cannot touch <b>, so we can't
> store with anything wider (e.g. a 32 bit store) that will touch <b>.
> This problem can be seen on strictly aligned targets such as ARM, where
> we store the above sequence with a 32-bit store. Or on x86-64 with <a>
> being volatile (PR48124).
> 
> This patch fixes both problems, but only for the C++ memory model. This
> is NOT a generic fix PR48124, only a fix when using "--param
> allow-store-data-races=0".  I will gladly change the parameter name, if
> another is preferred.
> 
> The gist of this patch is in max_field_size(), where we calculate the
> maximum number of bits we can store into. In doing this calculation I
> assume we can store into the padding without causing any races. So,
> padding between fields and at the end of the structure are included.
Well, the kernel guys would like to be able to preserve the
padding bits too.  It's a long, long sad story that I won't repeat...
And I don't think we should further complicate this stuff with the
desire to not clobber padding bits :-)  Though be aware the request
might come one day...


> 
> Tested on x86-64 both with and without "--param
> allow-store-data-races=0", and visually inspecting the assembly on
> arm-linux and ia64-linux.
Any way to add a test to the testsuite?

General approach seems OK; I didn't dive deeply into the implementation.
 I'll leave that for rth & jason :-)

jeff
Aldy Hernandez - May 9, 2011, 5:26 p.m.
>> struct
>> {
>>      unsigned int a : 4;
>>      unsigned char b;
>>      unsigned int c: 6;
>> } var;


> Well, the kernel guys would like to be able to preserve the
> padding bits too.  It's a long, long sad story that I won't repeat...
> And I don't think we should further complicate this stuff with the
> desire to not clobber padding bits :-)  Though be aware the request
> might come one day....

Woah, let me see if I got this right.  If we were to store in VAR.C 
above, the default for this memory model would be NOT to clobber the 
padding bits past <c>?  That definitely makes my implementation simpler, 
so I won't complain, but that's just weird.

>> Tested on x86-64 both with and without "--param
>> allow-store-data-races=0", and visually inspecting the assembly on
>> arm-linux and ia64-linux.
> Any way to add a test to the testsuite?

Arghhh... I was afraid you'd ask for one.  It was much easier with the 
test harness on cxx-memory-model.  I'll whip one up though...

Aldy
Jeff Law - May 9, 2011, 6:03 p.m.
On 05/09/11 11:26, Aldy Hernandez wrote:
> 
>>> struct
>>> {
>>>      unsigned int a : 4;
>>>      unsigned char b;
>>>      unsigned int c: 6;
>>> } var;
> 
> 
>> Well, the kernel guys would like to be able to preserve the
>> padding bits too.  It's a long, long sad story that I won't repeat...
>> And I don't think we should further complicate this stuff with the
>> desire to not clobber padding bits :-)  Though be aware the request
>> might come one day....
> 
> Woah, let me see if I got this right.  If we were to store in VAR.C
> above, the default for this memory model would be NOT to clobber the
> padding bits past <c>?  That definitely makes my implementation simpler,
> so I won't complain, but that's just weird.
Just to be clear, it's something I've discussed with the kernel guys and
is completely separate from the C++ memory model.  I don't think we
should wrap this into your current work.

Consider if the kernel team wanted to add some information to a
structure without growing the structure.  Furthermore, assume that the
structure escapes, say into modules that aren't necessarily going to be
rebuilt, but those modules won't need to ever access this new
information.  And assume there happens to be enough padding bits to hold
this auxiliary information.

This has actually occurred and the kernel team wanted to use the padding
bits to hold the auxiliary information and maintain kernel ABI/API
compatibility.  Unfortunately, a store to a nearby bitfield can
overwrite the padding, so if the structure escaped to a module that
still thought the bits were padding, that module could clobber
those padding bits, destroying the auxiliary data.

If GCC had a mode where it would preserve the padding bits (when
possible), it'd help the kernel team in these situations.



> 
> Arghhh... I was afraid you'd ask for one.  It was much easier with the
> test harness on cxx-memory-model.  I'll whip one up though...
Given others have (rightly) called me out on it a lot recently, I
figured I'd pass along the love :-)

jeff
Jason Merrill - May 9, 2011, 7:23 p.m.
From a quick look it seems that this patch considers bitfields 
following the one we're deliberately touching, but not previous 
bitfields in the same memory location; we need to include those as 
well.  With your struct foo, the bits touched are the same regardless 
of whether we name .a or .b.

Jason

Patch

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 173263)
+++ doc/invoke.texi	(working copy)
@@ -8886,6 +8886,11 @@  The maximum number of conditional stores
 if either vectorization (@option{-ftree-vectorize}) or if-conversion
 (@option{-ftree-loop-if-convert}) is disabled.  The default is 2.
 
+@item allow-store-data-races
+Allow optimizers to introduce new data races on stores.
+Set to 1 to allow, otherwise to 0.  This option is enabled by default
+unless implicitly set by the @option{-fmemory-model=} option.
+
 @end table
 @end table
 
Index: machmode.h
===================================================================
--- machmode.h	(revision 173263)
+++ machmode.h	(working copy)
@@ -248,7 +248,9 @@  extern enum machine_mode mode_for_vector
 
 /* Find the best mode to use to access a bit field.  */
 
-extern enum machine_mode get_best_mode (int, int, unsigned int,
+extern enum machine_mode get_best_mode (int, int,
+					unsigned HOST_WIDE_INT,
+					unsigned int,
 					enum machine_mode, int);
 
 /* Determine alignment, 1<=result<=BIGGEST_ALIGNMENT.  */
Index: tree.h
===================================================================
--- tree.h	(revision 173263)
+++ tree.h	(working copy)
@@ -3156,6 +3156,10 @@  struct GTY(()) tree_parm_decl {
 #define DECL_THREAD_LOCAL_P(NODE) \
   (VAR_DECL_CHECK (NODE)->decl_with_vis.tls_model >= TLS_MODEL_REAL)
 
+/* Return true if a VAR_DECL is visible from another thread.  */
+#define DECL_THREAD_VISIBLE_P(NODE) \
+  (TREE_STATIC (NODE) && !DECL_THREAD_LOCAL_P (NODE))
+
 /* In a non-local VAR_DECL with static storage duration, true if the
    variable has an initialization priority.  If false, the variable
    will be initialized at the DEFAULT_INIT_PRIORITY.  */
Index: fold-const.c
===================================================================
--- fold-const.c	(revision 173263)
+++ fold-const.c	(working copy)
@@ -3409,7 +3409,7 @@  optimize_bit_field_compare (location_t l
       && flag_strict_volatile_bitfields > 0)
     nmode = lmode;
   else
-    nmode = get_best_mode (lbitsize, lbitpos,
+    nmode = get_best_mode (lbitsize, lbitpos, 0,
 			   const_p ? TYPE_ALIGN (TREE_TYPE (linner))
 			   : MIN (TYPE_ALIGN (TREE_TYPE (linner)),
 				  TYPE_ALIGN (TREE_TYPE (rinner))),
@@ -5237,7 +5237,7 @@  fold_truthop (location_t loc, enum tree_
      to be relative to a field of that size.  */
   first_bit = MIN (ll_bitpos, rl_bitpos);
   end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
-  lnmode = get_best_mode (end_bit - first_bit, first_bit,
+  lnmode = get_best_mode (end_bit - first_bit, first_bit, 0,
 			  TYPE_ALIGN (TREE_TYPE (ll_inner)), word_mode,
 			  volatilep);
   if (lnmode == VOIDmode)
@@ -5302,7 +5302,7 @@  fold_truthop (location_t loc, enum tree_
 
       first_bit = MIN (lr_bitpos, rr_bitpos);
       end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
-      rnmode = get_best_mode (end_bit - first_bit, first_bit,
+      rnmode = get_best_mode (end_bit - first_bit, first_bit, 0,
 			      TYPE_ALIGN (TREE_TYPE (lr_inner)), word_mode,
 			      volatilep);
       if (rnmode == VOIDmode)
Index: params.h
===================================================================
--- params.h	(revision 173263)
+++ params.h	(working copy)
@@ -206,4 +206,6 @@  extern void init_param_values (int *para
   PARAM_VALUE (PARAM_MIN_NONDEBUG_INSN_UID)
 #define MAX_STORES_TO_SINK \
   PARAM_VALUE (PARAM_MAX_STORES_TO_SINK)
+#define ALLOW_STORE_DATA_RACES \
+  PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES)
 #endif /* ! GCC_PARAMS_H */
Index: ifcvt.c
===================================================================
--- ifcvt.c	(revision 173263)
+++ ifcvt.c	(working copy)
@@ -885,7 +885,7 @@  noce_emit_move_insn (rtx x, rtx y)
 		}
 
 	      gcc_assert (start < (MEM_P (op) ? BITS_PER_UNIT : BITS_PER_WORD));
-	      store_bit_field (op, size, start, GET_MODE (x), y);
+	      store_bit_field (op, size, start, 0, GET_MODE (x), y);
 	      return;
 	    }
 
@@ -939,7 +939,7 @@  noce_emit_move_insn (rtx x, rtx y)
   inner = XEXP (outer, 0);
   outmode = GET_MODE (outer);
   bitpos = SUBREG_BYTE (outer) * BITS_PER_UNIT;
-  store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos, outmode, y);
+  store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos, 0, outmode, y);
 }
 
 /* Return sequence of instructions generated by if conversion.  This
Index: expr.c
===================================================================
--- expr.c	(revision 173263)
+++ expr.c	(working copy)
@@ -54,6 +54,7 @@  along with GCC; see the file COPYING3.  
 #include "diagnostic.h"
 #include "ssaexpand.h"
 #include "target-globals.h"
+#include "params.h"
 
 /* Decide whether a function's arguments should be processed
    from first to last or from last to first.
@@ -142,7 +143,8 @@  static void store_constructor_field (rtx
 				     HOST_WIDE_INT, enum machine_mode,
 				     tree, tree, int, alias_set_type);
 static void store_constructor (tree, rtx, int, HOST_WIDE_INT);
-static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT, enum machine_mode,
+static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT,
+			unsigned HOST_WIDE_INT, enum machine_mode,
 			tree, tree, alias_set_type, bool);
 
 static unsigned HOST_WIDE_INT highest_pow2_factor_for_target (const_tree, const_tree);
@@ -2063,7 +2065,7 @@  emit_group_store (rtx orig_dst, rtx src,
 	emit_move_insn (adjust_address (dest, mode, bytepos), tmps[i]);
       else
 	store_bit_field (dest, bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
-			 mode, tmps[i]);
+			 0, mode, tmps[i]);
     }
 
   /* Copy from the pseudo into the (probable) hard reg.  */
@@ -2157,7 +2159,7 @@  copy_blkmode_from_reg (rtx tgtblk, rtx s
 
       /* Use xbitpos for the source extraction (right justified) and
 	 bitpos for the destination store (left justified).  */
-      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, copy_mode,
+      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, 0, copy_mode,
 		       extract_bit_field (src, bitsize,
 					  xbitpos % BITS_PER_WORD, 1, false,
 					  NULL_RTX, copy_mode, copy_mode));
@@ -2794,7 +2796,7 @@  write_complex_part (rtx cplx, rtx val, b
 	gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
     }
 
-  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, imode, val);
+  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, imode, val);
 }
 
 /* Extract one of the components of the complex value CPLX.  Extract the
@@ -3929,6 +3931,7 @@  get_subtarget (rtx x)
 static bool
 optimize_bitfield_assignment_op (unsigned HOST_WIDE_INT bitsize,
 				 unsigned HOST_WIDE_INT bitpos,
+				 unsigned HOST_WIDE_INT maxbits,
 				 enum machine_mode mode1, rtx str_rtx,
 				 tree to, tree src)
 {
@@ -3989,7 +3992,7 @@  optimize_bitfield_assignment_op (unsigne
 
       if (str_bitsize == 0 || str_bitsize > BITS_PER_WORD)
 	str_mode = word_mode;
-      str_mode = get_best_mode (bitsize, bitpos,
+      str_mode = get_best_mode (bitsize, bitpos, maxbits,
 				MEM_ALIGN (str_rtx), str_mode, 0);
       if (str_mode == VOIDmode)
 	return false;
@@ -4098,6 +4101,92 @@  optimize_bitfield_assignment_op (unsigne
   return false;
 }
 
+/* In the C++ memory model, consecutive bit fields in a structure are
+   considered one memory location.
+
+   Given a COMPONENT_REF, this function returns the maximum number of
+   bits we are allowed to store into, when storing into the
+   COMPONENT_REF.  We return 0, if there is no restriction.
+
+   EXP is the COMPONENT_REF.
+
+   BITPOS is the position in bits where the bit starts within the structure.
+   BITSIZE is size in bits of the field being referenced in EXP.
+
+   For example, while storing into FOO.A here...
+
+      struct {
+        BIT 0:
+          unsigned int a : 4;
+	  unsigned int b : 1;
+	BIT 8:
+	  unsigned char c;
+	  unsigned int d : 6;
+      } foo;
+
+   ...we are not allowed to store past <b>, so for the layout above,
+   we would return 8 maximum bits (because who cares if we store into
+   the padding).  */
+
+
+static unsigned HOST_WIDE_INT
+max_field_size (tree exp, HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize)
+{
+  tree field, record_type, fld;
+  HOST_WIDE_INT maxbits = bitsize;
+
+  gcc_assert (TREE_CODE (exp) == COMPONENT_REF);
+
+  /* If other threads can't see this value, no need to restrict stores.  */
+  if (ALLOW_STORE_DATA_RACES
+      || !DECL_THREAD_VISIBLE_P (TREE_OPERAND (exp, 0)))
+    return 0;
+
+  field = TREE_OPERAND (exp, 1);
+  record_type = DECL_FIELD_CONTEXT (field);
+
+  /* Find the original field within the structure.  */
+  for (fld = TYPE_FIELDS (record_type); fld; fld = DECL_CHAIN (fld))
+    if (fld == field)
+      break;
+  gcc_assert (fld == field);
+
+  /* If this is the last element in the structure, we can touch from
+     BITPOS to the end of the structure (including the padding).  */
+  if (!DECL_CHAIN (fld))
+    return TREE_INT_CST_LOW (TYPE_SIZE (record_type)) - bitpos;
+
+  /* Count contiguous bit fields not separated by a 0-length bit-field.  */
+  for (fld = DECL_CHAIN (fld); fld; fld = DECL_CHAIN (fld))
+    {
+      tree t, offset;
+      enum machine_mode mode;
+      int unsignedp, volatilep;
+
+      if (TREE_CODE (fld) != FIELD_DECL)
+	continue;
+
+      t = build3 (COMPONENT_REF, TREE_TYPE (exp),
+		  unshare_expr (TREE_OPERAND (exp, 0)),
+		  fld, NULL_TREE);
+      get_inner_reference (t, &bitsize, &bitpos, &offset,
+			   &mode, &unsignedp, &volatilep, true);
+
+      /* Only count contiguous bit fields, that are not separated by a
+	 zero-length bit field.  */
+      if (!DECL_BIT_FIELD (fld)
+	  || bitsize == 0)
+	{
+	  /* Include the padding up to the next field.  */
+	  maxbits += bitpos - maxbits;
+	  break;
+	}
+
+      maxbits += bitsize;
+    }
+
+  return maxbits;
+}
 
 /* Expand an assignment that stores the value of FROM into TO.  If NONTEMPORAL
    is true, try generating a nontemporal store.  */
@@ -4197,6 +4286,9 @@  expand_assignment (tree to, tree from, b
     {
       enum machine_mode mode1;
       HOST_WIDE_INT bitsize, bitpos;
+      /* Max consecutive bits we are allowed to touch while storing
+	 into TO.  */
+      HOST_WIDE_INT maxbits = 0;
       tree offset;
       int unsignedp;
       int volatilep = 0;
@@ -4206,6 +4298,10 @@  expand_assignment (tree to, tree from, b
       tem = get_inner_reference (to, &bitsize, &bitpos, &offset, &mode1,
 				 &unsignedp, &volatilep, true);
 
+      if (TREE_CODE (to) == COMPONENT_REF
+	  && DECL_BIT_FIELD (TREE_OPERAND (to, 1)))
+	maxbits = max_field_size (to, bitpos, bitsize);
+
       /* If we are going to use store_bit_field and extract_bit_field,
 	 make sure to_rtx will be safe for multiple use.  */
 
@@ -4286,12 +4382,13 @@  expand_assignment (tree to, tree from, b
 	    result = store_expr (from, XEXP (to_rtx, bitpos != 0), false,
 				 nontemporal);
 	  else if (bitpos + bitsize <= mode_bitsize / 2)
-	    result = store_field (XEXP (to_rtx, 0), bitsize, bitpos,
+	    result = store_field (XEXP (to_rtx, 0), bitsize, bitpos, maxbits,
 				  mode1, from, TREE_TYPE (tem),
 				  get_alias_set (to), nontemporal);
 	  else if (bitpos >= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 1), bitsize,
-				  bitpos - mode_bitsize / 2, mode1, from,
+				  bitpos - mode_bitsize / 2, maxbits,
+				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
 	  else if (bitpos == 0 && bitsize == mode_bitsize)
@@ -4312,7 +4409,8 @@  expand_assignment (tree to, tree from, b
 					    0);
 	      write_complex_part (temp, XEXP (to_rtx, 0), false);
 	      write_complex_part (temp, XEXP (to_rtx, 1), true);
-	      result = store_field (temp, bitsize, bitpos, mode1, from,
+	      result = store_field (temp, bitsize, bitpos, maxbits,
+				    mode1, from,
 				    TREE_TYPE (tem), get_alias_set (to),
 				    nontemporal);
 	      emit_move_insn (XEXP (to_rtx, 0), read_complex_part (temp, false));
@@ -4337,11 +4435,12 @@  expand_assignment (tree to, tree from, b
 		MEM_KEEP_ALIAS_SET_P (to_rtx) = 1;
 	    }
 
-	  if (optimize_bitfield_assignment_op (bitsize, bitpos, mode1,
+	  if (optimize_bitfield_assignment_op (bitsize, bitpos, maxbits, mode1,
 					       to_rtx, to, from))
 	    result = NULL;
 	  else
-	    result = store_field (to_rtx, bitsize, bitpos, mode1, from,
+	    result = store_field (to_rtx, bitsize, bitpos, maxbits,
+				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
 	}
@@ -4734,7 +4833,7 @@  store_expr (tree exp, rtx target, int ca
 			      : BLOCK_OP_NORMAL));
 	  else if (GET_MODE (target) == BLKmode)
 	    store_bit_field (target, INTVAL (expr_size (exp)) * BITS_PER_UNIT,
-			     0, GET_MODE (temp), temp);
+			     0, 0, GET_MODE (temp), temp);
 	  else
 	    convert_move (target, temp, unsignedp);
 	}
@@ -5177,7 +5276,8 @@  store_constructor_field (rtx target, uns
       store_constructor (exp, target, cleared, bitsize / BITS_PER_UNIT);
     }
   else
-    store_field (target, bitsize, bitpos, mode, exp, type, alias_set, false);
+    store_field (target, bitsize, bitpos, 0, mode, exp, type, alias_set,
+		 false);
 }
 
 /* Store the value of constructor EXP into the rtx TARGET.
@@ -5751,6 +5851,8 @@  store_constructor (tree exp, rtx target,
    BITSIZE bits, starting BITPOS bits from the start of TARGET.
    If MODE is VOIDmode, it means that we are storing into a bit-field.
 
+   MAXBITS is the number of bits we can store into, 0 if no limit.
+
    Always return const0_rtx unless we have something particular to
    return.
 
@@ -5764,6 +5866,7 @@  store_constructor (tree exp, rtx target,
 
 static rtx
 store_field (rtx target, HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
+	     unsigned HOST_WIDE_INT maxbits,
 	     enum machine_mode mode, tree exp, tree type,
 	     alias_set_type alias_set, bool nontemporal)
 {
@@ -5796,8 +5899,8 @@  store_field (rtx target, HOST_WIDE_INT b
       if (bitsize != (HOST_WIDE_INT) GET_MODE_BITSIZE (GET_MODE (target)))
 	emit_move_insn (object, target);
 
-      store_field (blk_object, bitsize, bitpos, mode, exp, type, alias_set,
-		   nontemporal);
+      store_field (blk_object, bitsize, bitpos, maxbits,
+		   mode, exp, type, alias_set, nontemporal);
 
       emit_move_insn (target, object);
 
@@ -5911,7 +6014,7 @@  store_field (rtx target, HOST_WIDE_INT b
 	}
 
       /* Store the value in the bitfield.  */
-      store_bit_field (target, bitsize, bitpos, mode, temp);
+      store_bit_field (target, bitsize, bitpos, maxbits, mode, temp);
 
       return const0_rtx;
     }
@@ -7323,7 +7426,7 @@  expand_expr_real_2 (sepops ops, rtx targ
 						    (treeop0))
 				 * BITS_PER_UNIT),
 				(HOST_WIDE_INT) GET_MODE_BITSIZE (mode)),
-			   0, TYPE_MODE (valtype), treeop0,
+			   0, 0, TYPE_MODE (valtype), treeop0,
 			   type, 0, false);
 	    }
 
Index: expr.h
===================================================================
--- expr.h	(revision 173263)
+++ expr.h	(working copy)
@@ -665,7 +665,8 @@  extern enum machine_mode
 mode_for_extraction (enum extraction_pattern, int);
 
 extern void store_bit_field (rtx, unsigned HOST_WIDE_INT,
-			     unsigned HOST_WIDE_INT, enum machine_mode, rtx);
+			     unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
+			     enum machine_mode, rtx);
 extern rtx extract_bit_field (rtx, unsigned HOST_WIDE_INT,
 			      unsigned HOST_WIDE_INT, int, bool, rtx,
 			      enum machine_mode, enum machine_mode);
Index: stor-layout.c
===================================================================
--- stor-layout.c	(revision 173263)
+++ stor-layout.c	(working copy)
@@ -2428,6 +2428,9 @@  fixup_unsigned_type (tree type)
 /* Find the best machine mode to use when referencing a bit field of length
    BITSIZE bits starting at BITPOS.
 
+   MAXBITS is the maximum number of bits we are allowed to touch, when
+   referencing this bit field.  MAXBITS is 0 if there is no limit.
+
    The underlying object is known to be aligned to a boundary of ALIGN bits.
    If LARGEST_MODE is not VOIDmode, it means that we should not use a mode
    larger than LARGEST_MODE (usually SImode).
@@ -2445,7 +2448,8 @@  fixup_unsigned_type (tree type)
    decide which of the above modes should be used.  */
 
 enum machine_mode
-get_best_mode (int bitsize, int bitpos, unsigned int align,
+get_best_mode (int bitsize, int bitpos, unsigned HOST_WIDE_INT maxbits,
+	       unsigned int align,
 	       enum machine_mode largest_mode, int volatilep)
 {
   enum machine_mode mode;
@@ -2484,6 +2488,7 @@  get_best_mode (int bitsize, int bitpos, 
 	  if (bitpos / unit == (bitpos + bitsize - 1) / unit
 	      && unit <= BITS_PER_WORD
 	      && unit <= MIN (align, BIGGEST_ALIGNMENT)
+	      && (!maxbits || unit <= maxbits)
 	      && (largest_mode == VOIDmode
 		  || unit <= GET_MODE_BITSIZE (largest_mode)))
 	    wide_mode = tmode;
Index: calls.c
===================================================================
--- calls.c	(revision 173263)
+++ calls.c	(working copy)
@@ -909,7 +909,7 @@  store_unaligned_arguments_into_pseudos (
 	    emit_move_insn (reg, const0_rtx);
 
 	    bytes -= bitsize / BITS_PER_UNIT;
-	    store_bit_field (reg, bitsize, endian_correction, word_mode,
+	    store_bit_field (reg, bitsize, endian_correction, 0, word_mode,
 			     word);
 	  }
       }
Index: expmed.c
===================================================================
--- expmed.c	(revision 173263)
+++ expmed.c	(working copy)
@@ -47,9 +47,13 @@  struct target_expmed *this_target_expmed
 
 static void store_fixed_bit_field (rtx, unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT, rtx);
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   rtx);
 static void store_split_bit_field (rtx, unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT, rtx);
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   rtx);
 static rtx extract_fixed_bit_field (enum machine_mode, rtx,
 				    unsigned HOST_WIDE_INT,
 				    unsigned HOST_WIDE_INT,
@@ -333,7 +337,9 @@  mode_for_extraction (enum extraction_pat
 
 static bool
 store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
-		   unsigned HOST_WIDE_INT bitnum, enum machine_mode fieldmode,
+		   unsigned HOST_WIDE_INT bitnum,
+		   unsigned HOST_WIDE_INT maxbits,
+		   enum machine_mode fieldmode,
 		   rtx value, bool fallback_p)
 {
   unsigned int unit
@@ -547,7 +553,9 @@  store_bit_field_1 (rtx str_rtx, unsigned
 
 	  if (!store_bit_field_1 (op0, MIN (BITS_PER_WORD,
 					    bitsize - i * BITS_PER_WORD),
-				  bitnum + bit_offset, word_mode,
+				  bitnum + bit_offset,
+				  maxbits,
+				  word_mode,
 				  value_word, fallback_p))
 	    {
 	      delete_insns_since (last);
@@ -718,9 +726,10 @@  store_bit_field_1 (rtx str_rtx, unsigned
 	 mode. Otherwise, use the smallest mode containing the field.  */
 
       if (GET_MODE (op0) == BLKmode
+	  || (maxbits && GET_MODE_BITSIZE (GET_MODE (op0)) > maxbits)
 	  || (op_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (op_mode)))
-	bestmode = get_best_mode (bitsize, bitnum, MEM_ALIGN (op0),
+	bestmode = get_best_mode (bitsize, bitnum, maxbits, MEM_ALIGN (op0),
 				  (op_mode == MAX_MACHINE_MODE
 				   ? VOIDmode : op_mode),
 				  MEM_VOLATILE_P (op0));
@@ -748,7 +757,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
 	  /* Fetch that unit, store the bitfield in it, then store
 	     the unit.  */
 	  tempreg = copy_to_reg (xop0);
-	  if (store_bit_field_1 (tempreg, bitsize, xbitpos,
+	  if (store_bit_field_1 (tempreg, bitsize, xbitpos, maxbits,
 				 fieldmode, orig_value, false))
 	    {
 	      emit_move_insn (xop0, tempreg);
@@ -761,21 +770,28 @@  store_bit_field_1 (rtx str_rtx, unsigned
   if (!fallback_p)
     return false;
 
-  store_fixed_bit_field (op0, offset, bitsize, bitpos, value);
+  store_fixed_bit_field (op0, offset, bitsize, bitpos, maxbits, value);
   return true;
 }
 
 /* Generate code to store value from rtx VALUE
    into a bit-field within structure STR_RTX
    containing BITSIZE bits starting at bit BITNUM.
+
+   MAXBITS is the maximum number of bits we are allowed to store into,
+   0 if no limit.
+
    FIELDMODE is the machine-mode of the FIELD_DECL node for this field.  */
 
 void
 store_bit_field (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
-		 unsigned HOST_WIDE_INT bitnum, enum machine_mode fieldmode,
+		 unsigned HOST_WIDE_INT bitnum,
+		 unsigned HOST_WIDE_INT maxbits,
+		 enum machine_mode fieldmode,
 		 rtx value)
 {
-  if (!store_bit_field_1 (str_rtx, bitsize, bitnum, fieldmode, value, true))
+  if (!store_bit_field_1 (str_rtx, bitsize, bitnum, maxbits,
+			  fieldmode, value, true))
     gcc_unreachable ();
 }
 
@@ -791,7 +807,9 @@  store_bit_field (rtx str_rtx, unsigned H
 static void
 store_fixed_bit_field (rtx op0, unsigned HOST_WIDE_INT offset,
 		       unsigned HOST_WIDE_INT bitsize,
-		       unsigned HOST_WIDE_INT bitpos, rtx value)
+		       unsigned HOST_WIDE_INT bitpos,
+		       unsigned HOST_WIDE_INT maxbits,
+		       rtx value)
 {
   enum machine_mode mode;
   unsigned int total_bits = BITS_PER_WORD;
@@ -812,7 +830,7 @@  store_fixed_bit_field (rtx op0, unsigned
       /* Special treatment for a bit field split across two registers.  */
       if (bitsize + bitpos > BITS_PER_WORD)
 	{
-	  store_split_bit_field (op0, bitsize, bitpos, value);
+	  store_split_bit_field (op0, bitsize, bitpos, maxbits, value);
 	  return;
 	}
     }
@@ -830,10 +848,12 @@  store_fixed_bit_field (rtx op0, unsigned
 
       if (MEM_VOLATILE_P (op0)
           && GET_MODE_BITSIZE (GET_MODE (op0)) > 0
+	  && GET_MODE_BITSIZE (GET_MODE (op0)) <= maxbits
 	  && flag_strict_volatile_bitfields > 0)
 	mode = GET_MODE (op0);
       else
 	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
+			      maxbits,
 			      MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));
 
       if (mode == VOIDmode)
@@ -841,7 +861,7 @@  store_fixed_bit_field (rtx op0, unsigned
 	  /* The only way this should occur is if the field spans word
 	     boundaries.  */
 	  store_split_bit_field (op0, bitsize, bitpos + offset * BITS_PER_UNIT,
-				 value);
+				 maxbits, value);
 	  return;
 	}
 
@@ -961,7 +981,9 @@  store_fixed_bit_field (rtx op0, unsigned
 
 static void
 store_split_bit_field (rtx op0, unsigned HOST_WIDE_INT bitsize,
-		       unsigned HOST_WIDE_INT bitpos, rtx value)
+		       unsigned HOST_WIDE_INT bitpos,
+		       unsigned HOST_WIDE_INT maxbits,
+		       rtx value)
 {
   unsigned int unit;
   unsigned int bitsdone = 0;
@@ -1076,7 +1098,7 @@  store_split_bit_field (rtx op0, unsigned
 	 it is just an out-of-bounds access.  Ignore it.  */
       if (word != const0_rtx)
 	store_fixed_bit_field (word, offset * unit / BITS_PER_UNIT, thissize,
-			       thispos, part);
+			       thispos, maxbits, part);
       bitsdone += thissize;
     }
 }
@@ -1520,7 +1542,7 @@  extract_bit_field_1 (rtx str_rtx, unsign
       if (GET_MODE (op0) == BLKmode
 	  || (ext_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (ext_mode)))
-	bestmode = get_best_mode (bitsize, bitnum, MEM_ALIGN (op0),
+	bestmode = get_best_mode (bitsize, bitnum, 0, MEM_ALIGN (op0),
 				  (ext_mode == MAX_MACHINE_MODE
 				   ? VOIDmode : ext_mode),
 				  MEM_VOLATILE_P (op0));
@@ -1646,7 +1668,7 @@  extract_fixed_bit_field (enum machine_mo
 	    mode = tmode;
 	}
       else
-	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
+	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT, 0,
 			      MEM_ALIGN (op0), word_mode, MEM_VOLATILE_P (op0));
 
       if (mode == VOIDmode)
Index: Makefile.in
===================================================================
--- Makefile.in	(revision 173263)
+++ Makefile.in	(working copy)
@@ -2916,7 +2916,7 @@  expr.o : expr.c $(CONFIG_H) $(SYSTEM_H) 
    typeclass.h hard-reg-set.h toplev.h $(DIAGNOSTIC_CORE_H) hard-reg-set.h $(EXCEPT_H) \
    reload.h langhooks.h intl.h $(TM_P_H) $(TARGET_H) \
    tree-iterator.h gt-expr.h $(MACHMODE_H) $(TIMEVAR_H) $(TREE_FLOW_H) \
-   $(TREE_PASS_H) $(DF_H) $(DIAGNOSTIC_H) vecprim.h $(SSAEXPAND_H)
+   $(TREE_PASS_H) $(DF_H) $(DIAGNOSTIC_H) vecprim.h $(SSAEXPAND_H) $(PARAMS_H)
 dojump.o : dojump.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(TREE_H) \
    $(FLAGS_H) $(FUNCTION_H) $(EXPR_H) $(OPTABS_H) $(INSN_ATTR_H) insn-config.h \
    langhooks.h $(GGC_H) gt-dojump.h vecprim.h $(BASIC_BLOCK_H) output.h
Index: stmt.c
===================================================================
--- stmt.c	(revision 173263)
+++ stmt.c	(working copy)
@@ -1758,7 +1758,7 @@  expand_return (tree retval)
 
 	  /* Use bitpos for the source extraction (left justified) and
 	     xbitpos for the destination store (right justified).  */
-	  store_bit_field (dst, bitsize, xbitpos % BITS_PER_WORD, word_mode,
+	  store_bit_field (dst, bitsize, xbitpos % BITS_PER_WORD, 0, word_mode,
 			   extract_bit_field (src, bitsize,
 					      bitpos % BITS_PER_WORD, 1, false,
 					      NULL_RTX, word_mode, word_mode));
Index: params.def
===================================================================
--- params.def	(revision 173263)
+++ params.def	(working copy)
@@ -884,6 +884,13 @@  DEFPARAM (PARAM_MAX_STORES_TO_SINK,
           "Maximum number of conditional store pairs that can be sunk",
           2, 0, 0)
 
+/* Data race flags for C++0x memory model compliance.  */
+
+DEFPARAM (PARAM_ALLOW_STORE_DATA_RACES,
+	  "allow-store-data-races",
+	  "Allow new data races on stores to be introduced",
+	  1, 0, 1)
+
 
 /*
 Local variables: