Patchwork [C++0x] contiguous bitfields race implementation

login
register
mail settings
Submitter Aldy Hernandez
Date Aug. 5, 2011, 5:25 p.m.
Message ID <4E3C2773.30502@redhat.com>
Download mbox | patch
Permalink /patch/108700/
State New
Headers show

Comments

Aldy Hernandez - Aug. 5, 2011, 5:25 p.m.
Alright, I'm back and bearing patches.  Firmly ready for the crucifixion 
you will likely submit me to. :)

I've pretty much rewritten everything, taking into account all your 
suggestions, and adding a handful of tests for corner cases we will now 
handle correctly.

It seems the minimum needed is to calculate the byte offset of the start 
of the bit region, and the length of the bit region.  (Notice I say BYTE 
offset, as the start of any bit region will happily coincide with a byte 
boundary).  These will of course be adjusted as various parts of the 
bitfield infrastructure adjust offsets and memory addresses throughout.

First, it's not as easy as calling get_inner_reference() only once as 
you've suggested.  The only way to determine the padding at the end of a 
field is getting the bit position of the field following the field in 
question (or the size of the direct parent structure in the case where 
the field in question is the last field in the structure).  So we need 
two calls to get_inner_reference for the general case.  Which is at 
least better than my original call to get_inner_reference() for every field.

I have clarified the comments and made it clear what the offsets are 
relative to.

I am now handling large offsets that may appear as a tree OFFSET from 
get_inner_reference, and have added a test for one such corner case, 
including nested structures with head padding as you suggested.  I am 
still unsure that a variable length offset can happen before a bit field 
region.  So currently we assert that the final offset is host integer 
representable.  If you have a testcase that invalidates my assumption, I 
will gladly add a test and fix the code.

Honestly, the code isn't pretty, but neither is the rest of the bit 
field machinery.  I tried to make due, but I'll gladly take suggestions 
that are not in the form of "the entire bit field code needs to be 
rewritten" :-).

To aid in reviewing, the crux of everything is in the rewritten 
get_bit_range() and the first block of store_bit_field().  Everything 
else is mostly noise.  I have attached all of get_bit_range() as a 
separate attachment to aid in reviewing, since that's the main engine, 
and it has been largely rewritten.

This pacth handles all the testcases I could come up with, mostly 
inspired by your suggestions.  Eventually I would like to replace these 
target specific tests with target-agnostic tests using the gdb simulated 
thread test harness in the cxx-mem-model branch.

Finally, you had mentioned possible problems with tail padding in C++, 
and suggested I use DECL_SIZE instead of calculating the padding using 
the size of direct parent structure.  DECL_SIZE doesn't include padding, 
so I'm open to suggestions.

Fire away, but please be kind :).
* machmode.h (get_best_mode): Remove 2 arguments.
	* fold-const.c (optimize_bit_field_compare): Same.
	(fold_truthop): Same.
	* expr.c (store_field): Change argument types in prototype.
	(emit_group_store): Change argument types to store_bit_field call.
	(copy_blkmode_from_reg): Same.
	(write_complex_part): Same.
	(optimize_bitfield_assignment_op): Change argument types.
	Change arguments to get_best_mode.
	(get_bit_range): Rewrite.
	(expand_assignment): Adjust new call to get_bit_range.
	Adjust bitregion_offset when to_rtx is changed.
	Adjust calls to store_field with new argument types.
	(store_field): New argument types.
	Adjust calls to store_bit_field with new arguments.
	* expr.h (store_bit_field): Change argument types.
	* stor-layout.c (get_best_mode): Remove use of bitregion* arguments.
	* expmed.c (store_bit_field_1): Change argument types.
	Do not calculate maxbits.
	Adjust bitregion_maxbits if offset changes.
	(store_bit_field): Change argument types.
	Adjust address taking into account bitregion_offset.
	(store_fixed_bit_field): Change argument types.
	Do not calculate maxbits.
	(store_split_bit_field): Change argument types.
	(extract_bit_field_1): Adjust arguments to get_best_mode.
	(extract_fixed_bit_field): Same.
/* In the C++ memory model, consecutive bit fields in a structure are
   considered one memory location.

   Given a COMPONENT_REF, this function calculates the byte offset of
   the beginning of the memory location containing bit field being
   referenced.  The byte offset is returned in *OFFSET and is the byte
   offset from the beginning of the containing object (INNERDECL).

   The largest mode that can be used to write into the bit field will
   be returned in *LARGEST_MODE.

   For example, in the following structure, the bit region starts in
   byte 4.  In an architecture where the size of BITS gets padded to
   32-bits, SImode will be returned in *LARGEST_MODE.

     struct bits {
       int some_padding;
       struct {
         volatile char bitfield :1;
       } bits;
       char b;
     };

   EXP is the COMPONENT_REF.

   Examples.

   While storing into FOO.A here...

      struct {
        BIT 0:
          unsigned int a : 4;
	  unsigned int b : 1;
	BIT 8:
	  unsigned char c;
	  unsigned int d : 6;
      } foo;

   ...we are not allowed to store past <b>, so for the layout above,
   *OFFSET will be byte 0, and *LARGEST_MODE will be QImode.

   Here we have 3 distinct memory locations because of the zero-sized
   bit-field separating the bits:
   
     struct bits
     {
       char a;
       int b:7;
       int :0;
       int c:7;
     } foo;

   Here we also have 3 distinct memory locations because
   structure/union boundaries will separate contiguous bit-field
   sequences:

     struct {
       char a:3;
       struct { char b:4; } x;
       char c:5;
     } foo;  */

static void
get_bit_range (tree exp, tree *offset, HOST_WIDE_INT *maxbits)
{
  tree field, record_type, fld;
  bool found_field = false;
  bool prev_field_is_bitfield;
  tree start_offset, end_offset, maxbits_tree;
  tree start_bitpos_direct_parent = NULL_TREE;
  HOST_WIDE_INT start_bitpos, end_bitpos;
  HOST_WIDE_INT cumulative_bitsize = 0;

  gcc_assert (TREE_CODE (exp) == COMPONENT_REF);

  /* Bit field we're storing into.  */
  field = TREE_OPERAND (exp, 1);
  record_type = DECL_FIELD_CONTEXT (field);

  /* Count the contiguous bitfields for the memory location that
     contains FIELD.  */
  start_offset = size_zero_node;
  start_bitpos = 0;
  prev_field_is_bitfield = false;
  for (fld = TYPE_FIELDS (record_type); fld; fld = DECL_CHAIN (fld))
    {
      if (TREE_CODE (fld) != FIELD_DECL)
	continue;

      if (field == fld)
	found_field = true;

      /* If we have a bit-field with a bitsize > 0... */
      if (DECL_BIT_FIELD_TYPE (fld)
	  && (!host_integerp (DECL_SIZE (fld), 1)
	      || tree_low_cst (DECL_SIZE (fld), 1) > 0))
	{
	  /* Start of a new bit region.  */
	  if (prev_field_is_bitfield == false)
	    {
	      HOST_WIDE_INT bitsize;
	      enum machine_mode mode;
	      int unsignedp, volatilep;

	      /* Save starting bitpos and offset.  */
	      get_inner_reference (build3 (COMPONENT_REF,
					   TREE_TYPE (exp),
					   TREE_OPERAND (exp, 0),
					   fld, NULL_TREE),
				   &bitsize, &start_bitpos, &start_offset,
				   &mode, &unsignedp, &volatilep, true);
	      /* Save the bit offset of the current structure.  */
	      start_bitpos_direct_parent = DECL_FIELD_BIT_OFFSET (fld);
	      prev_field_is_bitfield = true;
	      cumulative_bitsize = 0;
	    }

	  cumulative_bitsize += tree_low_cst (DECL_SIZE (fld), 1);

	  /* Short-circuit out if we have the max bits allowed.  */
	  /* ?? Is this even worth it.  ?? */
	  if (cumulative_bitsize >= MAX_FIXED_MODE_SIZE)
	    {
	      *maxbits = MAX_FIXED_MODE_SIZE;
	      /* Calculate byte offset to the beginning of the bit region.  */
	      gcc_assert (start_bitpos % BITS_PER_UNIT == 0);
	      *offset = fold_build2 (PLUS_EXPR, TREE_TYPE (start_offset),
				     start_offset,
				     build_int_cst (integer_type_node,
						    start_bitpos / BITS_PER_UNIT));
	      return;
	    }
	}
      else
	{
	  prev_field_is_bitfield = false;
	  if (found_field)
	    break;
	}
    }
  gcc_assert (found_field);

  /* Calculate byte offset to the beginning of the bit region.  */
  /* OFFSET = START_OFFSET + (START_BITPOS / BITS_PER_UNIT) */
  gcc_assert (start_bitpos % BITS_PER_UNIT == 0);
  if (!start_offset)
    start_offset = size_zero_node;
  *offset = fold_build2 (PLUS_EXPR, TREE_TYPE (start_offset),
			 start_offset,
			 build_int_cst (integer_type_node,
					start_bitpos / BITS_PER_UNIT));
  if (fld)
    {
      HOST_WIDE_INT bitsize;
      enum machine_mode mode;
      int unsignedp, volatilep;

      /* We found the end of the bit field sequence.  Include the
	 padding up to the next field.  */

      /* Calculate bitpos and offset of the next field.  */
      get_inner_reference (build3 (COMPONENT_REF,
				   TREE_TYPE (exp),
				   TREE_OPERAND (exp, 0),
				   fld, NULL_TREE),
			   &bitsize, &end_bitpos, &end_offset,
			   &mode, &unsignedp, &volatilep, true);
      gcc_assert (end_bitpos % BITS_PER_UNIT == 0);

      if (end_offset)
	{
	  tree type = TREE_TYPE (end_offset), end;

	  /* Calculate byte offset to the end of the bit region.  */
	  end = fold_build2 (PLUS_EXPR, type,
			     end_offset,
			     build_int_cst (type,
					    end_bitpos / BITS_PER_UNIT));
	  maxbits_tree = fold_build2 (MINUS_EXPR, type, end, *offset);
	}
      else
	maxbits_tree = build_int_cst (integer_type_node,
				      end_bitpos - start_bitpos);

      /* ?? Can we get a variable-lengthened offset here ?? */
      gcc_assert (host_integerp (maxbits_tree, 1));
      *maxbits = TREE_INT_CST_LOW (maxbits_tree);
    }
  else
    {
      /* If this is the last element in the structure, include the padding
	 at the end of structure.  */
      *maxbits = TREE_INT_CST_LOW (TYPE_SIZE (record_type))
	- TREE_INT_CST_LOW (start_bitpos_direct_parent);
    }
}
Richard Guenther - Aug. 9, 2011, 10:13 a.m.
On Fri, Aug 5, 2011 at 7:25 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> Alright, I'm back and bearing patches.  Firmly ready for the crucifixion you
> will likely submit me to. :)
>
> I've pretty much rewritten everything, taking into account all your
> suggestions, and adding a handful of tests for corner cases we will now
> handle correctly.
>
> It seems the minimum needed is to calculate the byte offset of the start of
> the bit region, and the length of the bit region.  (Notice I say BYTE
> offset, as the start of any bit region will happily coincide with a byte
> boundary).  These will of course be adjusted as various parts of the
> bitfield infrastructure adjust offsets and memory addresses throughout.
>
> First, it's not as easy as calling get_inner_reference() only once as you've
> suggested.  The only way to determine the padding at the end of a field is
> getting the bit position of the field following the field in question (or
> the size of the direct parent structure in the case where the field in
> question is the last field in the structure).  So we need two calls to
> get_inner_reference for the general case.  Which is at least better than my
> original call to get_inner_reference() for every field.
>
> I have clarified the comments and made it clear what the offsets are
> relative to.
>
> I am now handling large offsets that may appear as a tree OFFSET from
> get_inner_reference, and have added a test for one such corner case,
> including nested structures with head padding as you suggested.  I am still
> unsure that a variable length offset can happen before a bit field region.
>  So currently we assert that the final offset is host integer representable.
>  If you have a testcase that invalidates my assumption, I will gladly add a
> test and fix the code.
>
> Honestly, the code isn't pretty, but neither is the rest of the bit field
> machinery.  I tried to make due, but I'll gladly take suggestions that are
> not in the form of "the entire bit field code needs to be rewritten" :-).
>
> To aid in reviewing, the crux of everything is in the rewritten
> get_bit_range() and the first block of store_bit_field().  Everything else
> is mostly noise.  I have attached all of get_bit_range() as a separate
> attachment to aid in reviewing, since that's the main engine, and it has
> been largely rewritten.
>
> This pacth handles all the testcases I could come up with, mostly inspired
> by your suggestions.  Eventually I would like to replace these target
> specific tests with target-agnostic tests using the gdb simulated thread
> test harness in the cxx-mem-model branch.
>
> Finally, you had mentioned possible problems with tail padding in C++, and
> suggested I use DECL_SIZE instead of calculating the padding using the size
> of direct parent structure.  DECL_SIZE doesn't include padding, so I'm open
> to suggestions.
>
> Fire away, but please be kind :).

Just reading and commenting top-down on the new get_bit_range function.

      /* If we have a bit-field with a bitsize > 0... */
      if (DECL_BIT_FIELD_TYPE (fld)
	  && (!host_integerp (DECL_SIZE (fld), 1)
	      || tree_low_cst (DECL_SIZE (fld), 1) > 0))

DECL_SIZE should always be host_integerp for bitfields.

	      /* Save starting bitpos and offset.  */
	      get_inner_reference (build3 (COMPONENT_REF,
					   TREE_TYPE (exp),
					   TREE_OPERAND (exp, 0),
					   fld, NULL_TREE),
				   &bitsize, &start_bitpos, &start_offset,
				   &mode, &unsignedp, &volatilep, true);

ok, so now you do this only for the first field in a bitfield group.  But you
do it for _all_ bitfield groups in a struct, not only for the interesting one.

May I suggest to split the loop into two, first searching the first field
in the bitfield group that contains fld and then in a separate loop computing
the bitwidth?

Backing up, considering one of my earlier questions.  What is *offset
supposed to be relative to?  The docs say sth like "relative to INNERDECL",
but the code doesn't contain a reference to INNERDECL anymore.

I think if the offset is really supposed to be relative to INNERDECL then
you should return a split offset, similar to get_inner_reference itself.
Thus, return a byte tree offset plus a HWI bit offset and maxbits
(that HWI bit offset is the offset to the start of the bitfield group, right?
Not the offset of the field that is referenced?)

It really feels like you should do something like

  /* Get the offset to our parent structure.  */
  get_inner_reference (TREE_OPERAND (exp, 0), &offset, &bit_offset....);

  for (fld = TYPE_FIELDS (...) ...)
    /* Search for the starting field of the bitfield group of
TREE_OPERAND (exp, 1) */

  offset += DECL_FIELD_OFFSET (first_field_of_group);
  bit_offset += DECL_FIELD_BIT_OFFSET (first_field_of_group);
  (well, basically copy what get_inner_reference would do here)

  for (...)
    accumulate bit-offsets of the group (mind they'll eventually wrap
    when hitting DECL_OFFSET_ALIGN) to compute maxbits
    (that also always will fit in a HWI)

Now we come to that padding thing.  What's the C++ memory model
semantic for re-used tail padding?  Consider

  struct A
  {
     int i;
     bool a:1;
  }
  struct B : public A
  {
     bool b:1;
  }

The tail-padding of A is 3 bytes that may be used by b.  Now, is
accessing a allowed to race with accessing b?  Then the group for
a may include the 3 bytes tail padding.  If not, then it may not
(in which case using DECL_SIZE would be appropriate).

There is too much get_inner_reference and tree folding stuff in this
patch (which makes it expensive given that the algorithm is still
inherently quadratic).  You can rely on the bitfield group advancing
by integer-cst bits (but the start offset may be non-constant, so
may the size of the underlying record).

Now seeing all this - and considering that this is purely C++ frontend
semantics.  Why can't the C++ frontend itself constrain accesses
according to the required semantics?  It could simply create
BIT_FIELD_REF <MEM_REF <&containing_record,
byte-offset-to-start-of-group>, bit-size, bit-offset> for all bitfield
references (with a proper
type for the MEM_REF, specifying the size of the group).  That would
also avoid issues during tree optimization and would at least allow
optimizing the bitfield accesses according to the desired C++ semantics.

Richard.
Aldy Hernandez - Aug. 9, 2011, 6:39 p.m.
> ok, so now you do this only for the first field in a bitfield group.  But you
> do it for _all_ bitfield groups in a struct, not only for the interesting one.
>
> May I suggest to split the loop into two, first searching the first field
> in the bitfield group that contains fld and then in a separate loop computing
> the bitwidth?

Excellent idea.  Done!  Now there are at most two calls to 
get_inner_reference, and in many cases, only one.

> Backing up, considering one of my earlier questions.  What is *offset
> supposed to be relative to?  The docs say sth like "relative to INNERDECL",
> but the code doesn't contain a reference to INNERDECL anymore.

Sorry, I see your confusion.  The comments at the top were completely 
out of date.  I have simplified and rewritten them accordingly.  I am 
attaching get_bit_range() with these and other changes you suggested. 
See if it makes sense now.

> Now we come to that padding thing.  What's the C++ memory model
> semantic for re-used tail padding?  Consider

Andrew addressed this elsewhere.

> There is too much get_inner_reference and tree folding stuff in this
> patch (which makes it expensive given that the algorithm is still
> inherently quadratic).  You can rely on the bitfield group advancing
> by integer-cst bits (but the start offset may be non-constant, so
> may the size of the underlying record).

Now there are only two tree folding calls (apart from 
get_inner_reference), and the common case has very simple arithmetic 
tuples.  I see no clear way of removing the last call to 
get_inner_reference(), as the padding after the field can only be 
calculated by calling get_inner_reference() on the subsequent field.

> Now seeing all this - and considering that this is purely C++ frontend
> semantics.  Why can't the C++ frontend itself constrain accesses
> according to the required semantics?  It could simply create
> BIT_FIELD_REF<MEM_REF<&containing_record,
> byte-offset-to-start-of-group>, bit-size, bit-offset>  for all bitfield
> references (with a proper
> type for the MEM_REF, specifying the size of the group).  That would
> also avoid issues during tree optimization and would at least allow
> optimizing the bitfield accesses according to the desired C++ semantics.

Andrew addressed this as well.  Could you respond to his email if you 
think it is unsatisfactory?

a
/* In the C++ memory model, consecutive non-zero bit fields in a
   structure are considered one memory location.

   Given a COMPONENT_REF, this function calculates the byte offset
   from the containing object to the start of the contiguous bit
   region containing the field in question.  This byte offset is
   returned in *OFFSET.

   The maximum number of bits that can be addressed while storing into
   the COMPONENT_REF is returned in *MAXBITS.  This number is the
   number of bits in the contiguous bit region, up to a maximum of
   MAX_FIXED_MODE_SIZE.  */

static void
get_bit_range (tree exp, tree *offset, HOST_WIDE_INT *maxbits)
{
  tree field, record_type, fld;
  bool prev_field_is_bitfield;
  tree start_offset;
  tree start_bitpos_direct_parent = NULL_TREE;
  HOST_WIDE_INT start_bitpos;
  HOST_WIDE_INT cumulative_bitsize = 0;
  /* First field of the bitfield group containing the bitfield we are
     referencing.  */
  tree bitregion_start;

  HOST_WIDE_INT tbitsize;
  enum machine_mode tmode;
  int tunsignedp, tvolatilep;
  bool found;

  gcc_assert (TREE_CODE (exp) == COMPONENT_REF);

  /* Bit field we're storing into.  */
  field = TREE_OPERAND (exp, 1);
  record_type = DECL_FIELD_CONTEXT (field);

  /* Find the bitfield group containing the field in question, and set
     BITREGION_START to the start of the group.  */
  prev_field_is_bitfield = false;
  bitregion_start = NULL_TREE;
  for (fld = TYPE_FIELDS (record_type); fld; fld = DECL_CHAIN (fld))
    {
      if (TREE_CODE (fld) != FIELD_DECL)
	continue;
      /* If we have a bit-field with a bitsize > 0... */
      if (DECL_BIT_FIELD_TYPE (fld)
	  && tree_low_cst (DECL_SIZE (fld), 1) > 0)
	{
	  if (!prev_field_is_bitfield)
	    {
	      bitregion_start = fld;
	      prev_field_is_bitfield = true;
	    }
	}
      else
	prev_field_is_bitfield = false;
      if (fld == field)
	break;
    }
  gcc_assert (bitregion_start);
  gcc_assert (fld);

  /* Save the starting position of the bitregion.  */
  get_inner_reference (build3 (COMPONENT_REF,
			       TREE_TYPE (exp),
			       TREE_OPERAND (exp, 0),
			       bitregion_start, NULL_TREE),
		       &tbitsize, &start_bitpos, &start_offset,
		       &tmode, &tunsignedp, &tvolatilep, true);
  if (!start_offset)
    start_offset = size_zero_node;
  /* Calculate byte offset to the beginning of the bit region.  */
  /* OFFSET = START_OFFSET + (START_BITPOS / BITS_PER_UNIT) */
  gcc_assert (start_bitpos % BITS_PER_UNIT == 0);
  *offset = fold_build2 (PLUS_EXPR, TREE_TYPE (start_offset),
			 start_offset,
			 build_int_cst (integer_type_node,
					start_bitpos / BITS_PER_UNIT));
  /* Save the bit offset of the current structure.  */
  start_bitpos_direct_parent = DECL_FIELD_BIT_OFFSET (bitregion_start);

  /* Count the bitsize of the bitregion containing the field in question.  */
  found = false;
  cumulative_bitsize = 0;
  for (fld = bitregion_start; fld; fld = DECL_CHAIN (fld))
    {
      if (TREE_CODE (fld) != FIELD_DECL)
	continue;
      if (fld == field)
	found = true;

      if (DECL_BIT_FIELD_TYPE (fld)
	  && tree_low_cst (DECL_SIZE (fld), 1) > 0)
	{
	  cumulative_bitsize += tree_low_cst (DECL_SIZE (fld), 1);

	  /* Short-circuit out if we have the max bits allowed.  */
	  if (cumulative_bitsize >= MAX_FIXED_MODE_SIZE)
	    {
	      *maxbits = MAX_FIXED_MODE_SIZE;
	      /* Calculate byte offset to the beginning of the bit region.  */
	      gcc_assert (start_bitpos % BITS_PER_UNIT == 0);
	      *offset = fold_build2 (PLUS_EXPR, TREE_TYPE (start_offset),
				     start_offset,
				     build_int_cst (integer_type_node,
						    start_bitpos / BITS_PER_UNIT));
	      return;
	    }
	}
      else if (found)
	break;
    }

  /* If we found the end of the bit field sequence, include the
     padding up to the next field...  */
  if (fld)
    {
      tree end_offset, maxbits_tree;
      HOST_WIDE_INT end_bitpos;

      /* Calculate bitpos and offset of the next field.  */
      get_inner_reference (build3 (COMPONENT_REF,
				   TREE_TYPE (exp),
				   TREE_OPERAND (exp, 0),
				   fld, NULL_TREE),
			   &tbitsize, &end_bitpos, &end_offset,
			   &tmode, &tunsignedp, &tvolatilep, true);
      gcc_assert (end_bitpos % BITS_PER_UNIT == 0);

      if (end_offset)
	{
	  tree type = TREE_TYPE (end_offset), end;

	  /* Calculate byte offset to the end of the bit region.  */
	  end = fold_build2 (PLUS_EXPR, type,
			     end_offset,
			     build_int_cst (type,
					    end_bitpos / BITS_PER_UNIT));
	  maxbits_tree = fold_build2 (MINUS_EXPR, type, end, *offset);
	}
      else
	maxbits_tree = build_int_cst (integer_type_node,
				      end_bitpos - start_bitpos);

      /* ?? Can we get a variable-lengthened offset here ?? */
      gcc_assert (host_integerp (maxbits_tree, 1));
      *maxbits = TREE_INT_CST_LOW (maxbits_tree);
    }
  /* ...otherwise, this is the last element in the structure.  */
  else
    {
      /* Include the padding at the end of structure.  */
      *maxbits = TREE_INT_CST_LOW (TYPE_SIZE (record_type))
	- TREE_INT_CST_LOW (start_bitpos_direct_parent);
      if (*maxbits > MAX_FIXED_MODE_SIZE)
	*maxbits = MAX_FIXED_MODE_SIZE;
    }
}
Richard Guenther - Aug. 10, 2011, 11:17 a.m.
On Tue, Aug 9, 2011 at 8:39 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>> ok, so now you do this only for the first field in a bitfield group.  But
>> you
>> do it for _all_ bitfield groups in a struct, not only for the interesting
>> one.
>>
>> May I suggest to split the loop into two, first searching the first field
>> in the bitfield group that contains fld and then in a separate loop
>> computing
>> the bitwidth?
>
> Excellent idea.  Done!  Now there are at most two calls to
> get_inner_reference, and in many cases, only one.
>
>> Backing up, considering one of my earlier questions.  What is *offset
>> supposed to be relative to?  The docs say sth like "relative to
>> INNERDECL",
>> but the code doesn't contain a reference to INNERDECL anymore.
>
> Sorry, I see your confusion.  The comments at the top were completely out of
> date.  I have simplified and rewritten them accordingly.  I am attaching
> get_bit_range() with these and other changes you suggested. See if it makes
> sense now.
>
>> Now we come to that padding thing.  What's the C++ memory model
>> semantic for re-used tail padding?  Consider
>
> Andrew addressed this elsewhere.
>
>> There is too much get_inner_reference and tree folding stuff in this
>> patch (which makes it expensive given that the algorithm is still
>> inherently quadratic).  You can rely on the bitfield group advancing
>> by integer-cst bits (but the start offset may be non-constant, so
>> may the size of the underlying record).
>
> Now there are only two tree folding calls (apart from get_inner_reference),
> and the common case has very simple arithmetic tuples.  I see no clear way
> of removing the last call to get_inner_reference(), as the padding after the
> field can only be calculated by calling get_inner_reference() on the
> subsequent field.
>
>> Now seeing all this - and considering that this is purely C++ frontend
>> semantics.  Why can't the C++ frontend itself constrain accesses
>> according to the required semantics?  It could simply create
>> BIT_FIELD_REF<MEM_REF<&containing_record,
>> byte-offset-to-start-of-group>, bit-size, bit-offset>  for all bitfield
>> references (with a proper
>> type for the MEM_REF, specifying the size of the group).  That would
>> also avoid issues during tree optimization and would at least allow
>> optimizing the bitfield accesses according to the desired C++ semantics.
>
> Andrew addressed this as well.  Could you respond to his email if you think
> it is unsatisfactory?

Some comments.

      /* If we have a bit-field with a bitsize > 0... */
      if (DECL_BIT_FIELD_TYPE (fld)
	  && tree_low_cst (DECL_SIZE (fld), 1) > 0)

I think we can check bitsize != 0, thus

&& !integer_zerop (DECL_SIZE (fld))

instead.  You don't break groups here with MAX_FIXED_MODE_SIZE, so
I don't think it's ok to do that in the 2nd loop

	  /* Short-circuit out if we have the max bits allowed.  */
	  if (cumulative_bitsize >= MAX_FIXED_MODE_SIZE)
	    {
	      *maxbits = MAX_FIXED_MODE_SIZE;
	      /* Calculate byte offset to the beginning of the bit region.  */
	      gcc_assert (start_bitpos % BITS_PER_UNIT == 0);
	      *offset = fold_build2 (PLUS_EXPR, TREE_TYPE (start_offset),
				     start_offset,
				     build_int_cst (integer_type_node,
						    start_bitpos / BITS_PER_UNIT));
	      return;

apart from the *offset calculation being redundant, *offset + maxbits
may not include the referenced field.  How do you plan to find
an "optimal" window for such access? (*)

  /* Count the bitsize of the bitregion containing the field in question.  */
  found = false;
  cumulative_bitsize = 0;
  for (fld = bitregion_start; fld; fld = DECL_CHAIN (fld))
    {
      if (TREE_CODE (fld) != FIELD_DECL)
	continue;
      if (fld == field)
	found = true;

      if (DECL_BIT_FIELD_TYPE (fld)
	  && tree_low_cst (DECL_SIZE (fld), 1) > 0)
	{
...
         }
      else if (found)
	break;

should probably be

      if (!DECL_BIT_FIELD_TYPE (fld)
         || integer_zerop (DECL_SIZE (fld)))
        break;

we know that we'll eventually find field.

  /* If we found the end of the bit field sequence, include the
     padding up to the next field...  */
  if (fld)
    {

could be a non-FIELD_DECL, you have to skip those first.

     /* Calculate bitpos and offset of the next field.  */
      get_inner_reference (build3 (COMPONENT_REF,
				   TREE_TYPE (exp),
				   TREE_OPERAND (exp, 0),
				   fld, NULL_TREE),
			   &tbitsize, &end_bitpos, &end_offset,
			   &tmode, &tunsignedp, &tvolatilep, true);
      gcc_assert (end_bitpos % BITS_PER_UNIT == 0);

      if (end_offset)
	{
	  tree type = TREE_TYPE (end_offset), end;

	  /* Calculate byte offset to the end of the bit region.  */
	  end = fold_build2 (PLUS_EXPR, type,
			     end_offset,
			     build_int_cst (type,
					    end_bitpos / BITS_PER_UNIT));
	  maxbits_tree = fold_build2 (MINUS_EXPR, type, end, *offset);
	}
      else
	maxbits_tree = build_int_cst (integer_type_node,
				      end_bitpos - start_bitpos);

      /* ?? Can we get a variable-lengthened offset here ?? */
      gcc_assert (host_integerp (maxbits_tree, 1));
      *maxbits = TREE_INT_CST_LOW (maxbits_tree);

I think you may end up enlarging maxbits to more than
MAX_FIXED_MODE_SIZE here.  What you instead should do (I think)
is sth along

   *maxbits = MIN (MAX_FIXED_MODE_SIZE,
                            *maxbits
                            + operand_equal_p (DECL_FIELD_OFFSET
(fld), DECL_FIELD_OFFSET (field)) ? DECL_FIELD_BIT_OFFSET (fld) -
DECL_FIELD_BIT_OFFSET (field) : DECL_OFFSET_ALIGN (field) -
DECL_FIELD_BIT_OFFSET (field));

Note that another complication comes to my mind now - the offset
field of a COMPONENT_REF is used to specify a variable offset
and has to be used, if present, instead of DECL_FIELD_OFFSET.
Thus your building of COMPONENT_REFs to then pass them to
get_inner_reference is broken.  As you are in generic code and not
in the C++ frontend I believe you have to properly handle this case
(may I suggest to, at the start of the function, simply return a
minimum byte-aligned blob for the case that there is a variable
offset to the bitfield?)

  /* ...otherwise, this is the last element in the structure.  */
  else
    {
      /* Include the padding at the end of structure.  */
      *maxbits = TREE_INT_CST_LOW (TYPE_SIZE (record_type))
	- TREE_INT_CST_LOW (start_bitpos_direct_parent);
      if (*maxbits > MAX_FIXED_MODE_SIZE)
	*maxbits = MAX_FIXED_MODE_SIZE;
    }

with Andrews answer this is invalid.  You can (and should) at most do

  else
    *maxbits = (*maxbits + BITS_PER_UNIT - 1) & ~(BITS_PER_UNIT - 1);

thus, round *maxbits up to the next byte.

There is still the general issue of packed bitfields which will probably
make the issue of the computed group not covering all of field more
prominent (esp. if you limit to MAX_FIXED_MODE_SIZE - consider
struct __attribute__((packed)) { long long : 1; long long a : 64; char
c; } where
a does not fit in a DImode mem but crosses it.  Why constrain
*maxbits to MAX_FIXED_MODE_SIZE at all?  Shoudln't the *offset,
*maxbits pair just constrain what the caller does, not force it to actually
use an access covering that full range (does it?)?

Richard.

(*) For bitfield lowering we discussed this a bit and the solution would be
to mirror what place_field does, fill groups until the space for the mode of
the sofar largest field is filled (doesn't work for packed bitfields of course).

> a
>
>
>

Patch

Index: machmode.h
===================================================================
--- machmode.h	(revision 176891)
+++ machmode.h	(working copy)
@@ -249,8 +249,6 @@  extern enum machine_mode mode_for_vector
 /* Find the best mode to use to access a bit field.  */
 
 extern enum machine_mode get_best_mode (int, int,
-					unsigned HOST_WIDE_INT,
-					unsigned HOST_WIDE_INT,
 					unsigned int,
 					enum machine_mode, int);
 
Index: fold-const.c
===================================================================
--- fold-const.c	(revision 176891)
+++ fold-const.c	(working copy)
@@ -3394,7 +3394,7 @@  optimize_bit_field_compare (location_t l
       && flag_strict_volatile_bitfields > 0)
     nmode = lmode;
   else
-    nmode = get_best_mode (lbitsize, lbitpos, 0, 0,
+    nmode = get_best_mode (lbitsize, lbitpos,
 			   const_p ? TYPE_ALIGN (TREE_TYPE (linner))
 			   : MIN (TYPE_ALIGN (TREE_TYPE (linner)),
 				  TYPE_ALIGN (TREE_TYPE (rinner))),
@@ -5221,7 +5221,7 @@  fold_truthop (location_t loc, enum tree_
      to be relative to a field of that size.  */
   first_bit = MIN (ll_bitpos, rl_bitpos);
   end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
-  lnmode = get_best_mode (end_bit - first_bit, first_bit, 0, 0,
+  lnmode = get_best_mode (end_bit - first_bit, first_bit,
 			  TYPE_ALIGN (TREE_TYPE (ll_inner)), word_mode,
 			  volatilep);
   if (lnmode == VOIDmode)
@@ -5286,7 +5286,7 @@  fold_truthop (location_t loc, enum tree_
 
       first_bit = MIN (lr_bitpos, rr_bitpos);
       end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
-      rnmode = get_best_mode (end_bit - first_bit, first_bit, 0, 0,
+      rnmode = get_best_mode (end_bit - first_bit, first_bit,
 			      TYPE_ALIGN (TREE_TYPE (lr_inner)), word_mode,
 			      volatilep);
       if (rnmode == VOIDmode)
Index: testsuite/c-c++-common/cxxbitfields-6.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-6.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-6.c	(revision 0)
@@ -0,0 +1,19 @@ 
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+struct bits
+{
+  char a;
+  int b:7;
+  int :0;
+  volatile int c:7;
+  unsigned char d;
+} x;
+
+/* Store into <c> should not clobber <d>.  */
+void update_c(struct bits *p, int val) 
+{
+    p -> c = val;
+}
+
+/* { dg-final { scan-assembler "movb" } } */
Index: testsuite/c-c++-common/cxxbitfields-8.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-8.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-8.c	(revision 0)
@@ -0,0 +1,29 @@ 
+/* { dg-do compile { target x86_64-*-* } } */
+/* { dg-options "-O --param allow-store-data-races=0" } */
+
+struct bits {
+  /* Make sure the bit position of the bitfield is larger than what
+     can be represented in an unsigned HOST_WIDE_INT, to force
+     get_inner_reference() to return something in POFFSET.  */
+      
+  struct {
+    int some_padding[1<<30];
+    char more_padding;
+  } pad[1<<29];
+
+  struct {
+    volatile char bitfield :1;
+  } x;
+  char b;
+};
+
+struct bits *p;
+
+/* Test that the store into <bitfield> is not done with something
+   wider than a byte move.  */
+void foo()
+{
+  p->x.bitfield = 1;
+}
+
+/* { dg-final { scan-assembler "movb" } } */
Index: testsuite/c-c++-common/cxxbitfields-7.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-7.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-7.c	(revision 0)
@@ -0,0 +1,19 @@ 
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+struct bits
+{
+  int some_padding;
+  struct {
+    volatile char bitfield :1;
+  } x;
+  char b;
+};
+
+/* Store into <bitfield> should not clobber <b>.  */
+void update(struct bits *p)
+{
+    p->x.bitfield = 1;
+}
+
+/* { dg-final { scan-assembler "movb" } } */
Index: expr.c
===================================================================
--- expr.c	(revision 176891)
+++ expr.c	(working copy)
@@ -145,7 +145,7 @@  static void store_constructor_field (rtx
 				     tree, tree, int, alias_set_type);
 static void store_constructor (tree, rtx, int, HOST_WIDE_INT);
 static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT,
-			unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
+			tree, HOST_WIDE_INT,
 			enum machine_mode,
 			tree, tree, alias_set_type, bool);
 
@@ -2077,7 +2077,8 @@  emit_group_store (rtx orig_dst, rtx src,
 	emit_move_insn (adjust_address (dest, mode, bytepos), tmps[i]);
       else
 	store_bit_field (dest, bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
-			 0, 0, mode, tmps[i]);
+			 integer_zero_node, MAX_FIXED_MODE_SIZE,
+			 mode, tmps[i]);
     }
 
   /* Copy from the pseudo into the (probable) hard reg.  */
@@ -2171,7 +2172,8 @@  copy_blkmode_from_reg (rtx tgtblk, rtx s
 
       /* Use xbitpos for the source extraction (right justified) and
 	 bitpos for the destination store (left justified).  */
-      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, 0, 0, copy_mode,
+      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD,
+		       integer_zero_node, MAX_FIXED_MODE_SIZE, copy_mode,
 		       extract_bit_field (src, bitsize,
 					  xbitpos % BITS_PER_WORD, 1, false,
 					  NULL_RTX, copy_mode, copy_mode));
@@ -2808,7 +2810,8 @@  write_complex_part (rtx cplx, rtx val, b
 	gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
     }
 
-  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, 0, imode, val);
+  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0,
+		   integer_zero_node, MAX_FIXED_MODE_SIZE, imode, val);
 }
 
 /* Extract one of the components of the complex value CPLX.  Extract the
@@ -3943,8 +3946,8 @@  get_subtarget (rtx x)
 static bool
 optimize_bitfield_assignment_op (unsigned HOST_WIDE_INT bitsize,
 				 unsigned HOST_WIDE_INT bitpos,
-				 unsigned HOST_WIDE_INT bitregion_start,
-				 unsigned HOST_WIDE_INT bitregion_end,
+				 tree bitregion_offset ATTRIBUTE_UNUSED,
+				 HOST_WIDE_INT bitregion_maxbits,
 				 enum machine_mode mode1, rtx str_rtx,
 				 tree to, tree src)
 {
@@ -4005,8 +4008,9 @@  optimize_bitfield_assignment_op (unsigne
 
       if (str_bitsize == 0 || str_bitsize > BITS_PER_WORD)
 	str_mode = word_mode;
+      if (bitregion_maxbits < GET_MODE_BITSIZE (str_mode))
+	str_mode = smallest_mode_for_size (bitregion_maxbits, MODE_INT);
       str_mode = get_best_mode (bitsize, bitpos,
-				bitregion_start, bitregion_end,
 				MEM_ALIGN (str_rtx), str_mode, 0);
       if (str_mode == VOIDmode)
 	return false;
@@ -4118,18 +4122,31 @@  optimize_bitfield_assignment_op (unsigne
 /* In the C++ memory model, consecutive bit fields in a structure are
    considered one memory location.
 
-   Given a COMPONENT_REF, this function returns the bit range of
-   consecutive bits in which this COMPONENT_REF belongs in.  The
-   values are returned in *BITSTART and *BITEND.  If either the C++
-   memory model is not activated, or this memory access is not thread
-   visible, 0 is returned in *BITSTART and *BITEND.
+   Given a COMPONENT_REF, this function calculates the byte offset of
+   the beginning of the memory location containing bit field being
+   referenced.  The byte offset is returned in *OFFSET and is the byte
+   offset from the beginning of the containing object (INNERDECL).
+
+   The largest mode that can be used to write into the bit field will
+   be returned in *LARGEST_MODE.
+
+   For example, in the following structure, the bit region starts in
+   byte 4.  In an architecture where the size of BITS gets padded to
+   32-bits, SImode will be returned in *LARGEST_MODE.
+
+     struct bits {
+       int some_padding;
+       struct {
+         volatile char bitfield :1;
+       } bits;
+       char b;
+     };
 
    EXP is the COMPONENT_REF.
-   INNERDECL is the actual object being referenced.
-   BITPOS is the position in bits where the bit starts within the structure.
-   BITSIZE is size in bits of the field being referenced in EXP.
 
-   For example, while storing into FOO.A here...
+   Examples.
+
+   While storing into FOO.A here...
 
       struct {
         BIT 0:
@@ -4140,67 +4157,99 @@  optimize_bitfield_assignment_op (unsigne
 	  unsigned int d : 6;
       } foo;
 
-   ...we are not allowed to store past <b>, so for the layout above, a
-   range of 0..7 (because no one cares if we store into the
-   padding).  */
+   ...we are not allowed to store past <b>, so for the layout above,
+   *OFFSET will be byte 0, and *LARGEST_MODE will be QImode.
+
+   Here we have 3 distinct memory locations because of the zero-sized
+   bit-field separating the bits:
+   
+     struct bits
+     {
+       char a;
+       int b:7;
+       int :0;
+       int c:7;
+     } foo;
+
+   Here we also have 3 distinct memory locations because
+   structure/union boundaries will separate contiguous bit-field
+   sequences:
+
+     struct {
+       char a:3;
+       struct { char b:4; } x;
+       char c:5;
+     } foo;  */
 
 static void
-get_bit_range (unsigned HOST_WIDE_INT *bitstart,
-	       unsigned HOST_WIDE_INT *bitend,
-	       tree exp, tree innerdecl,
-	       HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize)
+get_bit_range (tree exp, tree *offset, HOST_WIDE_INT *maxbits)
 {
   tree field, record_type, fld;
   bool found_field = false;
   bool prev_field_is_bitfield;
+  tree start_offset, end_offset, maxbits_tree;
+  tree start_bitpos_direct_parent = NULL_TREE;
+  HOST_WIDE_INT start_bitpos, end_bitpos;
+  HOST_WIDE_INT cumulative_bitsize = 0;
 
   gcc_assert (TREE_CODE (exp) == COMPONENT_REF);
 
-  /* If other threads can't see this value, no need to restrict stores.  */
-  if (ALLOW_STORE_DATA_RACES
-      || ((TREE_CODE (innerdecl) == MEM_REF
-	   || TREE_CODE (innerdecl) == TARGET_MEM_REF)
-	  && !ptr_deref_may_alias_global_p (TREE_OPERAND (innerdecl, 0)))
-      || (DECL_P (innerdecl)
-	  && (DECL_THREAD_LOCAL_P (innerdecl)
-	      || !TREE_STATIC (innerdecl))))
-    {
-      *bitstart = *bitend = 0;
-      return;
-    }
-
   /* Bit field we're storing into.  */
   field = TREE_OPERAND (exp, 1);
   record_type = DECL_FIELD_CONTEXT (field);
 
   /* Count the contiguous bitfields for the memory location that
      contains FIELD.  */
-  *bitstart = 0;
-  prev_field_is_bitfield = true;
+  start_offset = size_zero_node;
+  start_bitpos = 0;
+  prev_field_is_bitfield = false;
   for (fld = TYPE_FIELDS (record_type); fld; fld = DECL_CHAIN (fld))
     {
-      tree t, offset;
-      enum machine_mode mode;
-      int unsignedp, volatilep;
-
       if (TREE_CODE (fld) != FIELD_DECL)
 	continue;
 
-      t = build3 (COMPONENT_REF, TREE_TYPE (exp),
-		  unshare_expr (TREE_OPERAND (exp, 0)),
-		  fld, NULL_TREE);
-      get_inner_reference (t, &bitsize, &bitpos, &offset,
-			   &mode, &unsignedp, &volatilep, true);
-
       if (field == fld)
 	found_field = true;
 
-      if (DECL_BIT_FIELD_TYPE (fld) && bitsize > 0)
+      /* If we have a bit-field with a bitsize > 0... */
+      if (DECL_BIT_FIELD_TYPE (fld)
+	  && (!host_integerp (DECL_SIZE (fld), 1)
+	      || tree_low_cst (DECL_SIZE (fld), 1) > 0))
 	{
+	  /* Start of a new bit region.  */
 	  if (prev_field_is_bitfield == false)
 	    {
-	      *bitstart = bitpos;
+	      HOST_WIDE_INT bitsize;
+	      enum machine_mode mode;
+	      int unsignedp, volatilep;
+
+	      /* Save starting bitpos and offset.  */
+	      get_inner_reference (build3 (COMPONENT_REF,
+					   TREE_TYPE (exp),
+					   TREE_OPERAND (exp, 0),
+					   fld, NULL_TREE),
+				   &bitsize, &start_bitpos, &start_offset,
+				   &mode, &unsignedp, &volatilep, true);
+	      /* Save the bit offset of the current structure.  */
+	      start_bitpos_direct_parent = DECL_FIELD_BIT_OFFSET (fld);
 	      prev_field_is_bitfield = true;
+	      cumulative_bitsize = 0;
+	    }
+
+	  cumulative_bitsize += tree_low_cst (DECL_SIZE (fld), 1);
+
+	  /* Short-circuit out if we have the max bits allowed.  */
+	  /* ?? Is this even worth it.  ?? */
+	  if (cumulative_bitsize >= MAX_FIXED_MODE_SIZE)
+	    {
+	      *maxbits = MAX_FIXED_MODE_SIZE;
+	      /* Calculate byte offset to the beginning of the bit region.  */
+	      gcc_assert (start_bitpos % BITS_PER_UNIT == 0);
+	      *offset = fold_build2 (PLUS_EXPR, TREE_TYPE (start_offset),
+				     start_offset,
+				     build_int_cst (integer_type_node,
+						    start_bitpos / BITS_PER_UNIT));
+	      return;
 	    }
 	}
       else
@@ -4212,17 +4261,58 @@  get_bit_range (unsigned HOST_WIDE_INT *b
     }
   gcc_assert (found_field);
 
+  /* Calculate byte offset to the beginning of the bit region.  */
+  /* OFFSET = START_OFFSET + (START_BITPOS / BITS_PER_UNIT) */
+  gcc_assert (start_bitpos % BITS_PER_UNIT == 0);
+  if (!start_offset)
+    start_offset = size_zero_node;
+  *offset = fold_build2 (PLUS_EXPR, TREE_TYPE (start_offset),
+			 start_offset,
+			 build_int_cst (integer_type_node,
+					start_bitpos / BITS_PER_UNIT));
   if (fld)
     {
+      HOST_WIDE_INT bitsize;
+      enum machine_mode mode;
+      int unsignedp, volatilep;
+
       /* We found the end of the bit field sequence.  Include the
-	 padding up to the next field and be done.  */
-      *bitend = bitpos - 1;
+	 padding up to the next field.  */
+
+      /* Calculate bitpos and offset of the next field.  */
+      get_inner_reference (build3 (COMPONENT_REF,
+				   TREE_TYPE (exp),
+				   TREE_OPERAND (exp, 0),
+				   fld, NULL_TREE),
+			   &bitsize, &end_bitpos, &end_offset,
+			   &mode, &unsignedp, &volatilep, true);
+      gcc_assert (end_bitpos % BITS_PER_UNIT == 0);
+
+      if (end_offset)
+	{
+	  tree type = TREE_TYPE (end_offset), end;
+
+	  /* Calculate byte offset to the end of the bit region.  */
+	  end = fold_build2 (PLUS_EXPR, type,
+			     end_offset,
+			     build_int_cst (type,
+					    end_bitpos / BITS_PER_UNIT));
+	  maxbits_tree = fold_build2 (MINUS_EXPR, type, end, *offset);
+	}
+      else
+	maxbits_tree = build_int_cst (integer_type_node,
+				      end_bitpos - start_bitpos);
+
+      /* ?? Can we get a variable-lengthened offset here ?? */
+      gcc_assert (host_integerp (maxbits_tree, 1));
+      *maxbits = TREE_INT_CST_LOW (maxbits_tree);
     }
   else
     {
       /* If this is the last element in the structure, include the padding
 	 at the end of structure.  */
-      *bitend = TREE_INT_CST_LOW (TYPE_SIZE (record_type)) - 1;
+      *maxbits = TREE_INT_CST_LOW (TYPE_SIZE (record_type))
+	- TREE_INT_CST_LOW (start_bitpos_direct_parent);
     }
 }
 
@@ -4324,8 +4414,8 @@  expand_assignment (tree to, tree from, b
     {
       enum machine_mode mode1;
       HOST_WIDE_INT bitsize, bitpos;
-      unsigned HOST_WIDE_INT bitregion_start = 0;
-      unsigned HOST_WIDE_INT bitregion_end = 0;
+      tree bitregion_offset = size_zero_node;
+      HOST_WIDE_INT bitregion_maxbits = MAX_FIXED_MODE_SIZE;
       tree offset;
       int unsignedp;
       int volatilep = 0;
@@ -4337,8 +4427,23 @@  expand_assignment (tree to, tree from, b
 
       if (TREE_CODE (to) == COMPONENT_REF
 	  && DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))
-	get_bit_range (&bitregion_start, &bitregion_end,
-		       to, tem, bitpos, bitsize);
+	{
+	  /* If other threads can't see this value, no need to
+	     restrict stores.  */
+	  if (ALLOW_STORE_DATA_RACES
+	      || ((TREE_CODE (tem) == MEM_REF
+		   || TREE_CODE (tem) == TARGET_MEM_REF)
+		  && !ptr_deref_may_alias_global_p (TREE_OPERAND (tem, 0)))
+	      || (DECL_P (tem)
+		  && (DECL_THREAD_LOCAL_P (tem)
+		      || !TREE_STATIC (tem))))
+	    {
+	      bitregion_offset = size_zero_node;
+	      bitregion_maxbits = MAX_FIXED_MODE_SIZE;
+	    }
+	  else
+	    get_bit_range (to, &bitregion_offset, &bitregion_maxbits);
+	}
 
       /* If we are going to use store_bit_field and extract_bit_field,
 	 make sure to_rtx will be safe for multiple use.  */
@@ -4388,12 +4493,19 @@  expand_assignment (tree to, tree from, b
 	      && MEM_ALIGN (to_rtx) == GET_MODE_ALIGNMENT (mode1))
 	    {
 	      to_rtx = adjust_address (to_rtx, mode1, bitpos / BITS_PER_UNIT);
+	      bitregion_offset = fold_build2 (MINUS_EXPR, integer_type_node,
+					      bitregion_offset,
+					      build_int_cst (integer_type_node,
+							     bitpos / BITS_PER_UNIT));
 	      bitpos = 0;
 	    }
 
 	  to_rtx = offset_address (to_rtx, offset_rtx,
 				   highest_pow2_factor_for_target (to,
 				   				   offset));
+	  bitregion_offset = fold_build2 (MINUS_EXPR, integer_type_node,
+					  bitregion_offset,
+					  offset);
 	}
 
       /* No action is needed if the target is not a memory and the field
@@ -4421,13 +4533,13 @@  expand_assignment (tree to, tree from, b
 				 nontemporal);
 	  else if (bitpos + bitsize <= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 0), bitsize, bitpos,
-				  bitregion_start, bitregion_end,
+				  bitregion_offset, bitregion_maxbits,
 				  mode1, from, TREE_TYPE (tem),
 				  get_alias_set (to), nontemporal);
 	  else if (bitpos >= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 1), bitsize,
 				  bitpos - mode_bitsize / 2,
-				  bitregion_start, bitregion_end,
+				  bitregion_offset, bitregion_maxbits,
 				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
@@ -4450,7 +4562,7 @@  expand_assignment (tree to, tree from, b
 	      write_complex_part (temp, XEXP (to_rtx, 0), false);
 	      write_complex_part (temp, XEXP (to_rtx, 1), true);
 	      result = store_field (temp, bitsize, bitpos,
-				    bitregion_start, bitregion_end,
+				    bitregion_offset, bitregion_maxbits,
 				    mode1, from,
 				    TREE_TYPE (tem), get_alias_set (to),
 				    nontemporal);
@@ -4477,13 +4589,14 @@  expand_assignment (tree to, tree from, b
 	    }
 
 	  if (optimize_bitfield_assignment_op (bitsize, bitpos,
-					       bitregion_start, bitregion_end,
+					       bitregion_offset,
+					       bitregion_maxbits,
 					       mode1,
 					       to_rtx, to, from))
 	    result = NULL;
 	  else
 	    result = store_field (to_rtx, bitsize, bitpos,
-				  bitregion_start, bitregion_end,
+				  bitregion_offset, bitregion_maxbits,
 				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
@@ -5917,10 +6030,10 @@  store_constructor (tree exp, rtx target,
    BITSIZE bits, starting BITPOS bits from the start of TARGET.
    If MODE is VOIDmode, it means that we are storing into a bit-field.
 
-   BITREGION_START is bitpos of the first bitfield in this region.
-   BITREGION_END is the bitpos of the ending bitfield in this region.
-   These two fields are 0, if the C++ memory model does not apply,
-   or we are not interested in keeping track of bitfield regions.
+   BITREGION_OFFSET is the byte offset from the beginning of the
+   containing object to the start of the bit region.
+   BITREGION_MAXBITS is the size in bits of the largest mode that can
+   be used to set the bit-field in question.
 
    Always return const0_rtx unless we have something particular to
    return.
@@ -5935,8 +6048,8 @@  store_constructor (tree exp, rtx target,
 
 static rtx
 store_field (rtx target, HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
-	     unsigned HOST_WIDE_INT bitregion_start,
-	     unsigned HOST_WIDE_INT bitregion_end,
+	     tree bitregion_offset,
+	     HOST_WIDE_INT bitregion_maxbits,
 	     enum machine_mode mode, tree exp, tree type,
 	     alias_set_type alias_set, bool nontemporal)
 {
@@ -5970,7 +6083,7 @@  store_field (rtx target, HOST_WIDE_INT b
 	emit_move_insn (object, target);
 
       store_field (blk_object, bitsize, bitpos,
-		   bitregion_start, bitregion_end,
+		   bitregion_offset, bitregion_maxbits,
 		   mode, exp, type, alias_set, nontemporal);
 
       emit_move_insn (target, object);
@@ -6086,7 +6199,7 @@  store_field (rtx target, HOST_WIDE_INT b
 
       /* Store the value in the bitfield.  */
       store_bit_field (target, bitsize, bitpos,
-		       bitregion_start, bitregion_end,
+		       bitregion_offset, bitregion_maxbits,
 		       mode, temp);
 
       return const0_rtx;
Index: expr.h
===================================================================
--- expr.h	(revision 176891)
+++ expr.h	(working copy)
@@ -666,8 +666,8 @@  mode_for_extraction (enum extraction_pat
 
 extern void store_bit_field (rtx, unsigned HOST_WIDE_INT,
 			     unsigned HOST_WIDE_INT,
-			     unsigned HOST_WIDE_INT,
-			     unsigned HOST_WIDE_INT,
+			     tree,
+			     HOST_WIDE_INT,
 			     enum machine_mode, rtx);
 extern rtx extract_bit_field (rtx, unsigned HOST_WIDE_INT,
 			      unsigned HOST_WIDE_INT, int, bool, rtx,
Index: stor-layout.c
===================================================================
--- stor-layout.c	(revision 176891)
+++ stor-layout.c	(working copy)
@@ -2361,13 +2361,6 @@  fixup_unsigned_type (tree type)
 /* Find the best machine mode to use when referencing a bit field of length
    BITSIZE bits starting at BITPOS.
 
-   BITREGION_START is the bit position of the first bit in this
-   sequence of bit fields.  BITREGION_END is the last bit in this
-   sequence.  If these two fields are non-zero, we should restrict the
-   memory access to a maximum sized chunk of
-   BITREGION_END - BITREGION_START + 1.  Otherwise, we are allowed to touch
-   any adjacent non bit-fields.
-
    The underlying object is known to be aligned to a boundary of ALIGN bits.
    If LARGEST_MODE is not VOIDmode, it means that we should not use a mode
    larger than LARGEST_MODE (usually SImode).
@@ -2386,20 +2379,11 @@  fixup_unsigned_type (tree type)
 
 enum machine_mode
 get_best_mode (int bitsize, int bitpos,
-	       unsigned HOST_WIDE_INT bitregion_start,
-	       unsigned HOST_WIDE_INT bitregion_end,
 	       unsigned int align,
 	       enum machine_mode largest_mode, int volatilep)
 {
   enum machine_mode mode;
   unsigned int unit = 0;
-  unsigned HOST_WIDE_INT maxbits;
-
-  /* If unset, no restriction.  */
-  if (!bitregion_end)
-    maxbits = MAX_FIXED_MODE_SIZE;
-  else
-    maxbits = (bitregion_end - bitregion_start) % align + 1;
 
   /* Find the narrowest integer mode that contains the bit field.  */
   for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
@@ -2436,7 +2420,6 @@  get_best_mode (int bitsize, int bitpos,
 	      && bitpos / unit == (bitpos + bitsize - 1) / unit
 	      && unit <= BITS_PER_WORD
 	      && unit <= MIN (align, BIGGEST_ALIGNMENT)
-	      && unit <= maxbits
 	      && (largest_mode == VOIDmode
 		  || unit <= GET_MODE_BITSIZE (largest_mode)))
 	    wide_mode = tmode;
Index: expmed.c
===================================================================
--- expmed.c	(revision 176891)
+++ expmed.c	(working copy)
@@ -48,13 +48,11 @@  struct target_expmed *this_target_expmed
 static void store_fixed_bit_field (rtx, unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
+				   tree, HOST_WIDE_INT,
 				   rtx);
 static void store_split_bit_field (rtx, unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
+				   tree, HOST_WIDE_INT,
 				   rtx);
 static rtx extract_fixed_bit_field (enum machine_mode, rtx,
 				    unsigned HOST_WIDE_INT,
@@ -340,8 +338,8 @@  mode_for_extraction (enum extraction_pat
 static bool
 store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 		   unsigned HOST_WIDE_INT bitnum,
-		   unsigned HOST_WIDE_INT bitregion_start,
-		   unsigned HOST_WIDE_INT bitregion_end,
+		   tree bitregion_offset,
+		   HOST_WIDE_INT bitregion_maxbits,
 		   enum machine_mode fieldmode,
 		   rtx value, bool fallback_p)
 {
@@ -558,7 +556,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
 	  if (!store_bit_field_1 (op0, MIN (BITS_PER_WORD,
 					    bitsize - i * BITS_PER_WORD),
 				  bitnum + bit_offset,
-				  bitregion_start, bitregion_end,
+				  bitregion_offset, bitregion_maxbits,
 				  word_mode,
 				  value_word, fallback_p))
 	    {
@@ -722,10 +720,6 @@  store_bit_field_1 (rtx str_rtx, unsigned
   if (HAVE_insv && MEM_P (op0))
     {
       enum machine_mode bestmode;
-      unsigned HOST_WIDE_INT maxbits = MAX_FIXED_MODE_SIZE;
-
-      if (bitregion_end)
-	maxbits = bitregion_end - bitregion_start + 1;
 
       /* Get the mode to use for inserting into this field.  If OP0 is
 	 BLKmode, get the smallest mode consistent with the alignment. If
@@ -733,15 +727,18 @@  store_bit_field_1 (rtx str_rtx, unsigned
 	 mode. Otherwise, use the smallest mode containing the field.  */
 
       if (GET_MODE (op0) == BLKmode
-	  || GET_MODE_BITSIZE (GET_MODE (op0)) > maxbits
+	  || GET_MODE_BITSIZE (GET_MODE (op0)) > bitregion_maxbits
 	  || (op_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (op_mode)))
-	bestmode = get_best_mode  (bitsize, bitnum,
-				  bitregion_start, bitregion_end,
-				  MEM_ALIGN (op0),
-				  (op_mode == MAX_MACHINE_MODE
-				   ? VOIDmode : op_mode),
-				  MEM_VOLATILE_P (op0));
+	{
+	  bestmode = (op_mode == MAX_MACHINE_MODE ? VOIDmode : op_mode);
+	  if (bitregion_maxbits < GET_MODE_SIZE (op_mode))
+	    bestmode = smallest_mode_for_size (bitregion_maxbits, MODE_INT);
+	  bestmode = get_best_mode  (bitsize, bitnum,
+				     MEM_ALIGN (op0),
+				     bestmode,
+				     MEM_VOLATILE_P (op0));
+	}
       else
 	bestmode = GET_MODE (op0);
 
@@ -767,7 +764,8 @@  store_bit_field_1 (rtx str_rtx, unsigned
 	     the unit.  */
 	  tempreg = copy_to_reg (xop0);
 	  if (store_bit_field_1 (tempreg, bitsize, xbitpos,
-				 bitregion_start, bitregion_end,
+				 bitregion_offset,
+				 bitregion_maxbits - xoffset * BITS_PER_UNIT,
 				 fieldmode, orig_value, false))
 	    {
 	      emit_move_insn (xop0, tempreg);
@@ -780,8 +778,9 @@  store_bit_field_1 (rtx str_rtx, unsigned
   if (!fallback_p)
     return false;
 
+  bitregion_maxbits -= offset * BITS_PER_UNIT;
   store_fixed_bit_field (op0, offset, bitsize, bitpos,
-			 bitregion_start, bitregion_end, value);
+			 bitregion_offset, bitregion_maxbits, value);
   return true;
 }
 
@@ -789,18 +788,17 @@  store_bit_field_1 (rtx str_rtx, unsigned
    into a bit-field within structure STR_RTX
    containing BITSIZE bits starting at bit BITNUM.
 
-   BITREGION_START is bitpos of the first bitfield in this region.
-   BITREGION_END is the bitpos of the ending bitfield in this region.
-   These two fields are 0, if the C++ memory model does not apply,
-   or we are not interested in keeping track of bitfield regions.
+   BITREGION_OFFSET is the byte offset STR_RTX to the start of the bit
+   region.  BITREGION_MAXBITS is the number of bits of the largest
+   mode that can be used to set the bit-field in question.
 
    FIELDMODE is the machine-mode of the FIELD_DECL node for this field.  */
 
 void
 store_bit_field (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 		 unsigned HOST_WIDE_INT bitnum,
-		 unsigned HOST_WIDE_INT bitregion_start,
-		 unsigned HOST_WIDE_INT bitregion_end,
+		 tree bitregion_offset,
+		 HOST_WIDE_INT bitregion_maxbits,
 		 enum machine_mode fieldmode,
 		 rtx value)
 {
@@ -808,30 +806,23 @@  store_bit_field (rtx str_rtx, unsigned H
      bit region.  Adjust the address to start at the beginning of the
      bit region.  */
   if (MEM_P (str_rtx)
-      && bitregion_start > 0)
+      && bitregion_maxbits < MAX_FIXED_MODE_SIZE)
     {
-      enum machine_mode bestmode;
-      enum machine_mode op_mode;
-      unsigned HOST_WIDE_INT offset;
+      HOST_WIDE_INT offset;
 
-      op_mode = mode_for_extraction (EP_insv, 3);
-      if (op_mode == MAX_MACHINE_MODE)
-	op_mode = VOIDmode;
-
-      offset = bitregion_start / BITS_PER_UNIT;
-      bitnum -= bitregion_start;
-      bitregion_end -= bitregion_start;
-      bitregion_start = 0;
-      bestmode = get_best_mode (bitsize, bitnum,
-				bitregion_start, bitregion_end,
-				MEM_ALIGN (str_rtx),
-				op_mode,
-				MEM_VOLATILE_P (str_rtx));
-      str_rtx = adjust_address (str_rtx, bestmode, offset);
+      /* ?? Can we get a variable length offset here ?? */
+      gcc_assert (host_integerp (bitregion_offset, 1));
+      offset = tree_low_cst (bitregion_offset, 1);
+
+      /* Adjust the bit position accordingly.  */
+      bitnum -= offset * BITS_PER_UNIT;
+      bitregion_offset = integer_zero_node;
+      /* Adjust the actual address.  */
+      str_rtx = adjust_address (str_rtx, GET_MODE (str_rtx), offset);
     }
 
   if (!store_bit_field_1 (str_rtx, bitsize, bitnum,
-			  bitregion_start, bitregion_end,
+			  bitregion_offset, bitregion_maxbits,
 			  fieldmode, value, true))
     gcc_unreachable ();
 }
@@ -849,8 +840,8 @@  static void
 store_fixed_bit_field (rtx op0, unsigned HOST_WIDE_INT offset,
 		       unsigned HOST_WIDE_INT bitsize,
 		       unsigned HOST_WIDE_INT bitpos,
-		       unsigned HOST_WIDE_INT bitregion_start,
-		       unsigned HOST_WIDE_INT bitregion_end,
+		       tree bitregion_offset,
+		       HOST_WIDE_INT bitregion_maxbits,
 		       rtx value)
 {
   enum machine_mode mode;
@@ -873,17 +864,14 @@  store_fixed_bit_field (rtx op0, unsigned
       if (bitsize + bitpos > BITS_PER_WORD)
 	{
 	  store_split_bit_field (op0, bitsize, bitpos,
-				 bitregion_start, bitregion_end,
+				 bitregion_offset, bitregion_maxbits,
 				 value);
 	  return;
 	}
     }
   else
     {
-      unsigned HOST_WIDE_INT maxbits = MAX_FIXED_MODE_SIZE;
-
-      if (bitregion_end)
-	maxbits = bitregion_end - bitregion_start + 1;
+      HOST_WIDE_INT maxbits = bitregion_maxbits;
 
       /* Get the proper mode to use for this field.  We want a mode that
 	 includes the entire field.  If such a mode would be larger than
@@ -901,16 +889,19 @@  store_fixed_bit_field (rtx op0, unsigned
 	  && flag_strict_volatile_bitfields > 0)
 	mode = GET_MODE (op0);
       else
-	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
-			      bitregion_start, bitregion_end,
-			      MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));
+	{
+	  if (bitregion_maxbits < GET_MODE_BITSIZE (mode))
+	    mode = smallest_mode_for_size (bitregion_maxbits, MODE_INT);
+	  mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
+				MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));
+	}
 
       if (mode == VOIDmode)
 	{
 	  /* The only way this should occur is if the field spans word
 	     boundaries.  */
 	  store_split_bit_field (op0, bitsize, bitpos + offset * BITS_PER_UNIT,
-				 bitregion_start, bitregion_end, value);
+				 bitregion_offset, bitregion_maxbits, value);
 	  return;
 	}
 
@@ -1031,8 +1022,8 @@  store_fixed_bit_field (rtx op0, unsigned
 static void
 store_split_bit_field (rtx op0, unsigned HOST_WIDE_INT bitsize,
 		       unsigned HOST_WIDE_INT bitpos,
-		       unsigned HOST_WIDE_INT bitregion_start,
-		       unsigned HOST_WIDE_INT bitregion_end,
+		       tree bitregion_offset,
+		       HOST_WIDE_INT bitregion_maxbits,
 		       rtx value)
 {
   unsigned int unit;
@@ -1148,7 +1139,8 @@  store_split_bit_field (rtx op0, unsigned
 	 it is just an out-of-bounds access.  Ignore it.  */
       if (word != const0_rtx)
 	store_fixed_bit_field (word, offset * unit / BITS_PER_UNIT, thissize,
-			       thispos, bitregion_start, bitregion_end, part);
+			       thispos, bitregion_offset, bitregion_maxbits,
+			       part);
       bitsdone += thissize;
     }
 }
@@ -1588,7 +1580,7 @@  extract_bit_field_1 (rtx str_rtx, unsign
       if (GET_MODE (op0) == BLKmode
 	  || (ext_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (ext_mode)))
-	bestmode = get_best_mode (bitsize, bitnum, 0, 0, MEM_ALIGN (op0),
+	bestmode = get_best_mode (bitsize, bitnum, MEM_ALIGN (op0),
 				  (ext_mode == MAX_MACHINE_MODE
 				   ? VOIDmode : ext_mode),
 				  MEM_VOLATILE_P (op0));
@@ -1714,7 +1706,7 @@  extract_fixed_bit_field (enum machine_mo
 	    mode = tmode;
 	}
       else
-	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT, 0, 0,
+	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
 			      MEM_ALIGN (op0), word_mode, MEM_VOLATILE_P (op0));
 
       if (mode == VOIDmode)