Patchwork reimplement -fstrict-volatile-bitfields v4, part 1/2

Submitter Sandra Loosemore
Date Sept. 28, 2013, 2:19 a.m.
Message ID <52463CA4.7060303@codesourcery.com>
Permalink /patch/278701/
State New

Comments

Sandra Loosemore - Sept. 28, 2013, 2:19 a.m.
This patch fixes various -fstrict-volatile-bitfields wrong-code bugs, 
including instances where bitfield values were being quietly truncated 
as well as bugs relating to using the wrong width.  The code changes are 
identical to those in the previous version of this patch series 
(originally posted in June and re-pinged many times since then), but for 
this version I have cleaned up the test cases to remove dependencies on 
header files, as Bernd requested.

-Sandra
Richard Guenther - Oct. 18, 2013, 10:50 a.m.
On Sat, Sep 28, 2013 at 4:19 AM, Sandra Loosemore
<sandra@codesourcery.com> wrote:
> This patch fixes various -fstrict-volatile-bitfields wrong-code bugs,
> including instances where bitfield values were being quietly truncated as
> well as bugs relating to using the wrong width.  The code changes are
> identical to those in the previous version of this patch series (originally
> posted in June and re-pinged many times since then), but for this version I
> have cleaned up the test cases to remove dependencies on header files, as
> Bernd requested.

Ok.

Thanks,
Richard.

> -Sandra
>
Sandra Loosemore - Oct. 18, 2013, 4:29 p.m.
On 10/18/2013 04:50 AM, Richard Biener wrote:
> On Sat, Sep 28, 2013 at 4:19 AM, Sandra Loosemore
> <sandra@codesourcery.com> wrote:
>> This patch fixes various -fstrict-volatile-bitfields wrong-code bugs,
>> including instances where bitfield values were being quietly truncated as
>> well as bugs relating to using the wrong width.  The code changes are
>> identical to those in the previous version of this patch series (originally
>> posted in June and re-pinged many times since then), but for this version I
>> have cleaned up the test cases to remove dependencies on header files, as
>> Bernd requested.
>
> Ok.

Just to clarify, is this approval unconditional, independent of part 2 
and other patches or changes still under active discussion?

-Sandra
Richard Guenther - Oct. 18, 2013, 4:38 p.m.
Sandra Loosemore <sandra@codesourcery.com> wrote:
>On 10/18/2013 04:50 AM, Richard Biener wrote:
>> On Sat, Sep 28, 2013 at 4:19 AM, Sandra Loosemore
>> <sandra@codesourcery.com> wrote:
>>> This patch fixes various -fstrict-volatile-bitfields wrong-code
>bugs,
>>> including instances where bitfield values were being quietly
>truncated as
>>> well as bugs relating to using the wrong width.  The code changes
>are
>>> identical to those in the previous version of this patch series
>(originally
>>> posted in June and re-pinged many times since then), but for this
>version I
>>> have cleaned up the test cases to remove dependencies on header
>files, as
>>> Bernd requested.
>>
>> Ok.
>
>Just to clarify, is this approval unconditional, independent of part 2 
>and other patches or changes still under active discussion?

Yes.

Richard.

>-Sandra
Sandra Loosemore - Oct. 21, 2013, 2:23 a.m.
On 10/18/2013 10:38 AM, Richard Biener wrote:
> Sandra Loosemore <sandra@codesourcery.com> wrote:
>> On 10/18/2013 04:50 AM, Richard Biener wrote:
>>> On Sat, Sep 28, 2013 at 4:19 AM, Sandra Loosemore
>>> <sandra@codesourcery.com> wrote:
>>>> This patch fixes various -fstrict-volatile-bitfields wrong-code
>> bugs,
>>>> including instances where bitfield values were being quietly
>> truncated as
>>>> well as bugs relating to using the wrong width.  The code changes
>> are
>>>> identical to those in the previous version of this patch series
>> (originally
>>>> posted in June and re-pinged many times since then), but for this
>> version I
>>>> have cleaned up the test cases to remove dependencies on header
>> files, as
>>>> Bernd requested.
>>>
>>> Ok.
>>
>> Just to clarify, is this approval unconditional, independent of part 2
>> and other patches or changes still under active discussion?
>
> Yes.

Hrmmmm.  After some further testing, I'm afraid this patch is still 
buggy.  :-(

I tried a backport to GCC 4.8 and tested on arm-none-eabi.  On the new 
pr56997-1.c testcase, it got stuck in infinite recursion between 
store_split_bit_field/store_fixed_bit_field and/or 
extract_split_bit_field/extract_fixed_bit_field.  This didn't show up in 
my previous mainline testing.

The difference between 4.8 and mainline head is the alignment of the 
incoming str_rtx passed to store_bit_field/extract_bit_field, due to the 
changes in r199898.  The alignment is 8 bits on mainline, and 32 on 4.8.
It seems to me that the bitfield code ought to handle store/extract
from a more-aligned object and it's probably possible to construct an 
example that fails in this way on mainline as well.

It looks like there are conflicting assumptions between get_best_mode, 
the places that call it in store_fixed_bit_field and 
extract_fixed_bit_field, and the code that actually does the splitting 
-- which uses a unit based on the MEM_ALIGN of the incoming rtx, rather 
than its mode.  In the case where it's failing, get_best_mode is 
deciding it can't do a HImode access without splitting, but then the 
split code is assuming SImode units because of the 32-bit alignment, but 
not actually changing the mode of the rtx to match that.

On top of that, this is one of the cases that strict_volatile_bitfield_p 
checks for and returns false, but the callers of store_bit_field and 
extract_bit_field in expr.c have already fiddled with the mode of the 
incoming rtx assuming that -fstrict-volatile-bitfields does apply.  It 
doesn't get into that infinite recursion if it's compiled with 
-fno-strict-volatile-bitfields instead; in that case the incoming rtx 
has BLKmode, get_best_mode chooses SImode, and it's able to do the 
access without splitting at all.

Anyway....  I tried a couple different bandaids that solved the infinite 
recursion problem but caused regressions elsewhere, and now I'm not sure 
of the right place to fix this.  Given that there is also still ongoing 
discussion about making this a 3-way switch (etc) I am going to hold off 
on committing this patch for now.

-Sandra
Bernd Edlinger - Oct. 22, 2013, 1:43 a.m.
Hi,

On Sun, 20 Oct 2013 20:23:49, Sandra Loosemore wrote:
> Hrmmmm. After some further testing, I'm afraid this patch is still
> buggy. :-(
>
> I tried a backport to GCC 4.8 and tested on arm-none-eabi. On the new
> pr56997-1.c testcase, it got stuck in infinite recursion between
> store_split_bit_field/store_fixed_bit_field and/or
> extract_split_bit_field/extract_fixed_bit_field. This didn't show up in
> my previous mainline testing.
>
> The difference between 4.8 and mainline head is the alignment of the
> incoming str_rtx passed to store_bit_field/extract_bit_field, due to the
> changes in r199898. The alignment is 8 bits on mainline, and 32 on 4.8.
> It seems to me that the bitfield code ought to handle store/extract
> from a more-aligned object and it's probably possible to construct an
> example that fails in this way on mainline as well.
>
> It looks like there are conflicting assumptions between get_best_mode,
> the places that call it in store_fixed_bit_field and
> extract_fixed_bit_field, and the code that actually does the splitting
> -- which uses a unit based on the MEM_ALIGN of the incoming rtx, rather
> than its mode. In the case where it's failing, get_best_mode is
> deciding it can't do a HImode access without splitting, but then the
> split code is assuming SImode units because of the 32-bit alignment, but
> not actually changing the mode of the rtx to match that.
>
> On top of that, this is one of the cases that strict_volatile_bitfield_p
> checks for and returns false, but the callers of store_bit_field and
> extract_bit_field in expr.c have already fiddled with the mode of the
> incoming rtx assuming that -fstrict-volatile-bitfields does apply. It
> doesn't get into that infinite recursion if it's compiled with
> -fno-strict-volatile-bitfields instead; in that case the incoming rtx
> has BLKmode, get_best_mode chooses SImode, and it's able to do the
> access without splitting at all.
>
> Anyway.... I tried a couple different bandaids that solved the infinite
> recursion problem but caused regressions elsewhere, and now I'm not sure
> of the right place to fix this. Given that there is also still ongoing
> discussion about making this a 3-way switch (etc) I am going to hold off
> on committing this patch for now.
>
> -Sandra
>

You are right; if I modify pr56997-1 like this, the patch crashes on trunk:

typedef struct s{
 char Prefix;
 short Type;
}__attribute__((packed,aligned(4))) ss;

Please add a test case for this.

With this change, op0 has 32-bit alignment and HImode is selected, with
bitoffset=8 and numbits=16.  It crashes only when reading this value,
because the access tries to use SImode.
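The modified layout can be sanity-checked on the host with offsetof and sizeof (a small hypothetical check, separate from the crash itself):

```c
#include <stddef.h>

typedef struct s
{
  char Prefix;
  short Type;
} __attribute__((packed, aligned(4))) ss;

/* With packed + aligned(4), Type starts at byte offset 1 (bit offset 8),
   is 16 bits wide, and the whole struct occupies one 32-bit unit -- the
   situation described above: op0 aligned to 32 bits, an HImode field at
   bitoffset 8 with numbits 16.  */
int
layout_as_described (void)
{
  return offsetof (ss, Type) == 1 && sizeof (ss) == 4;
}
```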

For some reason the alignment seems to be wrong in 4.8.

Thanks
Bernd.
Bernd Edlinger - Oct. 22, 2013, 8:19 a.m.
Well,

one more point where the current patch is probably wrong:

the AAPCS states that for volatile bit-field access:

"For a write operation the read must always occur even if the entire contents of the container will be replaced"

That means, given

struct s
{
  volatile int a:32;
} ss;

the store ss.a = 1; needs to read the value exactly once and then write
the new value.

Currently we just store.
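A minimal C sketch of the access sequence the quoted AAPCS rule asks for, written out by hand (the helper below is hypothetical; a conforming compiler would emit the equivalent load and store itself):

```c
struct s
{
  volatile int a:32;
} ss;

/* Hand-written lowering of ss.a = 1 under the AAPCS rule: the 32-bit
   container is read exactly once before the store, even though the
   store replaces its entire contents.  */
void
store_full_container (int value)
{
  volatile int *container = (volatile int *) &ss;
  int old = *container;   /* the mandated read */
  (void) old;             /* discarded: the field fills the container */
  *container = value;     /* the write of the new value */
}
```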

Bernd.
Bernd Edlinger - Oct. 23, 2013, 7:11 a.m.
Hi Richard/Joseph,

I noticed that this test case crashes on arm-eabi already without the patch.

extern void abort (void);

#define test_type unsigned short
#define MAGIC (unsigned short)0x102u

typedef struct s{
 unsigned char Prefix[1];
 test_type Type;
}__attribute((__packed__,__aligned__(4))) ss;

volatile ss v;
ss g;

void __attribute__((noinline))
foo (test_type u)
{
  v.Type = u;
}

test_type __attribute__((noinline))
bar (void)
{
  return v.Type;
}


However, when compiled with -fno-strict-volatile-bitfields it does not crash,
but AFAIK the generated code for foo() violates the C++ memory model:

foo:
    @ Function supports interworking.
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    ldr    r2, .L2
    ldr    r3, [r2]
    bic    r3, r3, #16711680
    bic    r3, r3, #65280
    orr    r3, r3, r0, asl #8
    str    r3, [r2]
    bx    lr


On Intel the generated code uses unaligned access, but is OK for the memory model:

foo:
.LFB0:
    .cfi_startproc
    movw    %di, v+1(%rip)
    ret


Am I right, or is the code OK for the memory model?
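For comparison, a lowering that stays inside the C++ memory model would touch only the two bytes of Type; a hand-written sketch (the function is hypothetical, and the order of the two byte stores assumes little-endian layout):

```c
typedef struct s
{
  unsigned char Prefix[1];
  unsigned short Type;
} __attribute__((__packed__, __aligned__(4))) ss;

volatile ss v;

/* Store only the bytes of Type itself; the neighbouring Prefix byte is
   never written, so no data race on Prefix can be introduced.  */
void
foo_bytewise (unsigned short u)
{
  volatile unsigned char *p = (volatile unsigned char *) &v.Type;
  p[0] = (unsigned char) (u & 0xff);  /* low byte (little-endian assumption) */
  p[1] = (unsigned char) (u >> 8);    /* high byte */
}
```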

Regards
Bernd.
Richard Guenther - Oct. 23, 2013, 12:37 p.m.
On Wed, Oct 23, 2013 at 9:11 AM, Bernd Edlinger
<bernd.edlinger@hotmail.de> wrote:
> Hi Richard/Joseph,
>
> I noticed that this test case crashes on arm-eabi already without the patch.
>
> extern void abort (void);
>
> #define test_type unsigned short
> #define MAGIC (unsigned short)0x102u
>
> typedef struct s{
>  unsigned char Prefix[1];
>  test_type Type;
> }__attribute((__packed__,__aligned__(4))) ss;
>
> volatile ss v;
> ss g;
>
> void __attribute__((noinline))
> foo (test_type u)
> {
>   v.Type = u;
> }
>
> test_type __attribute__((noinline))
> bar (void)
> {
>   return v.Type;
> }
>
>
> However when compiled with -fno-strict-volatile-bitfields it does not crash,
> but AFAIK the generated code for foo() violates the C++ memory model:
>
> foo:
>     @ Function supports interworking.
>     @ args = 0, pretend = 0, frame = 0
>     @ frame_needed = 0, uses_anonymous_args = 0
>     @ link register save eliminated.
>     ldr    r2, .L2
>     ldr    r3, [r2]
>     bic    r3, r3, #16711680
>     bic    r3, r3, #65280
>     orr    r3, r3, r0, asl #8
>     str    r3, [r2]
>     bx    lr
>
>
> On Intel the generated code uses unaligned access, but is OK for the memory model:
>
> foo:
> .LFB0:
>     .cfi_startproc
>     movw    %di, v+1(%rip)
>     ret
>
>
> Am I right, or is the code OK for the Memory model?

The C++ memory model says that you may not introduce a data-race
and thus you have to access Type without touching Prefix.

Richard.

> Regards
> Bernd.
Bernd Edlinger - Oct. 23, 2013, 12:53 p.m.
Hi,

On Wed, 23 Oct 2013 14:37:43, Richard Biener wrote:

> The C++ memory model says that you may not introduce a data-race
> and thus you have to access Type without touching Prefix.

Thanks.

Right now I see the following priorities:

1. Make -fno-strict-volatile-bitfields respect the C++ memory model.
   => I think I know how to do that and can prepare a patch for it.
      That patch should also fix the recursion problem that Sandra pointed out.

2. Make Sandra's -fstrict-volatile-bitfields aware of the C++ memory model.
   => That means passing bitregion_start/end to strict_volatile_bitfield_p()
      and returning false if the access is outside the bit region
      (this is -fstrict-volatile-bitfields=gnu).

3. Another patch can add -fstrict-volatile-bitfields=aapcs later, but I don't
   think we have to hurry for that.


Bernd.
Bernd Edlinger - Oct. 28, 2013, 9:20 a.m.
Hi Sandra,


On Sun, 20 Oct 2013 20:23:49, Sandra Loosemore wrote:
>
> On 10/18/2013 10:38 AM, Richard Biener wrote:
>> Sandra Loosemore <sandra@codesourcery.com> wrote:
>>> On 10/18/2013 04:50 AM, Richard Biener wrote:
>>>> On Sat, Sep 28, 2013 at 4:19 AM, Sandra Loosemore
>>>> <sandra@codesourcery.com> wrote:
>>>>> This patch fixes various -fstrict-volatile-bitfields wrong-code
>>> bugs,
>>>>> including instances where bitfield values were being quietly
>>> truncated as
>>>>> well as bugs relating to using the wrong width. The code changes
>>> are
>>>>> identical to those in the previous version of this patch series
>>> (originally
>>>>> posted in June and re-pinged many times since then), but for this
>>> version I
>>>>> have cleaned up the test cases to remove dependencies on header
>>> files, as
>>>>> Bernd requested.
>>>>
>>>> Ok.
>>>
>>> Just to clarify, is this approval unconditional, independent of part 2
>>> and other patches or changes still under active discussion?
>>
>> Yes.
>
> Hrmmmm. After some further testing, I'm afraid this patch is still
> buggy. :-(
>
> I tried a backport to GCC 4.8 and tested on arm-none-eabi. On the new
> pr56997-1.c testcase, it got stuck in infinite recursion between
> store_split_bit_field/store_fixed_bit_field and/or
> extract_split_bit_field/extract_fixed_bit_field. This didn't show up in
> my previous mainline testing.
>
> The difference between 4.8 and mainline head is the alignment of the
> incoming str_rtx passed to store_bit_field/extract_bit_field, due to the
> changes in r199898. The alignment is 8 bits on mainline, and 32 on 4.8.
> It seems to me that the bitfield code ought to handle store/extract
> from a more-aligned object and it's probably possible to construct an
> example that fails in this way on mainline as well.
>
> It looks like there are conflicting assumptions between get_best_mode,
> the places that call it in store_fixed_bit_field and
> extract_fixed_bit_field, and the code that actually does the splitting
> -- which uses a unit based on the MEM_ALIGN of the incoming rtx, rather
> than its mode. In the case where it's failing, get_best_mode is
> deciding it can't do a HImode access without splitting, but then the
> split code is assuming SImode units because of the 32-bit alignment, but
> not actually changing the mode of the rtx to match that.
>
> On top of that, this is one of the cases that strict_volatile_bitfield_p
> checks for and returns false, but the callers of store_bit_field and
> extract_bit_field in expr.c have already fiddled with the mode of the
> incoming rtx assuming that -fstrict-volatile-bitfields does apply. It
> doesn't get into that infinite recursion if it's compiled with
> -fno-strict-volatile-bitfields instead; in that case the incoming rtx
> has BLKmode, get_best_mode chooses SImode, and it's able to do the
> access without splitting at all.
>
> Anyway.... I tried a couple different bandaids that solved the infinite
> recursion problem but caused regressions elsewhere, and now I'm not sure
> of the right place to fix this. Given that there is also still ongoing
> discussion about making this a 3-way switch (etc) I am going to hold off
> on committing this patch for now.
>
> -Sandra
>

I have attached an update to your patch that should
a) fix the recursion problem, and
b) restrict -fstrict-volatile-bitfields so it does not violate the C++ memory model.

Bernd.
Sandra Loosemore - Oct. 29, 2013, 3:29 a.m.
On 10/28/2013 03:20 AM, Bernd Edlinger wrote:
>
> On Sun, 20 Oct 2013 20:23:49, Sandra Loosemore wrote:
>>
>> I tried a backport to GCC 4.8 and tested on arm-none-eabi. On the new
>> pr56997-1.c testcase, it got stuck in infinite recursion between
>> store_split_bit_field/store_fixed_bit_field and/or
>> extract_split_bit_field/extract_fixed_bit_field. This didn't show up in
>> my previous mainline testing.
>>
>> [snip]
>
> I have attached an update to your patch, that should
> a) fix the recursion problem.
> b) restrict the -fstrict-volatile-bitfields to not violate the C++ memory model.
>

Thanks for picking up the ball on this -- I've been busy with other 
tasks for several days.

I again tried backporting the patch series along with your fix to GCC 
4.8 and tested on arm-none-eabi.  I found that it was still getting 
stuck in infinite recursion unless the test from this patch hunk

> @@ -1699,6 +1711,14 @@ extract_bit_field (rtx str_rtx, unsigned
>  {
>    enum machine_mode mode1;
>
> +  /* Handle extraction of unaligned fields,
> +     this can happen in -fstrict-volatile-bitfields.  */
> +  if (GET_MODE_BITSIZE (GET_MODE (str_rtx)) != 0
> +      && GET_MODE_BITSIZE (GET_MODE (str_rtx)) < GET_MODE_BITSIZE (word_mode)
> +      && bitnum % GET_MODE_BITSIZE (GET_MODE (str_rtx)) + bitsize
> +	 > GET_MODE_BITSIZE (GET_MODE (str_rtx)) )
> +    str_rtx = adjust_address (str_rtx, word_mode, 0);
> +
>    /* Handle -fstrict-volatile-bitfields in the cases where it applies.  */
>    if (GET_MODE_BITSIZE (GET_MODE (str_rtx)) > 0)
>      mode1 = GET_MODE (str_rtx);

was also inserted into store_bit_field.

I also think that, minimally, this needs to be rewritten as something like

   if (MEM_P (str_rtx)
       && STRICT_ALIGNMENT
       && GET_MODE_BITSIZE (GET_MODE (str_rtx)) != 0
       && GET_MODE_BITSIZE (GET_MODE (str_rtx)) < GET_MODE_BITSIZE (word_mode)
       && (bitnum % MEM_ALIGN (str_rtx) + bitsize
	  > GET_MODE_BITSIZE (GET_MODE (str_rtx))))
     str_rtx = adjust_address (str_rtx, word_mode, 0);

Otherwise, we'll end up using word_mode instead of the field mode on 
targets that support unaligned accesses.  I tested that (again, 4.8 and 
arm-none-eabi) and results looked good, but of course ARM isn't one of 
the targets that supports unaligned accesses.  :-S

I'm still not convinced this is the right fix, though.  It seems to me 
that callers of store_bit_field and extract_bit_field in expr.c ought 
not to be messing with the mode of the rtx when 
flag_strict_volatile_bitfields > 0, because that is losing information 
about the original mode that we are trying to patch up here by forcing 
it to word_mode.  Instead, I think the callers ought to be passing the 
declared field mode into store_bit_field and extract_bit_field, and 
those functions ought to change the mode of the incoming rtx to match 
the field mode only if strict_volatile_bitfield_p assures us that the 
insertion/extraction can be done in that mode.

Alternatively, if strict_volatile_bitfield_p returns false but 
flag_strict_volatile_bitfields > 0, then always force to word_mode and 
change the -fstrict-volatile-bitfields documentation to indicate that's 
the fallback if the insertion/extraction cannot be done in the declared 
mode, rather than claiming that it tries to do the same thing as if 
-fstrict-volatile-bitfields were not enabled at all.

Either way, still needs more work, and more thorough testing (more 
targets, and obviously trunk as well as the 4.8 backport) before I'd 
consider this ready to commit.  I might or might not be able to find 
some more time to work on this in the next week....

-Sandra
Bernd Edlinger - Oct. 29, 2013, 8:51 a.m.
Hi,

On Mon, 28 Oct 2013 21:29:24, Sandra Loosemore wrote:
>
> On 10/28/2013 03:20 AM, Bernd Edlinger wrote:
>>
>> On Sun, 20 Oct 2013 20:23:49, Sandra Loosemore wrote:
>>>
>>> I tried a backport to GCC 4.8 and tested on arm-none-eabi. On the new
>>> pr56997-1.c testcase, it got stuck in infinite recursion between
>>> store_split_bit_field/store_fixed_bit_field and/or
>>> extract_split_bit_field/extract_fixed_bit_field. This didn't show up in
>>> my previous mainline testing.
>>>
>>> [snip]
>>
>> I have attached an update to your patch, that should
>> a) fix the recursion problem.
>> b) restrict the -fstrict-volatile-bitfields to not violate the C++ memory model.
>>
>
> Thanks for picking up the ball on this -- I've been busy with other
> tasks for several days.
>
> I again tried backporting the patch series along with your fix to GCC
> 4.8 and tested on arm-none-eabi. I found that it was still getting
> stuck in infinite recursion unless the test from this patch hunk
>

hmmm....
Actually I used your patch on a clean 4.8.2 and built for --target=arm-eabi.
I have seen the recursion on extract_bit_field, but not on store_bit_field
so far; maybe you could give me a hint which test case exposes the other
flavour of this recursion problem.

>> @@ -1699,6 +1711,14 @@ extract_bit_field (rtx str_rtx, unsigned
>> {
>> enum machine_mode mode1;
>>
>> + /* Handle extraction of unaligned fields,
>> + this can happen in -fstrict-volatile-bitfields. */
>> + if (GET_MODE_BITSIZE (GET_MODE (str_rtx)) != 0
>> + && GET_MODE_BITSIZE (GET_MODE (str_rtx)) < GET_MODE_BITSIZE (word_mode)
>> + && bitnum % GET_MODE_BITSIZE (GET_MODE (str_rtx)) + bitsize
>> +    > GET_MODE_BITSIZE (GET_MODE (str_rtx)) )
>> + str_rtx = adjust_address (str_rtx, word_mode, 0);
>> +
>> /* Handle -fstrict-volatile-bitfields in the cases where it applies. */
>> if (GET_MODE_BITSIZE (GET_MODE (str_rtx)) > 0)
>> mode1 = GET_MODE (str_rtx);
>
> was also inserted into store_bit_field.
>
> I also think that, minimally, this needs to be rewritten as something like
>
> if (MEM_P (str_rtx)
> && STRICT_ALIGNMENT
> && GET_MODE_BITSIZE (GET_MODE (str_rtx)) != 0
> && GET_MODE_BITSIZE (GET_MODE (str_rtx)) < GET_MODE_BITSIZE (word_mode)
> && (bitnum % MEM_ALIGN (str_rtx) + bitsize
>     > GET_MODE_BITSIZE (GET_MODE (str_rtx))))
> str_rtx = adjust_address (str_rtx, word_mode, 0);
>

Yes, that looks fine. 
 

> Otherwise, we'll end up using word_mode instead of the field mode on
> targets that support unaligned accesses. I tested that (again, 4.8 and
> arm-none-eabi) and results looked good, but of course ARM isn't one of
> the targets that supports unaligned accesses. :-S
>
> I'm still not convinced this is the right fix, though. It seems to me
> that callers of store_bit_field and extract_bit_field in expr.c ought
> not to be messing with the mode of the rtx when
> flag_strict_volatile_bitfields > 0, because that is losing information
> about the original mode that we are trying to patch up here by forcing
> it to word_mode. Instead, I think the callers ought to be passing the
> declared field mode into store_bit_field and extract_bit_field, and
> those functions ought to change the mode of the incoming rtx to match
> the field mode only if strict_volatile_bitfield_p assures us that the
> insertion/extraction can be done in that mode.
>

The problem likely starts in expr.c, where this is done:

if (volatilep && flag_strict_volatile_bitfields > 0)
     to_rtx = adjust_address (to_rtx, mode1, 0);

This restricts the possible access mode not only for bit-fields
but for all volatile members, although -fstrict-volatile-bitfields
is supposed to affect bit-fields only.
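A hedged illustration of that point: the member below is an ordinary volatile field, not a bit-field, yet the quoted adjust_address call constrains its access mode as well (the struct and function here are hypothetical):

```c
struct t
{
  volatile char c;   /* ordinary volatile member, not a bit-field */
  volatile int i;
};

struct t st;

/* A plain member store: -fstrict-volatile-bitfields is documented to
   affect bit-fields only, so this access should be left untouched.  */
void
set_c (char x)
{
  st.c = x;
}
```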

> Alternatively, if strict_volatile_bitfield_p returns false but
> flag_strict_volatile_bitfields > 0, then always force to word_mode and
> change the -fstrict-volatile-bitfields documentation to indicate that's
> the fallback if the insertion/extraction cannot be done in the declared
> mode, rather than claiming that it tries to do the same thing as if
> -fstrict-volatile-bitfields were not enabled at all.
>
> Either way, still needs more work, and more thorough testing (more
> targets, and obviously trunk as well as the 4.8 backport) before I'd
> consider this ready to commit. I might or might not be able to find
> some more time to work on this in the next week....
>

Yes. 

And it would be good to meet Richard's Nov. 21 deadline for GCC 4.9.

Bernd.

> -Sandra
>
Sandra Loosemore - Oct. 29, 2013, 2:41 p.m.
On 10/29/2013 02:51 AM, Bernd Edlinger wrote:
> On Mon, 28 Oct 2013 21:29:24, Sandra Loosemore wrote:
>>
>> I again tried backporting the patch series along with your fix to GCC
>> 4.8 and tested on arm-none-eabi. I found that it was still getting
>> stuck in infinite recursion unless the test from this patch hunk
>>
>
> hmmm....
> Actually I used your patch on a clean 4.8.2 and built for --target=arm-eabi.
> I have seen the recursion on the extract_bit_field, but not on store_bit_field
> so far, maybe you could give me a hint which test case exposes the other
> flavour of this recursion problem.

Sorry, I was going to describe that in more detail but I forgot.  It was 
compiling pr56997-1.c with -mthumb, like:

arm-none-eabi-gcc /path/to/gcc/testsuite/gcc.dg/pr56997-1.c -c 
-fno-diagnostics-show-caret   -fstrict-volatile-bitfields 
-DSTACK_SIZE=16384 -mthumb

Other testing with -mthumb looked fine.

>> Either way, still needs more work, and more thorough testing (more
>> targets, and obviously trunk as well as the 4.8 backport) before I'd
>> consider this ready to commit. I might or might not be able to find
>> some more time to work on this in the next week....
>
> Yes.
>
> And it would be good to reach Richard's Nov-21 deadline for GCC-4.9

Yeah, I know the clock is ticking.  :-(

-Sandra

Patch

Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	(revision 203003)
+++ gcc/expmed.c	(working copy)
@@ -415,6 +415,42 @@  lowpart_bit_field_p (unsigned HOST_WIDE_
     return bitnum % BITS_PER_WORD == 0;
 }
 
+/* Return true if -fstrict-volatile-bitfields applies to an access of OP0
+   containing BITSIZE bits starting at BITNUM, with field mode FIELDMODE.  */
+
+static bool
+strict_volatile_bitfield_p (rtx op0, unsigned HOST_WIDE_INT bitsize,
+			    unsigned HOST_WIDE_INT bitnum,
+			    enum machine_mode fieldmode)
+{
+  unsigned HOST_WIDE_INT modesize = GET_MODE_BITSIZE (fieldmode);
+
+  /* -fstrict-volatile-bitfields must be enabled and we must have a
+     volatile MEM.  */
+  if (!MEM_P (op0)
+      || !MEM_VOLATILE_P (op0)
+      || flag_strict_volatile_bitfields <= 0)
+    return false;
+
+  /* Non-integral modes likely only happen with packed structures.
+     Punt.  */
+  if (!SCALAR_INT_MODE_P (fieldmode))
+    return false;
+
+  /* The bit size must not be larger than the field mode, and
+     the field mode must not be larger than a word.  */
+  if (bitsize > modesize || modesize > BITS_PER_WORD)
+    return false;
+
+  /* Check for cases of unaligned fields that must be split.  */
+  if (bitnum % BITS_PER_UNIT + bitsize > modesize
+      || (STRICT_ALIGNMENT
+	  && bitnum % GET_MODE_ALIGNMENT (fieldmode) + bitsize > modesize))
+    return false;
+
+  return true;
+}
+
 /* Return true if OP is a memory and if a bitfield of size BITSIZE at
    bit number BITNUM can be treated as a simple value of mode MODE.  */
 
@@ -813,12 +849,8 @@  store_bit_field_1 (rtx str_rtx, unsigned
      cheap register alternative is available.  */
   if (MEM_P (op0))
     {
-      /* Do not use unaligned memory insvs for volatile bitfields when
-	 -fstrict-volatile-bitfields is in effect.  */
-      if (!(MEM_VOLATILE_P (op0)
-	    && flag_strict_volatile_bitfields > 0)
-	  && get_best_mem_extraction_insn (&insv, EP_insv, bitsize, bitnum,
-					   fieldmode)
+      if (get_best_mem_extraction_insn (&insv, EP_insv, bitsize, bitnum,
+					fieldmode)
 	  && store_bit_field_using_insv (&insv, op0, bitsize, bitnum, value))
 	return true;
 
@@ -871,6 +903,27 @@  store_bit_field (rtx str_rtx, unsigned H
 		 enum machine_mode fieldmode,
 		 rtx value)
 {
+  /* Handle -fstrict-volatile-bitfields in the cases where it applies.  */
+  if (strict_volatile_bitfield_p (str_rtx, bitsize, bitnum, fieldmode))
+    {
+
+      /* Storing any naturally aligned field can be done with a simple
+	 store.  For targets that support fast unaligned memory, any
+	 naturally sized, unit aligned field can be done directly.  */
+      if (simple_mem_bitfield_p (str_rtx, bitsize, bitnum, fieldmode))
+	{
+	  str_rtx = adjust_bitfield_address (str_rtx, fieldmode,
+					     bitnum / BITS_PER_UNIT);
+	  emit_move_insn (str_rtx, value);
+	}
+      else
+	/* Explicitly override the C/C++ memory model; ignore the
+	   bit range so that we can do the access in the mode mandated
+	   by -fstrict-volatile-bitfields instead.  */
+	store_fixed_bit_field (str_rtx, bitsize, bitnum, 0, 0, value);
+      return;
+    }
+
   /* Under the C++0x memory model, we must not touch bits outside the
      bit region.  Adjust the address to start at the beginning of the
      bit region.  */
@@ -923,29 +976,12 @@  store_fixed_bit_field (rtx op0, unsigned
 
   if (MEM_P (op0))
     {
-      unsigned HOST_WIDE_INT maxbits = MAX_FIXED_MODE_SIZE;
-
-      if (bitregion_end)
-	maxbits = bitregion_end - bitregion_start + 1;
-
-      /* Get the proper mode to use for this field.  We want a mode that
-	 includes the entire field.  If such a mode would be larger than
-	 a word, we won't be doing the extraction the normal way.
-	 We don't want a mode bigger than the destination.  */
-
       mode = GET_MODE (op0);
       if (GET_MODE_BITSIZE (mode) == 0
 	  || GET_MODE_BITSIZE (mode) > GET_MODE_BITSIZE (word_mode))
 	mode = word_mode;
-
-      if (MEM_VOLATILE_P (op0)
-          && GET_MODE_BITSIZE (GET_MODE (op0)) > 0
-	  && GET_MODE_BITSIZE (GET_MODE (op0)) <= maxbits
-	  && flag_strict_volatile_bitfields > 0)
-	mode = GET_MODE (op0);
-      else
-	mode = get_best_mode (bitsize, bitnum, bitregion_start, bitregion_end,
-			      MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));
+      mode = get_best_mode (bitsize, bitnum, bitregion_start, bitregion_end,
+			    MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));
 
       if (mode == VOIDmode)
 	{
@@ -1429,19 +1465,8 @@  extract_bit_field_1 (rtx str_rtx, unsign
      If that's wrong, the solution is to test for it and set TARGET to 0
      if needed.  */
 
-  /* If the bitfield is volatile, we need to make sure the access
-     remains on a type-aligned boundary.  */
-  if (GET_CODE (op0) == MEM
-      && MEM_VOLATILE_P (op0)
-      && GET_MODE_BITSIZE (GET_MODE (op0)) > 0
-      && flag_strict_volatile_bitfields > 0)
-    goto no_subreg_mode_swap;
-
-  /* Only scalar integer modes can be converted via subregs.  There is an
-     additional problem for FP modes here in that they can have a precision
-     which is different from the size.  mode_for_size uses precision, but
-     we want a mode based on the size, so we must avoid calling it for FP
-     modes.  */
+  /* Get the mode of the field to use for atomic access or subreg
+     conversion.  */
   mode1 = mode;
   if (SCALAR_INT_MODE_P (tmode))
     {
@@ -1474,8 +1499,6 @@  extract_bit_field_1 (rtx str_rtx, unsign
       return convert_extracted_bit_field (op0, mode, tmode, unsignedp);
     }
 
- no_subreg_mode_swap:
-
   /* Handle fields bigger than a word.  */
 
   if (bitsize > BITS_PER_WORD)
@@ -1595,11 +1618,8 @@  extract_bit_field_1 (rtx str_rtx, unsign
      cheap register alternative is available.  */
   if (MEM_P (op0))
     {
-      /* Do not use extv/extzv for volatile bitfields when
-         -fstrict-volatile-bitfields is in effect.  */
-      if (!(MEM_VOLATILE_P (op0) && flag_strict_volatile_bitfields > 0)
-	  && get_best_mem_extraction_insn (&extv, pattern, bitsize, bitnum,
-					   tmode))
+      if (get_best_mem_extraction_insn (&extv, pattern, bitsize, bitnum,
+					tmode))
 	{
 	  rtx result = extract_bit_field_using_extv (&extv, op0, bitsize,
 						     bitnum, unsignedp,
@@ -1665,6 +1685,31 @@  extract_bit_field (rtx str_rtx, unsigned
 		   unsigned HOST_WIDE_INT bitnum, int unsignedp, rtx target,
 		   enum machine_mode mode, enum machine_mode tmode)
 {
+  enum machine_mode mode1;
+
+  /* Handle -fstrict-volatile-bitfields in the cases where it applies.  */
+  if (GET_MODE_BITSIZE (GET_MODE (str_rtx)) > 0)
+    mode1 = GET_MODE (str_rtx);
+  else if (target && GET_MODE_BITSIZE (GET_MODE (target)) > 0)
+    mode1 = GET_MODE (target);
+  else
+    mode1 = tmode;
+
+  if (strict_volatile_bitfield_p (str_rtx, bitsize, bitnum, mode1))
+    {
+      rtx result;
+
+      /* Extraction of a full MODE1 value can be done with a load as long as
+	 the field is on a byte boundary and is sufficiently aligned.  */
+      if (simple_mem_bitfield_p (str_rtx, bitsize, bitnum, mode1))
+	result = adjust_bitfield_address (str_rtx, mode1,
+					  bitnum / BITS_PER_UNIT);
+      else
+	result = extract_fixed_bit_field (mode, str_rtx, bitsize, bitnum,
+					  target, unsignedp);
+      return convert_extracted_bit_field (result, mode, tmode, unsignedp);
+    }
+
   return extract_bit_field_1 (str_rtx, bitsize, bitnum, unsignedp,
 			      target, mode, tmode, true);
 }
@@ -1691,45 +1736,19 @@  extract_fixed_bit_field (enum machine_mo
 	 includes the entire field.  If such a mode would be larger than
 	 a word, we won't be doing the extraction the normal way.  */
 
-      if (MEM_VOLATILE_P (op0)
-	  && flag_strict_volatile_bitfields > 0)
-	{
-	  if (GET_MODE_BITSIZE (GET_MODE (op0)) > 0)
-	    mode = GET_MODE (op0);
-	  else if (target && GET_MODE_BITSIZE (GET_MODE (target)) > 0)
-	    mode = GET_MODE (target);
-	  else
-	    mode = tmode;
-	}
-      else
-	mode = get_best_mode (bitsize, bitnum, 0, 0,
-			      MEM_ALIGN (op0), word_mode, MEM_VOLATILE_P (op0));
+      mode = GET_MODE (op0);
+      if (GET_MODE_BITSIZE (mode) == 0
+	  || GET_MODE_BITSIZE (mode) > GET_MODE_BITSIZE (word_mode))
+	mode = word_mode;
+      mode = get_best_mode (bitsize, bitnum, 0, 0,
+			    MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));
 
       if (mode == VOIDmode)
 	/* The only way this should occur is if the field spans word
 	   boundaries.  */
 	return extract_split_bit_field (op0, bitsize, bitnum, unsignedp);
 
-      unsigned int total_bits = GET_MODE_BITSIZE (mode);
-      HOST_WIDE_INT bit_offset = bitnum - bitnum % total_bits;
-
-      /* If we're accessing a volatile MEM, we can't apply BIT_OFFSET
-	 if it results in a multi-word access where we otherwise wouldn't
-	 have one.  So, check for that case here.  */
-      if (MEM_P (op0)
-	  && MEM_VOLATILE_P (op0)
-	  && flag_strict_volatile_bitfields > 0
-	  && bitnum % BITS_PER_UNIT + bitsize <= total_bits
-	  && bitnum % GET_MODE_BITSIZE (mode) + bitsize > total_bits)
-	{
-	  /* If the target doesn't support unaligned access, give up and
-	     split the access into two.  */
-	  if (STRICT_ALIGNMENT)
-	    return extract_split_bit_field (op0, bitsize, bitnum, unsignedp);
-	  bit_offset = bitnum - bitnum % BITS_PER_UNIT;
-	}
-      op0 = adjust_bitfield_address (op0, mode, bit_offset / BITS_PER_UNIT);
-      bitnum -= bit_offset;
+      op0 = narrow_bit_field_mem (op0, mode, bitsize, bitnum, &bitnum);
     }
 
   mode = GET_MODE (op0);
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 203003)
+++ gcc/doc/invoke.texi	(working copy)
@@ -21169,6 +21169,12 @@  instruction, even though that accesses b
 any portion of the bit-field, or memory-mapped registers unrelated to
 the one being updated.
 
+In some cases, such as when the @code{packed} attribute is applied to a
+structure field, it may not be possible to access the field with a single
+read or write that is correctly aligned for the target machine.  In this
+case GCC falls back to generating multiple accesses rather than code that
+will fault or truncate the result at run time.
+
 The default value of this option is determined by the application binary
 interface for the target processor.
 
Index: gcc/testsuite/gcc.dg/pr23623.c
===================================================================
--- gcc/testsuite/gcc.dg/pr23623.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr23623.c	(revision 0)
@@ -0,0 +1,45 @@ 
+/* { dg-do compile } */
+/* { dg-options "-fstrict-volatile-bitfields -fdump-rtl-final" } */
+
+/* With -fstrict-volatile-bitfields, the volatile accesses to bf2.b
+   and bf3.b must do unsigned int reads/writes.  The non-volatile
+   accesses to bf1.b are not so constrained.  */
+
+extern struct
+{
+  unsigned int b : 1;
+} bf1;
+
+extern volatile struct
+{
+  unsigned int b : 1;
+} bf2;
+
+extern struct
+{
+  volatile unsigned int b : 1;
+} bf3;
+
+void writeb(void)
+{
+  bf1.b = 1;
+  bf2.b = 1;	/* volatile read + volatile write */
+  bf3.b = 1;	/* volatile read + volatile write */
+}
+
+extern unsigned int x1, x2, x3;
+
+void readb(void)
+{
+  x1 = bf1.b;
+  x2 = bf2.b;   /* volatile write */
+  x3 = bf3.b;   /* volatile write */
+}
+
+/* There should be 6 volatile MEMs total, but scan-rtl-dump-times counts
+   the number of match variables and not the number of matches.  Since
+   the parenthesized subexpression in the regexp introduces an extra match
+   variable, we need to give a count of 12 instead of 6 here.  */
+/* { dg-final { scan-rtl-dump-times "mem/v(/.)*:SI" 12 "final" } } */
+/* { dg-final { cleanup-rtl-dump "final" } } */
+
Index: gcc/testsuite/gcc.dg/pr48784-1.c
===================================================================
--- gcc/testsuite/gcc.dg/pr48784-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr48784-1.c	(revision 0)
@@ -0,0 +1,18 @@ 
+/* { dg-do run } */
+/* { dg-options "-fstrict-volatile-bitfields" } */
+
+extern void abort (void);
+
+#pragma pack(1)
+volatile struct S0 {
+   signed a : 7;
+   unsigned b : 28;  /* b can't be fetched with an aligned 32-bit access, */
+                     /* but it certainly can be fetched with an unaligned access */
+} g = {0,0xfffffff};
+
+int main() {
+  unsigned b = g.b;
+  if (b != 0xfffffff)
+    abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/pr48784-2.c
===================================================================
--- gcc/testsuite/gcc.dg/pr48784-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr48784-2.c	(revision 0)
@@ -0,0 +1,18 @@ 
+/* { dg-do run } */
+/* { dg-options "-fno-strict-volatile-bitfields" } */
+
+extern void abort (void);
+
+#pragma pack(1)
+volatile struct S0 {
+   signed a : 7;
+   unsigned b : 28;  /* b can't be fetched with an aligned 32-bit access, */
+                     /* but it certainly can be fetched with an unaligned access */
+} g = {0,0xfffffff};
+
+int main() {
+  unsigned b = g.b;
+  if (b != 0xfffffff)
+    abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/pr56341-1.c
===================================================================
--- gcc/testsuite/gcc.dg/pr56341-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr56341-1.c	(revision 0)
@@ -0,0 +1,40 @@ 
+/* { dg-do run } */
+/* { dg-options "-fstrict-volatile-bitfields" } */
+
+extern void abort (void);
+
+struct test0
+{
+  unsigned char b1[2];
+} __attribute__((packed, aligned(2)));
+
+struct test1
+{
+  volatile unsigned long a1;
+  unsigned char b1[4];
+} __attribute__((packed, aligned(2)));
+
+struct test2
+{
+  struct test0 t0;
+  struct test1 t1;
+  struct test0 t2;
+} __attribute__((packed, aligned(2)));
+
+struct test2 xx;
+struct test2 *x1 = &xx;
+
+#define MAGIC 0x12345678
+
+void test0 (struct test2* x1)
+{
+  x1->t1.a1 = MAGIC;
+}
+
+int main()
+{
+  test0 (x1);
+  if (xx.t1.a1 != MAGIC)
+    abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/pr56341-2.c
===================================================================
--- gcc/testsuite/gcc.dg/pr56341-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr56341-2.c	(revision 0)
@@ -0,0 +1,40 @@ 
+/* { dg-do run } */
+/* { dg-options "-fno-strict-volatile-bitfields" } */
+
+extern void abort (void);
+
+struct test0
+{
+  unsigned char b1[2];
+} __attribute__((packed, aligned(2)));
+
+struct test1
+{
+  volatile unsigned long a1;
+  unsigned char b1[4];
+} __attribute__((packed, aligned(2)));
+
+struct test2
+{
+  struct test0 t0;
+  struct test1 t1;
+  struct test0 t2;
+} __attribute__((packed, aligned(2)));
+
+struct test2 xx;
+struct test2 *x1 = &xx;
+
+#define MAGIC 0x12345678
+
+void test0 (struct test2* x1)
+{
+  x1->t1.a1 = MAGIC;
+}
+
+int main()
+{
+  test0 (x1);
+  if (xx.t1.a1 != MAGIC)
+    abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/pr56997-1.c
===================================================================
--- gcc/testsuite/gcc.dg/pr56997-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr56997-1.c	(revision 0)
@@ -0,0 +1,44 @@ 
+/* Test volatile access to unaligned field.  */
+/* { dg-do run } */
+/* { dg-options "-fstrict-volatile-bitfields" } */
+
+extern void abort (void);
+
+#define test_type unsigned short
+#define MAGIC (unsigned short)0x102u
+
+typedef struct s{
+ unsigned char Prefix;
+ test_type Type;
+}__attribute((__packed__)) ss;
+
+volatile ss v;
+ss g;
+
+void __attribute__((noinline))
+foo (test_type u)
+{
+  v.Type = u;
+}
+
+test_type __attribute__((noinline))
+bar (void)
+{
+  return v.Type;
+}
+
+int main()
+{
+  test_type temp;
+  foo(MAGIC);
+  __builtin_memcpy(&g, (void *)&v, sizeof(g));
+  if (g.Type != MAGIC)
+    abort ();
+
+  g.Type = MAGIC;
+  __builtin_memcpy((void *)&v, &g, sizeof(v));
+  temp = bar();
+  if (temp != MAGIC)
+    abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/pr56997-2.c
===================================================================
--- gcc/testsuite/gcc.dg/pr56997-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr56997-2.c	(revision 0)
@@ -0,0 +1,44 @@ 
+/* Test volatile access to unaligned field.  */
+/* { dg-do run } */
+/* { dg-options "-fstrict-volatile-bitfields" } */
+
+extern void abort (void);
+
+#define test_type unsigned int
+#define MAGIC 0x1020304u
+
+typedef struct s{
+ unsigned char Prefix;
+ test_type Type;
+}__attribute((__packed__)) ss;
+
+volatile ss v;
+ss g;
+
+void __attribute__((noinline))
+foo (test_type u)
+{
+  v.Type = u;
+}
+
+test_type __attribute__((noinline))
+bar (void)
+{
+  return v.Type;
+}
+
+int main()
+{
+  test_type temp;
+  foo(MAGIC);
+  __builtin_memcpy(&g, (void *)&v, sizeof(g));
+  if (g.Type != MAGIC)
+    abort ();
+
+  g.Type = MAGIC;
+  __builtin_memcpy((void *)&v, &g, sizeof(v));
+  temp = bar();
+  if (temp != MAGIC)
+    abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/pr56997-3.c
===================================================================
--- gcc/testsuite/gcc.dg/pr56997-3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr56997-3.c	(revision 0)
@@ -0,0 +1,44 @@ 
+/* Test volatile access to unaligned field.  */
+/* { dg-do run } */
+/* { dg-options "-fstrict-volatile-bitfields" } */
+
+extern void abort (void);
+
+#define test_type unsigned long long
+#define MAGIC 0x102030405060708ull
+
+typedef struct s{
+ unsigned char Prefix;
+ test_type Type;
+}__attribute((__packed__)) ss;
+
+volatile ss v;
+ss g;
+
+void __attribute__((noinline))
+foo (test_type u)
+{
+  v.Type = u;
+}
+
+test_type __attribute__((noinline))
+bar (void)
+{
+  return v.Type;
+}
+
+int main()
+{
+  test_type temp;
+  foo(MAGIC);
+  __builtin_memcpy(&g, (void *)&v, sizeof(g));
+  if (g.Type != MAGIC)
+    abort ();
+
+  g.Type = MAGIC;
+  __builtin_memcpy((void *)&v, &g, sizeof(v));
+  temp = bar();
+  if (temp != MAGIC)
+    abort ();
+  return 0;
+}