Patchwork: Continue strict-volatile-bitfields fixes

Submitter Thomas Schwinge
Date April 19, 2012, 5:46 p.m.
Message ID <871unjobrq.fsf@schwinge.name>
Permalink /patch/153845/
State Not Applicable, archived

Comments

Thomas Schwinge - April 19, 2012, 5:46 p.m.
Hi!

On Wed, 18 Apr 2012 18:16:32 +0200, Bernd Schmidt <bernds@codesourcery.com> wrote:
> On 04/18/2012 06:14 PM, Richard Earnshaw wrote:
> > On 18/04/12 16:37, Thomas Schwinge wrote:
> >> gcc/testsuite/
> >> 	* gcc.dg/tree-ssa/20030922-1.c: Compile with
> >> 	-fno-strict-volatile-bitfields.
> >> 	* gcc.dg/tree-ssa/foldconst-3.c: Likewise.
> >> 	* gcc.dg/tree-ssa/vrp15.c: Likewise.
> >>
> > 
> > None of these have any volatile bitfields, so the option should be a no-op.

That's what I thought, too.  :-)

> The problem is that we have to treat normal bitfields differently as
> well, since a variable may later be declared as volatile.

Here is my current test case, reduced from gcc.dg/tree-ssa/20030922-1.c:

    extern void abort (void);
    
    enum tree_code
    {
      BIND_EXPR,
    };
    struct tree_common
    {
      enum tree_code code:8;
    };
    void
    voidify_wrapper_expr (struct tree_common *common)
    {
      switch (common->code)
        {
        case BIND_EXPR:
          if (common->code != BIND_EXPR)
            abort ();
        }
    }

The diff for -fdump-tree-all between compiling with
-fno-strict-volatile-bitfields and -fstrict-volatile-bitfields begins
with the following, right in 20030922-1.c.003t.original:

diff -ru fnsvb/20030922-1.c.003t.original fsvb/20030922-1.c.003t.original
--- fnsvb/20030922-1.c.003t.original    2012-04-19 16:51:18.322150866 +0200
+++ fsvb/20030922-1.c.003t.original     2012-04-19 16:49:18.132088498 +0200
@@ -7,7 +7,7 @@
   switch ((int) common->code)
     {
       case 0:;
-      if (common->code != 0)
+      if ((BIT_FIELD_REF <*common, 32, 0> & 255) != 0)
        {
          abort ();
        }

That is, for -fno-strict-volatile-bitfields the second instance of
»common->code« is a component_ref, whereas for
-fstrict-volatile-bitfields it is a bit_field_ref.  The former will be
optimized as intended, the latter will not.  So, what should be emitted
in the -fstrict-volatile-bitfields case?  A component_ref -- but this is
what Bernd's commit r182545 (for PR51200) changed, I think?  Or should
later optimization stages learn to better deal with a bit_field_ref, and
where would this have to happen?


Grüße,
 Thomas
Joey Ye - April 20, 2012, 6:57 a.m.
> -----Original Message-----
> From: Thomas Schwinge [mailto:thomas@codesourcery.com]
> Sent: Friday, April 20, 2012 01:46
> To: Bernd Schmidt; Richard Earnshaw
> Cc: Richard Guenther; Joey Ye; dj@redhat.com; GCC Patches; Mitchell,
> Mark
> Subject: Re: Continue strict-volatile-bitfields fixes
> That is, for -fno-strict-volatile-bitfields the second instance of
> »common->code« it is a component_ref, whereas for
> -fstrict-volatile-bitfields it is a bit_field_ref.  The former will be
> optimized as intended, the latter will not.  So, what should be emitted
> in the -fstrict-volatile-bitfields case?  A component_ref -- but this
> is
> what Bernd's commit r182545 (for PR51200) changed, I think?  Or should
> later optimization stages learn to better deal with a bit_field_ref,
> and
> where would this have to happen?
r182545 does change compiler behavior for non-volatile bitfields, as shown in the following ARM example.

typedef struct {
  unsigned short a:8, b:8;
} BitStruct;
BitStruct bits;

void foo()
{
  bits.a = 3;
}

Compiled with the latest trunk, which includes r182545:
1. -fstrict-volatile-bitfields -O0:
        ldrh    r2, [r3, #0]    @ movhi
        movs    r1, #3
        bfi     r2, r1, #0, #8
        strh    r2, [r3, #0]    @ movhi

2. -fno-strict-volatile-bitfields -O0:
        movs    r2, #3
        strb    r2, [r3, #0]

3. -fstrict-volatile-bitfields -Os:
        movs    r2, #3
        strb    r2, [r3, #0]

Compiled without r182545:
4. -fstrict-volatile-bitfields -O0:
        movs    r2, #3
        strb    r2, [r3, #0]

Comparing 1 to 4, r182545 does change the behavior for non-volatile bitfields.

Comparing 1 to 2, 1 misses the opportunity to use a single byte store instead of a read-modify-write of the containing halfword.  But in 3, the combine pass merges the bitfield instructions into one instruction.

So r182545 doesn't impact performance when optimizing, at least on ARM.  I am not sure whether this combine always succeeds on all targets.

- Joey
Thomas Schwinge - April 25, 2012, 11:27 a.m.
Hi!

On Thu, 19 Apr 2012 19:46:17 +0200, I wrote:
> Here is my current test case, reduced from gcc.dg/tree-ssa/20030922-1.c:
> 
>     extern void abort (void);
>     
>     enum tree_code
>     {
>       BIND_EXPR,
>     };
>     struct tree_common
>     {
>       enum tree_code code:8;
>     };
>     void
>     voidify_wrapper_expr (struct tree_common *common)
>     {
>       switch (common->code)
>         {
>         case BIND_EXPR:
>           if (common->code != BIND_EXPR)
>             abort ();
>         }
>     }
> 
> The diff for -fdump-tree-all between compiling with
> -fno-strict-volatile-bitfields and -fstrict-volatile-bitfields begins
> with the following, right in 20030922-1.c.003t.original:
> 
> diff -ru fnsvb/20030922-1.c.003t.original fsvb/20030922-1.c.003t.original
> --- fnsvb/20030922-1.c.003t.original    2012-04-19 16:51:18.322150866 +0200
> +++ fsvb/20030922-1.c.003t.original     2012-04-19 16:49:18.132088498 +0200
> @@ -7,7 +7,7 @@
>    switch ((int) common->code)
>      {
>        case 0:;
> -      if (common->code != 0)
> +      if ((BIT_FIELD_REF <*common, 32, 0> & 255) != 0)
>          {
>            abort ();
>          }
> 
> That is, for -fno-strict-volatile-bitfields the second instance of
> »common->code« it is a component_ref, whereas for
> -fstrict-volatile-bitfields it is a bit_field_ref.  The former will be
> optimized as intended, the latter will not.  So, what should be emitted
> in the -fstrict-volatile-bitfields case?  A component_ref -- but this is
> what Bernd's commit r182545 (for PR51200) changed, I think?  Or should
> later optimization stages learn to better deal with a bit_field_ref, and
> where would this have to happen?

I figured out why this generic code is behaving differently for SH
targets.  I compared to ARM as well as x86, for which indeed the
optimization possibility is not lost even when compiling with
-fstrict-volatile-bitfields.

The component_ref inside the if statement (but not the one in the switch
statement) is fed into optimize_bit_field_compare where it is optimized
to »BIT_FIELD_REF <*common, 32, 0> & 255« only for SH, because
get_best_mode for SH returns SImode (ARM, x86: QImode), because SH has
»#define SLOW_BYTE_ACCESS 1«, and thus the original QImode is widened to
SImode, hoping for better code generation »since it may eliminate
subsequent memory access if subsequent accesses occur to other fields in
the same word of the structure, but to different bytes«.  (Well, the
converse happens here...)  After that, in optimize_bit_field_compare, for
ARM and x86, »nbitsize == lbitsize«, and thus the optimization is
aborted, whereas for SH it will be performed, that is, the component_ref
is replaced with »BIT_FIELD_REF <*common, 32, 0> & 255«.

If I manually force SH's SImode back to QImode (that is, abort
optimize_bit_field_compare), the later optimization passes work as they
do for ARM and x86.

Before Bernd's r182545, optimize_bit_field_compare returned early (that
is, aborted this optimization attempt), because »lbitsize ==
GET_MODE_BITSIZE (lmode)« (8 bits).  (With the current sources, lmode is
VOIDmode.)

Is emitting »BIT_FIELD_REF <*common, 32, 0> & 255« wrong in this case,
or should a later optimization pass be able to figure out that
»BIT_FIELD_REF <*common, 32, 0> & 255« is in fact the same as
common->code, and then be able to conflate these?  Any suggestions
where/how to tackle this?


Grüße,
 Thomas
Richard Guenther - April 25, 2012, 11:51 a.m.
On Wed, Apr 25, 2012 at 1:27 PM, Thomas Schwinge
<thomas@codesourcery.com> wrote:
> Hi!
>
> On Thu, 19 Apr 2012 19:46:17 +0200, I wrote:
>> Here is my current test case, reduced from gcc.dg/tree-ssa/20030922-1.c:
>>
>>     extern void abort (void);
>>
>>     enum tree_code
>>     {
>>       BIND_EXPR,
>>     };
>>     struct tree_common
>>     {
>>       enum tree_code code:8;
>>     };
>>     void
>>     voidify_wrapper_expr (struct tree_common *common)
>>     {
>>       switch (common->code)
>>         {
>>         case BIND_EXPR:
>>           if (common->code != BIND_EXPR)
>>             abort ();
>>         }
>>     }
>>
>> The diff for -fdump-tree-all between compiling with
>> -fno-strict-volatile-bitfields and -fstrict-volatile-bitfields begins
>> with the following, right in 20030922-1.c.003t.original:
>>
>> diff -ru fnsvb/20030922-1.c.003t.original fsvb/20030922-1.c.003t.original
>> --- fnsvb/20030922-1.c.003t.original    2012-04-19 16:51:18.322150866 +0200
>> +++ fsvb/20030922-1.c.003t.original     2012-04-19 16:49:18.132088498 +0200
>> @@ -7,7 +7,7 @@
>>    switch ((int) common->code)
>>      {
>>        case 0:;
>> -      if (common->code != 0)
>> +      if ((BIT_FIELD_REF <*common, 32, 0> & 255) != 0)
>>          {
>>            abort ();
>>          }
>>
>> That is, for -fno-strict-volatile-bitfields the second instance of
>> »common->code« it is a component_ref, whereas for
>> -fstrict-volatile-bitfields it is a bit_field_ref.  The former will be
>> optimized as intended, the latter will not.  So, what should be emitted
>> in the -fstrict-volatile-bitfields case?  A component_ref -- but this is
>> what Bernd's commit r182545 (for PR51200) changed, I think?  Or should
>> later optimization stages learn to better deal with a bit_field_ref, and
>> where would this have to happen?
>
> I figured out why this generic code is behaving differently for SH
> targets.  I compared to ARM as well as x86, for which indeed the
> optimization possibility is not lost even when compiling with
> -fstrict-volatile-bitfields.
>
> The component_ref inside the if statement (but not the one in the switch
> statement) is fed into optimize_bit_field_compare where it is optimized
> to »BIT_FIELD_REF <*common, 32, 0> & 255« only for SH, because
> get_best_mode for SH returns SImode (ARM, x86: QImode), because SH has
> »#define SLOW_BYTE_ACCESS 1«, and thus the original QImode is widened to
> SImode, hoping for better code generation »since it may eliminate
> subsequent memory access if subsequent accesses occur to other fields in
> the same word of the structure, but to different bytes«.  (Well, the
> converse happens here...)  After that, in optimize_bit_field_compare, for
> ARM and x86, »nbitsize == lbitsize«, and thus the optimization is
> aborted, whereas for SH it will be performed, that is, the component_ref
> is replaced with »BIT_FIELD_REF <*common, 32, 0> & 255«.
>
> If I manually force SH's SImode back to QImode (that is, abort
> optimize_bit_field_compare), the later optimization passes work as they
> do for ARM and x86.
>
> Before Bernd's r182545, optimize_bit_field_compare returned early (that
> is, aborted this optimization attempt), because »lbitsize ==
> GET_MODE_BITSIZE (lmode)« (8 bits).  (With the current sources, lmode is
> VOIDmode.)
>
> Is emmitting »BIT_FIELD_REF <*common, 32, 0> & 255« wrong in this case,
> or should a later optimization pass be able to figure out that
> »BIT_FIELD_REF <*common, 32, 0> & 255« is in fact the same as
> common->code, and then be able to conflate these?  Any suggestions
> where/how to tackle this?

The BIT_FIELD_REF is somewhat of a red herring.  It is created by fold-const.c
in optimize_bit_field_compare, code that I think should be removed completely.
Or it needs to be made aware of strict-volatile bitfield and C++ memory model
details.

Richard.

>
> Grüße,
>  Thomas

Patch

diff -ru fnsvb/20030922-1.c.003t.original fsvb/20030922-1.c.003t.original
--- fnsvb/20030922-1.c.003t.original    2012-04-19 16:51:18.322150866 +0200
+++ fsvb/20030922-1.c.003t.original     2012-04-19 16:49:18.132088498 +0200
@@ -7,7 +7,7 @@ 
   switch ((int) common->code)
     {
       case 0:;
-      if (common->code != 0)
+      if ((BIT_FIELD_REF <*common, 32, 0> & 255) != 0)
         {
           abort ();
         }