Message ID: 871unjobrq.fsf@schwinge.name
State: Not Applicable, archived
> -----Original Message-----
> From: Thomas Schwinge [mailto:thomas@codesourcery.com]
> Sent: Friday, April 20, 2012 01:46
> To: Bernd Schmidt; Richard Earnshaw
> Cc: Richard Guenther; Joey Ye; dj@redhat.com; GCC Patches; Mitchell, Mark
> Subject: Re: Continue strict-volatile-bitfields fixes
>
> That is, for -fno-strict-volatile-bitfields the second instance of
> »common->code« is a component_ref, whereas for
> -fstrict-volatile-bitfields it is a bit_field_ref.  The former will be
> optimized as intended, the latter will not.  So, what should be emitted
> in the -fstrict-volatile-bitfields case?  A component_ref -- but this is
> what Bernd's commit r182545 (for PR51200) changed, I think?  Or should
> later optimization stages learn to better deal with a bit_field_ref, and
> where would this have to happen?

R182545 does change compiler behavior for non-volatile bitfields, as
shown in the following ARM example:

    typedef struct
    {
      unsigned short a:8, b:8;
    } BitStruct;

    BitStruct bits;

    void foo () { bits.a = 3; }

Compiled with latest trunk, which includes r182545:

1. -fstrict-volatile-bitfields -O0:

        ldrh    r2, [r3, #0]    @ movhi
        movs    r1, #3
        bfi     r2, r1, #0, #8
        strh    r2, [r3, #0]    @ movhi

2. -fno-strict-volatile-bitfields -O0:

        movs    r2, #3
        strb    r2, [r3, #0]

3. -fstrict-volatile-bitfields -Os:

        movs    r2, #3
        strb    r2, [r3, #0]

Compiled without r182545:

4. -fstrict-volatile-bitfields -O0:

        movs    r2, #3
        strb    r2, [r3, #0]

Comparing 1 to 4, r182545 does change the behavior of non-volatile
bitfields.  Comparing 1 to 2, 1 misses the opportunity to replace the
bitfield insertion with a plain byte store.  But in 3, the combine pass
merges the bitfield instructions into one instruction, so r182545
doesn't impact performance with optimization enabled, at least on ARM.
I am not sure whether this combine always succeeds on all targets.

- Joey
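The equivalence that lets combine merge sequence 1 into sequence 3 can be sketched in plain C.  The following is a minimal simulation, not GCC source: store_rmw and store_byte are hypothetical names modeling the ldrh/bfi/strh sequence and the single strb, and the sketch assumes a little-endian layout (as on ARM's default ABI) in which field a occupies the low byte.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulation of the two instruction sequences GCC emits for
   bits.a = 3.  For a non-volatile bitfield both leave memory in the
   same state, which is why the combine pass may legally replace the
   word-sized read-modify-write with a single byte store at -Os.  */

typedef struct { unsigned short a:8, b:8; } BitStruct;

/* Sequence 1: load 16 bits, insert the 8-bit field, store 16 bits.  */
static void store_rmw (BitStruct *p)
{
  uint16_t word;
  memcpy (&word, p, sizeof word);        /* ldrh  */
  word = (word & ~0xFFu) | 3u;           /* bfi   */
  memcpy (p, &word, sizeof word);        /* strh  */
}

/* Sequence 2: a single byte store touching only field a.  */
static void store_byte (BitStruct *p)
{
  uint8_t byte = 3;
  memcpy (p, &byte, 1);                  /* strb  */
}
```

For a *volatile* bitfield the two sequences are not interchangeable: the access width itself is observable, which is the entire point of -fstrict-volatile-bitfields.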
Hi!

On Thu, 19 Apr 2012 19:46:17 +0200, I wrote:
> Here is my current test case, reduced from gcc.dg/tree-ssa/20030922-1.c:
>
>     extern void abort (void);
>
>     enum tree_code
>     {
>       BIND_EXPR,
>     };
>     struct tree_common
>     {
>       enum tree_code code:8;
>     };
>     void
>     voidify_wrapper_expr (struct tree_common *common)
>     {
>       switch (common->code)
>         {
>         case BIND_EXPR:
>           if (common->code != BIND_EXPR)
>             abort ();
>         }
>     }
>
> The diff for -fdump-tree-all between compiling with
> -fno-strict-volatile-bitfields and -fstrict-volatile-bitfields begins
> with the following, right in 20030922-1.c.003t.original:
>
>     diff -ru fnsvb/20030922-1.c.003t.original fsvb/20030922-1.c.003t.original
>     --- fnsvb/20030922-1.c.003t.original    2012-04-19 16:51:18.322150866 +0200
>     +++ fsvb/20030922-1.c.003t.original     2012-04-19 16:49:18.132088498 +0200
>     @@ -7,7 +7,7 @@
>        switch ((int) common->code)
>          {
>            case 0:;
>     -      if (common->code != 0)
>     +      if ((BIT_FIELD_REF <*common, 32, 0> & 255) != 0)
>              {
>                abort ();
>              }
>
> That is, for -fno-strict-volatile-bitfields the second instance of
> »common->code« is a component_ref, whereas for
> -fstrict-volatile-bitfields it is a bit_field_ref.  The former will be
> optimized as intended, the latter will not.  So, what should be emitted
> in the -fstrict-volatile-bitfields case?  A component_ref -- but this is
> what Bernd's commit r182545 (for PR51200) changed, I think?  Or should
> later optimization stages learn to better deal with a bit_field_ref, and
> where would this have to happen?

I figured out why this generic code is behaving differently for SH
targets.  I compared to ARM as well as x86, for which indeed the
optimization possibility is not lost even when compiling with
-fstrict-volatile-bitfields.
The component_ref inside the if statement (but not the one in the switch
statement) is fed into optimize_bit_field_compare, where it is optimized
to »BIT_FIELD_REF <*common, 32, 0> & 255« only for SH, because
get_best_mode for SH returns SImode (ARM, x86: QImode).  SH has
»#define SLOW_BYTE_ACCESS 1«, and thus the original QImode is widened to
SImode, hoping for better code generation »since it may eliminate
subsequent memory access if subsequent accesses occur to other fields in
the same word of the structure, but to different bytes«.  (Well, the
converse happens here...)  After that, in optimize_bit_field_compare,
for ARM and x86 »nbitsize == lbitsize« holds, and thus the optimization
is aborted, whereas for SH it will be performed, that is, the
component_ref is replaced with »BIT_FIELD_REF <*common, 32, 0> & 255«.

If I manually force SH's SImode back to QImode (that is, abort
optimize_bit_field_compare), the later optimization passes work as they
do for ARM and x86.

Before Bernd's r182545, optimize_bit_field_compare returned early (that
is, aborted this optimization attempt), because »lbitsize ==
GET_MODE_BITSIZE (lmode)« (8 bits).  (With the current sources, lmode is
VOIDmode.)

Is emitting »BIT_FIELD_REF <*common, 32, 0> & 255« wrong in this case,
or should a later optimization pass be able to figure out that
»BIT_FIELD_REF <*common, 32, 0> & 255« is in fact the same as
common->code, and then be able to conflate these?  Any suggestions
where/how to tackle this?

Regards,
 Thomas
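What the widened access amounts to can be rendered in C.  This is a hedged sketch, not GCC source: compare_narrow and compare_widened are hypothetical names modeling the QImode access and the SImode »BIT_FIELD_REF <*common, 32, 0> & 255« form from the dump, and it assumes a little-endian layout in which the 8-bit field occupies the low byte of its containing word.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The field lives in the low 8 bits of a 32-bit storage unit.  */
struct tree_common { unsigned int code:8; };

/* QImode-style access: read just the field and compare.  */
static int compare_narrow (const struct tree_common *common)
{
  return common->code != 0;
}

/* SImode-style access, as produced under SLOW_BYTE_ACCESS: load the
   whole 32-bit word, then mask out the field -- the C equivalent of
   BIT_FIELD_REF <*common, 32, 0> & 255.  */
static int compare_widened (const struct tree_common *common)
{
  uint32_t word;
  memcpy (&word, common, sizeof word);   /* 32-bit load  */
  return (word & 255) != 0;              /* mask the 8-bit field */
}
```

The two functions compute the same result; the problem Thomas describes is that once the comparison is expressed in the widened form, later tree passes no longer recognize it as an access to common->code and the switch/if redundancy goes unnoticed.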
On Wed, Apr 25, 2012 at 1:27 PM, Thomas Schwinge
<thomas@codesourcery.com> wrote:
> Hi!
>
> On Thu, 19 Apr 2012 19:46:17 +0200, I wrote:
>> Here is my current test case, reduced from gcc.dg/tree-ssa/20030922-1.c:
>>
>>     extern void abort (void);
>>
>>     enum tree_code
>>     {
>>       BIND_EXPR,
>>     };
>>     struct tree_common
>>     {
>>       enum tree_code code:8;
>>     };
>>     void
>>     voidify_wrapper_expr (struct tree_common *common)
>>     {
>>       switch (common->code)
>>         {
>>         case BIND_EXPR:
>>           if (common->code != BIND_EXPR)
>>             abort ();
>>         }
>>     }
>>
>> The diff for -fdump-tree-all between compiling with
>> -fno-strict-volatile-bitfields and -fstrict-volatile-bitfields begins
>> with the following, right in 20030922-1.c.003t.original:
>>
>>     diff -ru fnsvb/20030922-1.c.003t.original fsvb/20030922-1.c.003t.original
>>     --- fnsvb/20030922-1.c.003t.original    2012-04-19 16:51:18.322150866 +0200
>>     +++ fsvb/20030922-1.c.003t.original     2012-04-19 16:49:18.132088498 +0200
>>     @@ -7,7 +7,7 @@
>>        switch ((int) common->code)
>>          {
>>            case 0:;
>>     -      if (common->code != 0)
>>     +      if ((BIT_FIELD_REF <*common, 32, 0> & 255) != 0)
>>              {
>>                abort ();
>>              }
>>
>> That is, for -fno-strict-volatile-bitfields the second instance of
>> »common->code« is a component_ref, whereas for
>> -fstrict-volatile-bitfields it is a bit_field_ref.  The former will be
>> optimized as intended, the latter will not.  So, what should be emitted
>> in the -fstrict-volatile-bitfields case?  A component_ref -- but this is
>> what Bernd's commit r182545 (for PR51200) changed, I think?  Or should
>> later optimization stages learn to better deal with a bit_field_ref, and
>> where would this have to happen?
>
> I figured out why this generic code is behaving differently for SH
> targets.  I compared to ARM as well as x86, for which indeed the
> optimization possibility is not lost even when compiling with
> -fstrict-volatile-bitfields.
> The component_ref inside the if statement (but not the one in the switch
> statement) is fed into optimize_bit_field_compare, where it is optimized
> to »BIT_FIELD_REF <*common, 32, 0> & 255« only for SH, because
> get_best_mode for SH returns SImode (ARM, x86: QImode).  SH has
> »#define SLOW_BYTE_ACCESS 1«, and thus the original QImode is widened to
> SImode, hoping for better code generation »since it may eliminate
> subsequent memory access if subsequent accesses occur to other fields in
> the same word of the structure, but to different bytes«.  (Well, the
> converse happens here...)  After that, in optimize_bit_field_compare,
> for ARM and x86 »nbitsize == lbitsize« holds, and thus the optimization
> is aborted, whereas for SH it will be performed, that is, the
> component_ref is replaced with »BIT_FIELD_REF <*common, 32, 0> & 255«.
>
> If I manually force SH's SImode back to QImode (that is, abort
> optimize_bit_field_compare), the later optimization passes work as they
> do for ARM and x86.
>
> Before Bernd's r182545, optimize_bit_field_compare returned early (that
> is, aborted this optimization attempt), because »lbitsize ==
> GET_MODE_BITSIZE (lmode)« (8 bits).  (With the current sources, lmode is
> VOIDmode.)
>
> Is emitting »BIT_FIELD_REF <*common, 32, 0> & 255« wrong in this case,
> or should a later optimization pass be able to figure out that
> »BIT_FIELD_REF <*common, 32, 0> & 255« is in fact the same as
> common->code, and then be able to conflate these?  Any suggestions
> where/how to tackle this?

The BIT_FIELD_REF is somewhat of a red herring.  It is created by
fold-const.c in optimize_bit_field_compare, code that I think should be
removed completely.  Or it needs to be made aware of strict-volatile
bitfields and C++ memory model details.

Richard.

> Regards,
>  Thomas
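The interaction of guards that the thread describes can be distilled into a small decision function.  The following is a hypothetical sketch, not the actual fold-const.c source: would_rewrite and its parameter names are invented for illustration, with lmode_bitsize == 0 standing in for VOIDmode.  It returns nonzero when the BIT_FIELD_REF rewrite would go ahead.

```c
#include <assert.h>

/* Sketch of the early returns in optimize_bit_field_compare as
   discussed in this thread.  lbitsize is the declared field width,
   lmode_bitsize the bitsize of the field's natural mode (0 for
   VOIDmode), and nbitsize the bitsize of the access mode chosen by
   get_best_mode.  */
static int
would_rewrite (int lbitsize, int lmode_bitsize, int nbitsize)
{
  /* Pre-r182545 early return: the field exactly fills its mode,
     so there is nothing to optimize.  */
  if (lmode_bitsize != 0 && lbitsize == lmode_bitsize)
    return 0;
  /* Guard that still fires on ARM/x86: get_best_mode returned QImode,
     so the chosen access is no wider than the field itself.  */
  if (nbitsize == lbitsize)
    return 0;
  /* SH case under SLOW_BYTE_ACCESS: 8-bit field, 32-bit best mode.  */
  return 1;
}
```

With the thread's numbers: before r182545, would_rewrite (8, 8, 32) is 0, matching the early return Thomas observed; with lmode now VOIDmode, would_rewrite (8, 0, 32) is 1 on SH, while would_rewrite (8, 0, 8) stays 0 on ARM and x86 -- which is exactly why only SH ends up with the BIT_FIELD_REF.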