diff mbox

PR 45819 - possible fix?

Message ID 201107210357.p6L3vW4T032314@greed.delorie.com
State New
Headers show

Commit Message

DJ Delorie July 21, 2011, 3:57 a.m. UTC
In this PR, a cast to a volatile type is lost during forwprop1,
resulting in the wrong access semantics being used for a memory-mapped
peripheral register.  Checking for loss of volatile in this patch
solves the problem, but I don't know if this is the right place to put
such a test - it could go in tree-ssa-forwprop.c also, to be specific
to this bug.

Comments?

Comments

Richard Biener July 21, 2011, 9:11 a.m. UTC | #1
On Thu, Jul 21, 2011 at 5:57 AM, DJ Delorie <dj@redhat.com> wrote:
>
> In this PR, a cast to a volatile type is lost during forwprop1,
> resulting in the wrong access semantics being used for a memory-mapped
> peripheral register.  Checking for loss of volatile in this patch
> solves the problem, but I don't know if this is the right place to put
> such a test - it could go in tree-ssa-forwprop.c also, to be specific
> to this bug.
>
> Comments?

The patch is not correct, it papers over a problem elsewhere (maybe
in forwprop).

I can't btw reproduce the issue on either the 4.5 or the 4.6 branch or trunk
with the testcase from the PR.  I always get

  v_2 ={v} st_1(D)->ptr;

in the .optimized tree dump which correctly states this is a volatile load.

So - if there is an issue on some target then it is to be found in either
RTL expansion code or target specific code, not in the tree optimizers.

On x86_64 I again see a correct

(insn 6 5 0 t.c:7 (set (reg/v:SI 58 [ v ])
        (mem/s/v:SI (reg/v/f:DI 60 [ st ]) [3 st_1(D)->ptr+0 S4 A8])) -1 (nil))

btw. (notice MEM_VOLATILE).

Richard.

> Index: tree-ssa.c
> ===================================================================
> --- tree-ssa.c  (revision 176495)
> +++ tree-ssa.c  (working copy)
> @@ -1274,12 +1274,18 @@ useless_type_conversion_p (tree outer_ty
>
>       /* If the outer type is (void *), the conversion is not necessary.  */
>       if (VOID_TYPE_P (TREE_TYPE (outer_type)))
>        return true;
>     }
>
> +  /* Do not lose casts to volatile qualified types.  */
> +  if ((TYPE_VOLATILE (outer_type)
> +       != TYPE_VOLATILE (inner_type))
> +      && TYPE_VOLATILE (outer_type))
> +    return false;
> +
>   /* From now on qualifiers on value types do not matter.  */
>   inner_type = TYPE_MAIN_VARIANT (inner_type);
>   outer_type = TYPE_MAIN_VARIANT (outer_type);
>
>   if (inner_type == outer_type)
>     return true;
>
DJ Delorie July 21, 2011, 5:21 p.m. UTC | #2
> The patch is not correct, it papers over a problem elsewhere (maybe
> in forwprop).
> 
> I can't btw reproduce the issue on either the 4.5 or the 4.6 branch or trunk
> with the testcase from the PR.  I always get
> 
>   v_2 ={v} st_1(D)->ptr;
> 
> in the .optimized tree dump which correctly states this is a volatile load.

x86 isn't strict_align so the original problem doesn't happen.  With
armv7hl-redhat-linux-gnueabi it happens with this test case (from the
PR):

struct ehci_regs {                                                              
char x;                                                                         
unsigned int port_status[0];                                                    
} __attribute__ ((packed));                                                     
//} __attribute__ ((packed,aligned(__alignof__(int))));                         

struct ehci_hcd{                                                                
struct ehci_regs *regs;                                                         
};                                                                              

int ehci_hub_control (                                                          
 struct ehci_hcd *ehci,                                                         
 int wIndex                                                                     
) {                                                                             
 unsigned int *status_reg = &ehci->regs->port_status[wIndex];                   
 return *(volatile unsigned int *)status_reg;                                   
}

Here's the 45819.c.023t.ccp1 dump:

;; Function ehci_hub_control (ehci_hub_control)

ehci_hub_control (struct ehci_hcd * ehci, int wIndex)
{
  unsigned int * status_reg;
  volatile unsigned int D.2015;
  int D.2014;
  unsigned int D.2013;
  unsigned int wIndex.0;
  unsigned int[0:] * D.2011;
  struct ehci_regs * D.2010;

<bb 2>:
  D.2010_2 = ehci_1(D)->regs;
  D.2011_3 = &D.2010_2->port_status;
  wIndex.0_5 = (unsigned int) wIndex_4(D);
  D.2013_6 = wIndex.0_5 * 4;
  status_reg_7 = D.2011_3 + D.2013_6;
  D.2015_8 ={v} MEM[(volatile unsigned int *)status_reg_7];
  D.2014_9 = (int) D.2015_8;
  return D.2014_9;

}

The problem happens when status_reg_7 is replaced in the D.2015_9
assignment.  I've attached the relevent before/after lhs/rhs trees.
By replacing the address with the previously computed value, gcc seems
to end up with alignment info that the user was trying to get rid of,
so gcc produces four byte-loads instead of a single word-load.

024t.forwprop1 shows this:

  D.2015_8 = MEM[(volatile unsigned int *)D.2010_2].port_status[wIndex.0_5]{lb: 0 sz: 4};

It looks like the "volatile unsigned int *" semantics now happen
*before* the structure reference semantics, instead of after, and the
alignment of the structure field takes precedence over the volatile,
instead of the other way around.  At least, that's what it looks like
to me ;-)


 <ssa_name 0x7f873559e3c8
    type <pointer_type 0x7f87354cfb28
        type <integer_type 0x7f87354bc540 unsigned int public unsigned SI
            size <integer_cst 0x7f87354a8708 constant 32>
            unit size <integer_cst 0x7f87354a8410 constant 4>
            align 32 symtab 0 alias set -1 canonical type 0x7f87354bc540 precision 32 min <integer_cst 0x7f87354a8730 0> max <integer_cst 0x7f87354a86e0 4294967295>
            pointer_to_this <pointer_type 0x7f87354cfb28>>
        sizes-gimplified public unsigned SI size <integer_cst 0x7f87354a8708 32> unit size <integer_cst 0x7f87354a8410 4>
        align 32 symtab 0 alias set -1 canonical type 0x7f87354cfb28>
    var <var_decl 0x7f873559c000 status_reg>def_stmt status_reg_7 = &D.2010_2->port_status[wIndex.0_5]{lb: 0 sz: 4};

    version 7>


 <addr_expr 0x7f8735599750
    type <pointer_type 0x7f87354cfb28
        type <integer_type 0x7f87354bc540 unsigned int public unsigned SI
            size <integer_cst 0x7f87354a8708 constant 32>
            unit size <integer_cst 0x7f87354a8410 constant 4>
            align 32 symtab 0 alias set -1 canonical type 0x7f87354bc540 precision 32 min <integer_cst 0x7f87354a8730 0> max <integer_cst 0x7f87354a86e0 4294967295>
            pointer_to_this <pointer_type 0x7f87354cfb28>>
        sizes-gimplified public unsigned SI size <integer_cst 0x7f87354a8708 32> unit size <integer_cst 0x7f87354a8410 4>
        align 32 symtab 0 alias set -1 canonical type 0x7f87354cfb28>
   
    arg 0 <array_ref 0x7f8735587510 type <integer_type 0x7f87354bc540 unsigned int>
       
        arg 0 <component_ref 0x7f873559d200 type <array_type 0x7f87355852a0>
           
            arg 0 <mem_ref 0x7f873b8cf460 type <record_type 0x7f87355850a8 ehci_regs>
               
                arg 0 <ssa_name 0x7f873559e210 type <pointer_type 0x7f87355853f0>
                    var <var_decl 0x7f873559c0a0 D.2010>def_stmt D.2010_2 = ehci_1(D)->regs;

                    version 2>
                arg 1 <integer_cst 0x7f8735579618 constant 0>
                45819.c:15:40> arg 1 <field_decl 0x7f87354b1b48 port_status>
            45819.c:15:40>
        arg 1 <ssa_name 0x7f873559e318 type <integer_type 0x7f87354bc540 unsigned int>
            var <var_decl 0x7f873559c1e0 wIndex.0>def_stmt wIndex.0_5 = (unsigned int) wIndex_4(D);

            version 5>
        arg 2 <integer_cst 0x7f87354a8c30 constant 0>>>


 <ssa_name 0x7f873559e420
    type <integer_type 0x7f8735585690 unsigned int volatile unsigned SI
        size <integer_cst 0x7f87354a8708 constant 32>
        unit size <integer_cst 0x7f87354a8410 constant 4>
        align 32 symtab 0 alias set -1 canonical type 0x7f8735585690 precision 32 min <integer_cst 0x7f87354a8730 0> max <integer_cst 0x7f87354a86e0 4294967295>
        pointer_to_this <pointer_type 0x7f8735585738>>
    var <var_decl 0x7f873559c3c0 D.2015>def_stmt D.2015_8 ={v} MEM[(volatile unsigned int *)status_reg_7];

    version 8>


 <mem_ref 0x7f873b8cf380
    type <integer_type 0x7f8735585690 unsigned int volatile unsigned SI
        size <integer_cst 0x7f87354a8708 constant 32>
        unit size <integer_cst 0x7f87354a8410 constant 4>
        align 32 symtab 0 alias set -1 canonical type 0x7f8735585690 precision 32 min <integer_cst 0x7f87354a8730 0> max <integer_cst 0x7f87354a86e0 4294967295>
        pointer_to_this <pointer_type 0x7f8735585738>>
    side-effects volatile
    arg 0 <ssa_name 0x7f873559e3c8
        type <pointer_type 0x7f87354cfb28 type <integer_type 0x7f87354bc540 unsigned int>
            sizes-gimplified public unsigned SI size <integer_cst 0x7f87354a8708 32> unit size <integer_cst 0x7f87354a8410 4>
            align 32 symtab 0 alias set -1 canonical type 0x7f87354cfb28>
        var <var_decl 0x7f873559c000 status_reg>def_stmt status_reg_7 = &D.2010_2->port_status[wIndex.0_5]{lb: 0 sz: 4};

        version 7>
    arg 1 <integer_cst 0x7f8735579668 type <pointer_type 0x7f8735585738> constant 0>
    45819.c:16:9>
Richard Biener July 22, 2011, 8:16 a.m. UTC | #3
On Thu, Jul 21, 2011 at 7:21 PM, DJ Delorie <dj@redhat.com> wrote:
>
>> The patch is not correct, it papers over a problem elsewhere (maybe
>> in forwprop).
>>
>> I can't btw reproduce the issue on either the 4.5 or the 4.6 branch or trunk
>> with the testcase from the PR.  I always get
>>
>>   v_2 ={v} st_1(D)->ptr;
>>
>> in the .optimized tree dump which correctly states this is a volatile load.
>
> x86 isn't strict_align so the original problem doesn't happen.  With
> armv7hl-redhat-linux-gnueabi it happens with this test case (from the
> PR):
>
> struct ehci_regs {
> char x;
> unsigned int port_status[0];
> } __attribute__ ((packed));
> //} __attribute__ ((packed,aligned(__alignof__(int))));
>
> struct ehci_hcd{
> struct ehci_regs *regs;
> };
>
> int ehci_hub_control (
>  struct ehci_hcd *ehci,
>  int wIndex
> ) {
>  unsigned int *status_reg = &ehci->regs->port_status[wIndex];
>  return *(volatile unsigned int *)status_reg;
> }
>
> Here's the 45819.c.023t.ccp1 dump:
>
> ;; Function ehci_hub_control (ehci_hub_control)
>
> ehci_hub_control (struct ehci_hcd * ehci, int wIndex)
> {
>  unsigned int * status_reg;
>  volatile unsigned int D.2015;
>  int D.2014;
>  unsigned int D.2013;
>  unsigned int wIndex.0;
>  unsigned int[0:] * D.2011;
>  struct ehci_regs * D.2010;
>
> <bb 2>:
>  D.2010_2 = ehci_1(D)->regs;
>  D.2011_3 = &D.2010_2->port_status;
>  wIndex.0_5 = (unsigned int) wIndex_4(D);
>  D.2013_6 = wIndex.0_5 * 4;
>  status_reg_7 = D.2011_3 + D.2013_6;
>  D.2015_8 ={v} MEM[(volatile unsigned int *)status_reg_7];
>  D.2014_9 = (int) D.2015_8;
>  return D.2014_9;
>
> }
>
> The problem happens when status_reg_7 is replaced in the D.2015_9
> assignment.  I've attached the relevent before/after lhs/rhs trees.
> By replacing the address with the previously computed value, gcc seems
> to end up with alignment info that the user was trying to get rid of,
> so gcc produces four byte-loads instead of a single word-load.
>
> 024t.forwprop1 shows this:
>
>  D.2015_8 = MEM[(volatile unsigned int *)D.2010_2].port_status[wIndex.0_5]{lb: 0 sz: 4};
>
> It looks like the "volatile unsigned int *" semantics now happen
> *before* the structure reference semantics, instead of after, and the
> alignment of the structure field takes precedence over the volatile,
> instead of the other way around.  At least, that's what it looks like
> to me ;-)

No, that looks simply like a forwprop bug - I'll have a look.

Thanks,
Richard.

>
>  <ssa_name 0x7f873559e3c8
>    type <pointer_type 0x7f87354cfb28
>        type <integer_type 0x7f87354bc540 unsigned int public unsigned SI
>            size <integer_cst 0x7f87354a8708 constant 32>
>            unit size <integer_cst 0x7f87354a8410 constant 4>
>            align 32 symtab 0 alias set -1 canonical type 0x7f87354bc540 precision 32 min <integer_cst 0x7f87354a8730 0> max <integer_cst 0x7f87354a86e0 4294967295>
>            pointer_to_this <pointer_type 0x7f87354cfb28>>
>        sizes-gimplified public unsigned SI size <integer_cst 0x7f87354a8708 32> unit size <integer_cst 0x7f87354a8410 4>
>        align 32 symtab 0 alias set -1 canonical type 0x7f87354cfb28>
>    var <var_decl 0x7f873559c000 status_reg>def_stmt status_reg_7 = &D.2010_2->port_status[wIndex.0_5]{lb: 0 sz: 4};
>
>    version 7>
>
>
>  <addr_expr 0x7f8735599750
>    type <pointer_type 0x7f87354cfb28
>        type <integer_type 0x7f87354bc540 unsigned int public unsigned SI
>            size <integer_cst 0x7f87354a8708 constant 32>
>            unit size <integer_cst 0x7f87354a8410 constant 4>
>            align 32 symtab 0 alias set -1 canonical type 0x7f87354bc540 precision 32 min <integer_cst 0x7f87354a8730 0> max <integer_cst 0x7f87354a86e0 4294967295>
>            pointer_to_this <pointer_type 0x7f87354cfb28>>
>        sizes-gimplified public unsigned SI size <integer_cst 0x7f87354a8708 32> unit size <integer_cst 0x7f87354a8410 4>
>        align 32 symtab 0 alias set -1 canonical type 0x7f87354cfb28>
>
>    arg 0 <array_ref 0x7f8735587510 type <integer_type 0x7f87354bc540 unsigned int>
>
>        arg 0 <component_ref 0x7f873559d200 type <array_type 0x7f87355852a0>
>
>            arg 0 <mem_ref 0x7f873b8cf460 type <record_type 0x7f87355850a8 ehci_regs>
>
>                arg 0 <ssa_name 0x7f873559e210 type <pointer_type 0x7f87355853f0>
>                    var <var_decl 0x7f873559c0a0 D.2010>def_stmt D.2010_2 = ehci_1(D)->regs;
>
>                    version 2>
>                arg 1 <integer_cst 0x7f8735579618 constant 0>
>                45819.c:15:40> arg 1 <field_decl 0x7f87354b1b48 port_status>
>            45819.c:15:40>
>        arg 1 <ssa_name 0x7f873559e318 type <integer_type 0x7f87354bc540 unsigned int>
>            var <var_decl 0x7f873559c1e0 wIndex.0>def_stmt wIndex.0_5 = (unsigned int) wIndex_4(D);
>
>            version 5>
>        arg 2 <integer_cst 0x7f87354a8c30 constant 0>>>
>
>
>  <ssa_name 0x7f873559e420
>    type <integer_type 0x7f8735585690 unsigned int volatile unsigned SI
>        size <integer_cst 0x7f87354a8708 constant 32>
>        unit size <integer_cst 0x7f87354a8410 constant 4>
>        align 32 symtab 0 alias set -1 canonical type 0x7f8735585690 precision 32 min <integer_cst 0x7f87354a8730 0> max <integer_cst 0x7f87354a86e0 4294967295>
>        pointer_to_this <pointer_type 0x7f8735585738>>
>    var <var_decl 0x7f873559c3c0 D.2015>def_stmt D.2015_8 ={v} MEM[(volatile unsigned int *)status_reg_7];
>
>    version 8>
>
>
>  <mem_ref 0x7f873b8cf380
>    type <integer_type 0x7f8735585690 unsigned int volatile unsigned SI
>        size <integer_cst 0x7f87354a8708 constant 32>
>        unit size <integer_cst 0x7f87354a8410 constant 4>
>        align 32 symtab 0 alias set -1 canonical type 0x7f8735585690 precision 32 min <integer_cst 0x7f87354a8730 0> max <integer_cst 0x7f87354a86e0 4294967295>
>        pointer_to_this <pointer_type 0x7f8735585738>>
>    side-effects volatile
>    arg 0 <ssa_name 0x7f873559e3c8
>        type <pointer_type 0x7f87354cfb28 type <integer_type 0x7f87354bc540 unsigned int>
>            sizes-gimplified public unsigned SI size <integer_cst 0x7f87354a8708 32> unit size <integer_cst 0x7f87354a8410 4>
>            align 32 symtab 0 alias set -1 canonical type 0x7f87354cfb28>
>        var <var_decl 0x7f873559c000 status_reg>def_stmt status_reg_7 = &D.2010_2->port_status[wIndex.0_5]{lb: 0 sz: 4};
>
>        version 7>
>    arg 1 <integer_cst 0x7f8735579668 type <pointer_type 0x7f8735585738> constant 0>
>    45819.c:16:9>
>
Richard Biener July 22, 2011, 9:47 a.m. UTC | #4
On Fri, Jul 22, 2011 at 10:16 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Thu, Jul 21, 2011 at 7:21 PM, DJ Delorie <dj@redhat.com> wrote:
>>
>>> The patch is not correct, it papers over a problem elsewhere (maybe
>>> in forwprop).
>>>
>>> I can't btw reproduce the issue on either the 4.5 or the 4.6 branch or trunk
>>> with the testcase from the PR.  I always get
>>>
>>>   v_2 ={v} st_1(D)->ptr;
>>>
>>> in the .optimized tree dump which correctly states this is a volatile load.
>>
>> x86 isn't strict_align so the original problem doesn't happen.  With
>> armv7hl-redhat-linux-gnueabi it happens with this test case (from the
>> PR):
>>
>> struct ehci_regs {
>> char x;
>> unsigned int port_status[0];
>> } __attribute__ ((packed));
>> //} __attribute__ ((packed,aligned(__alignof__(int))));
>>
>> struct ehci_hcd{
>> struct ehci_regs *regs;
>> };
>>
>> int ehci_hub_control (
>>  struct ehci_hcd *ehci,
>>  int wIndex
>> ) {
>>  unsigned int *status_reg = &ehci->regs->port_status[wIndex];
>>  return *(volatile unsigned int *)status_reg;
>> }
>>
>> Here's the 45819.c.023t.ccp1 dump:
>>
>> ;; Function ehci_hub_control (ehci_hub_control)
>>
>> ehci_hub_control (struct ehci_hcd * ehci, int wIndex)
>> {
>>  unsigned int * status_reg;
>>  volatile unsigned int D.2015;
>>  int D.2014;
>>  unsigned int D.2013;
>>  unsigned int wIndex.0;
>>  unsigned int[0:] * D.2011;
>>  struct ehci_regs * D.2010;
>>
>> <bb 2>:
>>  D.2010_2 = ehci_1(D)->regs;
>>  D.2011_3 = &D.2010_2->port_status;
>>  wIndex.0_5 = (unsigned int) wIndex_4(D);
>>  D.2013_6 = wIndex.0_5 * 4;
>>  status_reg_7 = D.2011_3 + D.2013_6;
>>  D.2015_8 ={v} MEM[(volatile unsigned int *)status_reg_7];
>>  D.2014_9 = (int) D.2015_8;
>>  return D.2014_9;
>>
>> }
>>
>> The problem happens when status_reg_7 is replaced in the D.2015_9
>> assignment.  I've attached the relevent before/after lhs/rhs trees.
>> By replacing the address with the previously computed value, gcc seems
>> to end up with alignment info that the user was trying to get rid of,
>> so gcc produces four byte-loads instead of a single word-load.
>>
>> 024t.forwprop1 shows this:
>>
>>  D.2015_8 = MEM[(volatile unsigned int *)D.2010_2].port_status[wIndex.0_5]{lb: 0 sz: 4};
>>
>> It looks like the "volatile unsigned int *" semantics now happen
>> *before* the structure reference semantics, instead of after, and the
>> alignment of the structure field takes precedence over the volatile,
>> instead of the other way around.  At least, that's what it looks like
>> to me ;-)
>
> No, that looks simply like a forwprop bug - I'll have a look.

I'm testing the following.

Richard.

2011-07-22  Richard Guenther  <rguenther@suse.de>

        PR tree-optimization/45819
        * tree-ssa-forwprop.c (forward_propagate_addr_expr_1): Properly
        preserve volatile and notrap flags.

        * gcc.dg/pr45819.c: New testcase.
DJ Delorie July 23, 2011, 5:30 a.m. UTC | #5
> I'm testing the following.
> 
> Richard.
> 
> 2011-07-22  Richard Guenther  <rguenther@suse.de>
> 
>         PR tree-optimization/45819
>         * tree-ssa-forwprop.c (forward_propagate_addr_expr_1): Properly
>         preserve volatile and notrap flags.
> 
>         * gcc.dg/pr45819.c: New testcase.

I built an arm kernel with it, the USB stuff works.
DJ Delorie July 23, 2011, 6:02 a.m. UTC | #6
> I built an arm kernel with it, the USB stuff works.

I take that back - booted the wrong kernel.  Didn't work, but I'll
confirm that I'm building with a patched gcc (too many kernels and
compilers to keep track of ;)
Richard Biener July 23, 2011, 10:03 a.m. UTC | #7
On Sat, Jul 23, 2011 at 8:02 AM, DJ Delorie <dj@redhat.com> wrote:
>
>> I built an arm kernel with it, the USB stuff works.
>
> I take that back - booted the wrong kernel.  Didn't work, but I'll
> confirm that I'm building with a patched gcc (too many kernels and
> compilers to keep track of ;)

Yeah, the testcase is invalid - that we lost the volatile was a bug, but
you really have to fix the kernel.

Richard.
DJ Delorie July 23, 2011, 4:29 p.m. UTC | #8
> Yeah, the testcase is invalid - that we lost the volatile was a bug, but
> you really have to fix the kernel.

Sadly, that's not a helpful suggestion.  How else should the kernel
force a word-sized read?  I thought volatile was the way to tell gcc
"do exactly what I tell you to do" ?

And the kernel does produce the right code at that particular spot
now, but for some reason the usb module still doesn't work.  We'll
have to debug it some more.  The same code works just fine with gcc
4.4.
Richard Biener July 24, 2011, 10:41 a.m. UTC | #9
On Sat, Jul 23, 2011 at 6:29 PM, DJ Delorie <dj@redhat.com> wrote:
>
>> Yeah, the testcase is invalid - that we lost the volatile was a bug, but
>> you really have to fix the kernel.
>
> Sadly, that's not a helpful suggestion.  How else should the kernel
> force a word-sized read?  I thought volatile was the way to tell gcc
> "do exactly what I tell you to do" ?

Huh?  No.  Volatile means "preserve this load/store".  If you want
to force it to assume the memory is aligned then you have to do
that during RTL expansion or in the target.

> And the kernel does produce the right code at that particular spot
> now, but for some reason the usb module still doesn't work.  We'll
> have to debug it some more.  The same code works just fine with gcc
> 4.4.

Fact is that GCC knows that memory is not properly aligned.  Iff arm
is truly a strict-align target then an aligned load/store from such memory
will fault.  And I seriously doubt you get the int member magically aligned
by "luck" (thus, misaligning the first char member by exactly 3 bytes).

So either arm should not lie and claim it's strict-align or you need to
fix the kernel anyway.

Richard.
DJ Delorie July 25, 2011, 5:05 p.m. UTC | #10
> Fact is that GCC knows that memory is not properly aligned.

So in the impossibly rare case that gcc is *wrong*, how is the
programmer supposed to tell gcc that?  I mean, gcc 4.4 has been doing
what the programmer wanted, and zillions of ARM devices have been
happily working, and now you tell me they should have been segfaulting
for the last N years.  Surely there's a chance that the ARM developers
know what they're talking about, and have been desperately trying to
convince gcc to stop trying to second-guess them?

I mean, what else should the user expect when they cast a random value
to a "volatile uint32_t *" and derefence it?  I would have expected
gcc to preserve the load *exactly* as the user specified it, not
convert that one load into FOUR loads.
Richard Biener July 26, 2011, 7:34 a.m. UTC | #11
On Mon, Jul 25, 2011 at 7:05 PM, DJ Delorie <dj@redhat.com> wrote:
>
>> Fact is that GCC knows that memory is not properly aligned.
>
> So in the impossibly rare case that gcc is *wrong*, how is the
> programmer supposed to tell gcc that?  I mean, gcc 4.4 has been doing
> what the programmer wanted, and zillions of ARM devices have been
> happily working, and now you tell me they should have been segfaulting
> for the last N years.  Surely there's a chance that the ARM developers
> know what they're talking about, and have been desperately trying to
> convince gcc to stop trying to second-guess them?

So don't lie to GCC then?  You specify

struct X { char c; int i; } __attribute__((packed)) x;

and expect that GCC knows x.i is aligned to 4 bytes!?  It sees the access
to x.i, sees you are on a strict-align target and makes sure your code does
not fault.  How is that not a sane default behavior?

> I mean, what else should the user expect when they cast a random value
> to a "volatile uint32_t *" and derefence it?  I would have expected
> gcc to preserve the load *exactly* as the user specified it, not
> convert that one load into FOUR loads.

Well, then provide a proper struct definition, I suppose adding some
head padding and adjusting the pointer in ehci->regs would do the trick.

Or declare it a bug in volatile handling (which is, after all, not
very well defined)
and simply throw away any alignment information we have in that case
(which would make it an expander bug I guess, unless the arm target does
something special here).

Richard.
DJ Delorie July 26, 2011, 3:01 p.m. UTC | #12
> So don't lie to GCC then?  You specify
> 
> struct X { char c; int i; } __attribute__((packed)) x;
> 
> and expect that GCC knows x.i is aligned to 4 bytes!?

The actual header is much more complex than this trivial example.

It also fails with this example, where the port_status[] array *is*
obviously aligned, but the "packed" attribute *also* makes gcc think
the *structure* is misaligned, which is not the case:

struct ehci_regs {                                                              
	char x;                                                                         
	short y;                                                                         
	char z;                                                                         
	unsigned int port_status[0];                                                    
} __attribute__ ((packed));                                                     

The line that fails is the next one:

	return *(volatile unsigned int *)status_reg;                                   

The user has explicitly told gcc that the pointer is a valid
pointer-to-int.  How else can the user tell gcc it's wrong about
alignment?  I mean, without changing zillions of released kernel
header files?

> Or declare it a bug in volatile handling (which is, after all, not
> very well defined)

I've been working on "fixing" volatile to mean "do what I tell you"
but historically, "volatile" has had lots of leeway in gcc.

> and simply throw away any alignment information we have in that case
> (which would make it an expander bug I guess, unless the arm target does
> something special here).

Various targets choose to honor volatile over gcc's natural bitfield
access rules, and yes, we're doing it in the expander.  Other volatile
access rules could be applied there too.
diff mbox

Patch

Index: tree-ssa.c
===================================================================
--- tree-ssa.c	(revision 176495)
+++ tree-ssa.c	(working copy)
@@ -1274,12 +1274,18 @@  useless_type_conversion_p (tree outer_ty
 
       /* If the outer type is (void *), the conversion is not necessary.  */
       if (VOID_TYPE_P (TREE_TYPE (outer_type)))
 	return true;
     }
 
+  /* Do not lose casts to volatile qualified types.  */
+  if ((TYPE_VOLATILE (outer_type)
+       != TYPE_VOLATILE (inner_type))
+      && TYPE_VOLATILE (outer_type))
+    return false;
+
   /* From now on qualifiers on value types do not matter.  */
   inner_type = TYPE_MAIN_VARIANT (inner_type);
   outer_type = TYPE_MAIN_VARIANT (outer_type);
 
   if (inner_type == outer_type)
     return true;