On Thu, Mar 5, 2015 at 12:00 PM, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
> Hi,
>
> On Thu, 5 Mar 2015 10:00:26, Richard Biener wrote:
>>
>> On Wed, Mar 4, 2015 at 3:13 PM, Bernd Edlinger
>> <bernd.edlinger@hotmail.de> wrote:
>>> bounced... again, without html.
>>>
>>> Hi Richard,
>>>
>>> while working on another bug in the area of -fstrict-volatile-bitfields
>>> I became aware of another example where -fstrict-volatile-bitfields may
>>> generate wrong code. This is reproducible on a !STRICT_ALIGNMENT target
>>> like x86_64.
>>>
>>> The problem is that strict_volatile_bitfield_p tries to allow more than
>>> necessary if !STRICT_ALIGNMENT. Everything works OK on ARM, for instance.
>>>
>>> If this function returns true, we may later call narrow_bit_field_mem,
>>> and the check in strict_volatile_bitfield_p should mirror the logic
>>> there: narrow_bit_field_mem just uses GET_MODE_BITSIZE (mode) and does
>>> not care about STRICT_ALIGNMENT, and in the end *new_bitnum + bitsize
>>> may reach beyond the end of the region. This causes
>>> store_fixed_bit_field_1 to silently fail to generate correct code.
>>
>> Hmm, but the comment sounds like using GET_MODE_ALIGNMENT would be
>> more correct (even for !strict-alignment) - if mode is SImode and mode
>> alignment is 16 (HImode aligned) then we don't need to split the load
>> if bitnum is 16 and bitsize is 32.
>>
>> So maybe narrow_bit_field_mem needs to be fixed as well?
>>
> I'd rather not touch that function...
>
> In the whole of expmed.c the only place where GET_MODE_ALIGNMENT is
> used is in simple_mem_bitfield_p, and only if SLOW_UNALIGNED_ACCESS
> returns 1, which is the case on only a few targets.
>
> Do you know any targets where GET_MODE_ALIGNMENT may be less than
> GET_MODE_BITSIZE?

DImode on i?86, I suppose; any mode on targets like AVR.

> Maybe one thing is missing from strict_volatile_bitfield_p, I am not sure.
>
> Maybe it should check that MEM_ALIGN (op0) >= GET_MODE_ALIGNMENT (fieldmode),
> because the strict volatile bitfields handling will inevitably try to use
> the fieldmode to access the memory.
>
> Or would it be better to say MEM_ALIGN (op0) >= GET_MODE_BITSIZE (fieldmode),
> because it is easier to explain, when someone asks, when we guarantee the
> semantics of strict volatile bitfields?

But on non-strict-align targets you can access an SImode field directly
even for 1-byte aligned MEMs. So the old code looks correct to me here
and the fix needs to be done somewhere else.

> Probably there is already something in the logic in expr.c that prevents
> these cases, because otherwise it would be way too easy to find an example
> of unaligned accesses to unaligned memory on STRICT_ALIGNMENT targets.
>
> Ok, what would you think about this variant?
>
> --- expmed.c.jj	2015-01-16 11:20:40.000000000 +0100
> +++ expmed.c	2015-03-05 11:50:09.400444416 +0100
> @@ -472,9 +472,11 @@ strict_volatile_bitfield_p (rtx op0, uns
>    return false;
>
>    /* Check for cases of unaligned fields that must be split.  */
> -  if (bitnum % BITS_PER_UNIT + bitsize > modesize
> -      || (STRICT_ALIGNMENT
> -	  && bitnum % GET_MODE_ALIGNMENT (fieldmode) + bitsize > modesize))
> +  if (bitnum % modesize + bitsize > modesize)
> +    return false;
> +
> +  /* Check that the memory is sufficiently aligned.  */
> +  if (MEM_ALIGN (op0) < modesize)

I think that only applies to strict-align targets, and then only for
GET_MODE_ALIGNMENT (modesize). And of course what matters is the
alignment at bitnum - even though op0 may not be sufficiently aligned,
it may have known misalignment, so that op0 + bitnum is sufficiently
aligned.

Testcases would need to annotate structs with packed/aligned attributes
to get at these cases.

For the testcase included in the patch, what does the patch end up doing?
Not going down the strict-volatile bitfield expansion path? That looks
unnecessary on !strict-alignment targets but reasonable on strict-align
targets, where the access would need to be split. So, why does it end up
being split on !strict-align targets?

Richard.

>    return false;
>
>    /* Check for cases where the C++ memory model applies.  */
>
> Trying to use an atomic access to a device register is pointless if the
> memory context is not aligned to the MODE_BITSIZE; that has nothing
> to do with MODE_ALIGNMENT, right?
>
> Thanks
> Bernd.
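For reference, the testcase 20150304-1.c mentioned above is not reproduced in this thread. Judging from the struct variant and the expected value 2147483647 quoted below, it is presumably along these lines; the exact file contents are an assumption:

/* Hedged reconstruction of 20150304-1.c; inferred from details quoted
   in this thread, not copied from the actual testsuite file.  */
extern void abort (void);

struct s
{
  char x;              /* occupies bits 0..7 */
  unsigned int y : 31; /* occupies bits 8..38, crossing a word boundary */
} __attribute__((packed));

volatile struct s global;

int
main (void)
{
  global.y = 0x7fffffff;      /* the write path under discussion */
  if (global.y != 0x7fffffff) /* the read path under discussion */
    abort ();
  return 0;
}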
Hi,

On Thu, 5 Mar 2015 12:24:56, Richard Biener wrote:
>
> On Thu, Mar 5, 2015 at 12:00 PM, Bernd Edlinger
> <bernd.edlinger@hotmail.de> wrote:
>> Hi,
>>
>> On Thu, 5 Mar 2015 10:00:26, Richard Biener wrote:
>>>
>>> On Wed, Mar 4, 2015 at 3:13 PM, Bernd Edlinger
>> Maybe one thing is missing from strict_volatile_bitfield_p, I am not sure.
>>
>> Maybe it should check that MEM_ALIGN (op0) >= GET_MODE_ALIGNMENT (fieldmode),
>> because the strict volatile bitfields handling will inevitably try to use
>> the fieldmode to access the memory.
>>
>> Or would it be better to say MEM_ALIGN (op0) >= GET_MODE_BITSIZE (fieldmode),
>> because it is easier to explain, when someone asks, when we guarantee the
>> semantics of strict volatile bitfields?
>
> But on non-strict-align targets you can access an SImode field directly
> even for 1-byte aligned MEMs. So the old code looks correct to me here
> and the fix needs to be done somewhere else.
>

But this SImode access is split up into several QImode or HImode accesses
in the processor's execution pipeline; on an external bus like AXI, all
memory transactions are ultimately aligned. The difference is just that
some processors can split up the unaligned accesses themselves and some
need help from the compiler. But even on x86 unaligned accesses have
different semantics: they are no longer atomic, while an aligned access
is always executed as an atomic transaction on an x86 processor.

>> Probably there is already something in the logic in expr.c that prevents
>> these cases, because otherwise it would be way too easy to find an example
>> of unaligned accesses to unaligned memory on STRICT_ALIGNMENT targets.
>>
>> Ok, what would you think about this variant?
>>
>> --- expmed.c.jj	2015-01-16 11:20:40.000000000 +0100
>> +++ expmed.c	2015-03-05 11:50:09.400444416 +0100
>> @@ -472,9 +472,11 @@ strict_volatile_bitfield_p (rtx op0, uns
>>    return false;
>>
>>    /* Check for cases of unaligned fields that must be split.  */
>> -  if (bitnum % BITS_PER_UNIT + bitsize > modesize
>> -      || (STRICT_ALIGNMENT
>> -	  && bitnum % GET_MODE_ALIGNMENT (fieldmode) + bitsize > modesize))
>> +  if (bitnum % modesize + bitsize > modesize)
>> +    return false;
>> +
>> +  /* Check that the memory is sufficiently aligned.  */
>> +  if (MEM_ALIGN (op0) < modesize)
>
> I think that only applies to strict-align targets, and then only for
> GET_MODE_ALIGNMENT (modesize). And of course what matters is the
> alignment at bitnum - even though op0 may not be sufficiently aligned,
> it may have known misalignment, so that op0 + bitnum is sufficiently
> aligned.
>
> Testcases would need to annotate structs with packed/aligned attributes
> to get at these cases.
>
> For the testcase included in the patch, what does the patch end up doing?
> Not going down the strict-volatile bitfield expansion path? That looks
> unnecessary on !strict-alignment targets but reasonable on strict-align
> targets, where the access would need to be split. So, why does it end up
> being split on !strict-align targets?
>

gcc -fstrict-volatile-bitfields -S 20150304-1.c

Without the patch we find this in 20150304-1.s:

main:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movzbl	global+1(%rip), %eax
	orl	$-1, %eax
	movb	%al, global+1(%rip)
	movzbl	global+2(%rip), %eax
	orl	$-1, %eax
	movb	%al, global+2(%rip)
	movzbl	global+3(%rip), %eax
	orl	$-1, %eax
	movb	%al, global+3(%rip)
	movzbl	global+4(%rip), %eax
	orl	$127, %eax
	movb	%al, global+4(%rip)
	movl	global(%rip), %eax
	shrl	$8, %eax
	andl	$2147483647, %eax
	cmpl	$2147483647, %eax
	je	.L2
	call	abort
.L2:
	movl	$0, %eax
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc

So the write path is OK, because strict_volatile_bitfield_p returns false,
because of

  /* Check for cases where the C++ memory model applies.  */
  if (bitregion_end != 0
      && (bitnum - bitnum % modesize < bitregion_start
	  || bitnum - bitnum % modesize + modesize - 1 > bitregion_end))
    return false;

with

  bitregion_start = 8, bitregion_end = 39
  bitnum - bitnum % modesize = 0
  bitnum - bitnum % modesize + modesize - 1 = 31

This does not happen in the read path, and the problem is that the access
here does not work:

  str_rtx = narrow_bit_field_mem (str_rtx, fieldmode, bitsize, bitnum,
				  &bitnum);
  /* Explicitly override the C/C++ memory model; ignore the
     bit range so that we can do the access in the mode mandated
     by -fstrict-volatile-bitfields instead.  */
  store_fixed_bit_field_1 (str_rtx, bitsize, bitnum, value);

str_rtx is unchanged, bitnum = 8, bitsize = 31, but store_fixed_bit_field_1
cannot handle that.

BTW: I can make the write code path fail as well if I change one bit in the
test case, adding ":8" to char x:

struct s
{
  char x:8;
  unsigned int y:31;
} __attribute__((packed));

Now the bit region is from 0..39, and we get this code:

main:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movl	global(%rip), %eax
	orl	$-256, %eax
	movl	%eax, global(%rip)
	movl	global(%rip), %eax
	shrl	$8, %eax
	andl	$2147483647, %eax
	cmpl	$2147483647, %eax
	je	.L2
	call	abort
.L2:
	movl	$0, %eax
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc

The patch (though I would still prefer my initial proposal) fixes this by
skipping the strict alignment code path. That looks OK to me, because the
structure is always packed if that happens, and it cannot be used to access
"device registers", which are by definition always naturally aligned, at
least to the machine word size.

With the patch we get:

main:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movzbl	global+1(%rip), %eax
	orl	$-1, %eax
	movb	%al, global+1(%rip)
	movzbl	global+2(%rip), %eax
	orl	$-1, %eax
	movb	%al, global+2(%rip)
	movzbl	global+3(%rip), %eax
	orl	$-1, %eax
	movb	%al, global+3(%rip)
	movzbl	global+4(%rip), %eax
	orl	$127, %eax
	movb	%al, global+4(%rip)
	movzbl	global+1(%rip), %eax
	movzbl	%al, %eax
	movzbl	global+2(%rip), %edx
	movzbl	%dl, %edx
	salq	$8, %rdx
	orq	%rax, %rdx
	movzbl	global+3(%rip), %eax
	movzbl	%al, %eax
	salq	$16, %rax
	orq	%rax, %rdx
	movzbl	global+4(%rip), %eax
	movzbl	%al, %eax
	andl	$127, %eax
	salq	$24, %rax
	orq	%rdx, %rax
	cmpl	$2147483647, %eax
	je	.L2
	call	abort
.L2:
	movl	$0, %eax
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc

Every access is split into 4 QImode accesses, but that is as expected,
because the structure is only byte-aligned.

Thanks
Bernd.
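The numbers Bernd quotes can be checked in isolation. This small standalone program evaluates both the old and the proposed "must be split" condition with the values from the read path (bitnum = 8, bitsize = 31, SImode so modesize = 32); it is an illustration, not GCC code:

#include <stdio.h>

int
main (void)
{
  const unsigned bits_per_unit = 8;  /* BITS_PER_UNIT on x86_64 */
  const unsigned modesize = 32;      /* GET_MODE_BITSIZE (SImode) */
  const unsigned bitnum = 8;         /* field y starts at bit 8 */
  const unsigned bitsize = 31;       /* field y is 31 bits wide */

  /* Old check on !STRICT_ALIGNMENT targets: only byte misalignment
     counts, so 0 + 31 <= 32 and the access is not split.  */
  int old_must_split = bitnum % bits_per_unit + bitsize > modesize;

  /* Proposed check: the field reaches bit 8 + 31 = 39, past the 32-bit
     word that narrow_bit_field_mem returns, so it must be split.  */
  int new_must_split = bitnum % modesize + bitsize > modesize;

  printf ("old: %d, proposed: %d\n", old_must_split, new_must_split);
  return 0;   /* prints: old: 0, proposed: 1 */
}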
On Thu, Mar 5, 2015 at 4:05 PM, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
> Hi,
>
> On Thu, 5 Mar 2015 12:24:56, Richard Biener wrote:
>>
>> On Thu, Mar 5, 2015 at 12:00 PM, Bernd Edlinger
>> <bernd.edlinger@hotmail.de> wrote:
>>> Hi,
>>>
>>> On Thu, 5 Mar 2015 10:00:26, Richard Biener wrote:
>>>>
>>>> On Wed, Mar 4, 2015 at 3:13 PM, Bernd Edlinger
>>> Maybe one thing is missing from strict_volatile_bitfield_p, I am not sure.
>>>
>>> Maybe it should check that MEM_ALIGN (op0) >= GET_MODE_ALIGNMENT (fieldmode),
>>> because the strict volatile bitfields handling will inevitably try to use
>>> the fieldmode to access the memory.
>>>
>>> Or would it be better to say MEM_ALIGN (op0) >= GET_MODE_BITSIZE (fieldmode),
>>> because it is easier to explain, when someone asks, when we guarantee the
>>> semantics of strict volatile bitfields?
>>
>> But on non-strict-align targets you can access an SImode field directly
>> even for 1-byte aligned MEMs. So the old code looks correct to me here
>> and the fix needs to be done somewhere else.
>>
> But this SImode access is split up into several QImode or HImode accesses
> in the processor's execution pipeline; on an external bus like AXI, all
> memory transactions are ultimately aligned. The difference is just that
> some processors can split up the unaligned accesses themselves and some
> need help from the compiler. But even on x86 unaligned accesses have
> different semantics: they are no longer atomic, while an aligned access
> is always executed as an atomic transaction on an x86 processor.
>
>>> Probably there is already something in the logic in expr.c that prevents
>>> these cases, because otherwise it would be way too easy to find an example
>>> of unaligned accesses to unaligned memory on STRICT_ALIGNMENT targets.
>>>
>>> Ok, what would you think about this variant?
>>>
>>> --- expmed.c.jj	2015-01-16 11:20:40.000000000 +0100
>>> +++ expmed.c	2015-03-05 11:50:09.400444416 +0100
>>> @@ -472,9 +472,11 @@ strict_volatile_bitfield_p (rtx op0, uns
>>>    return false;
>>>
>>>    /* Check for cases of unaligned fields that must be split.  */
>>> -  if (bitnum % BITS_PER_UNIT + bitsize > modesize
>>> -      || (STRICT_ALIGNMENT
>>> -	  && bitnum % GET_MODE_ALIGNMENT (fieldmode) + bitsize > modesize))
>>> +  if (bitnum % modesize + bitsize > modesize)
>>> +    return false;
>>> +
>>> +  /* Check that the memory is sufficiently aligned.  */
>>> +  if (MEM_ALIGN (op0) < modesize)
>>
>> I think that only applies to strict-align targets, and then only for
>> GET_MODE_ALIGNMENT (modesize). And of course what matters is the
>> alignment at bitnum - even though op0 may not be sufficiently aligned,
>> it may have known misalignment, so that op0 + bitnum is sufficiently
>> aligned.
>>
>> Testcases would need to annotate structs with packed/aligned attributes
>> to get at these cases.
>>
>> For the testcase included in the patch, what does the patch end up doing?
>> Not going down the strict-volatile bitfield expansion path? That looks
>> unnecessary on !strict-alignment targets but reasonable on strict-align
>> targets, where the access would need to be split. So, why does it end up
>> being split on !strict-align targets?
>>
> gcc -fstrict-volatile-bitfields -S 20150304-1.c
>
> Without the patch we find this in 20150304-1.s:
>
> main:
> .LFB0:
> 	.cfi_startproc
> 	pushq	%rbp
> 	.cfi_def_cfa_offset 16
> 	.cfi_offset 6, -16
> 	movq	%rsp, %rbp
> 	.cfi_def_cfa_register 6
> 	movzbl	global+1(%rip), %eax
> 	orl	$-1, %eax
> 	movb	%al, global+1(%rip)
> 	movzbl	global+2(%rip), %eax
> 	orl	$-1, %eax
> 	movb	%al, global+2(%rip)
> 	movzbl	global+3(%rip), %eax
> 	orl	$-1, %eax
> 	movb	%al, global+3(%rip)
> 	movzbl	global+4(%rip), %eax
> 	orl	$127, %eax
> 	movb	%al, global+4(%rip)
> 	movl	global(%rip), %eax
> 	shrl	$8, %eax
> 	andl	$2147483647, %eax
> 	cmpl	$2147483647, %eax
> 	je	.L2
> 	call	abort
> .L2:
> 	movl	$0, %eax
> 	popq	%rbp
> 	.cfi_def_cfa 7, 8
> 	ret
> 	.cfi_endproc
>
> So the write path is OK, because strict_volatile_bitfield_p returns false,
> because of
>
>   /* Check for cases where the C++ memory model applies.  */
>   if (bitregion_end != 0
>       && (bitnum - bitnum % modesize < bitregion_start
> 	  || bitnum - bitnum % modesize + modesize - 1 > bitregion_end))
>     return false;
>
> with
>
>   bitregion_start = 8, bitregion_end = 39
>   bitnum - bitnum % modesize = 0
>   bitnum - bitnum % modesize + modesize - 1 = 31
>
> This does not happen in the read path, and the problem is that the access
> here does not work:
>
>   str_rtx = narrow_bit_field_mem (str_rtx, fieldmode, bitsize, bitnum,
> 				  &bitnum);
>   /* Explicitly override the C/C++ memory model; ignore the
>      bit range so that we can do the access in the mode mandated
>      by -fstrict-volatile-bitfields instead.  */
>   store_fixed_bit_field_1 (str_rtx, bitsize, bitnum, value);
>
> str_rtx is unchanged, bitnum = 8, bitsize = 31, but store_fixed_bit_field_1
> cannot handle that.
>
> BTW: I can make the write code path fail as well if I change one bit in the
> test case, adding ":8" to char x:
>
> struct s
> {
>   char x:8;
>   unsigned int y:31;
> } __attribute__((packed));
>
> Now the bit region is from 0..39, and we get this code:
>
> main:
> .LFB0:
> 	.cfi_startproc
> 	pushq	%rbp
> 	.cfi_def_cfa_offset 16
> 	.cfi_offset 6, -16
> 	movq	%rsp, %rbp
> 	.cfi_def_cfa_register 6
> 	movl	global(%rip), %eax
> 	orl	$-256, %eax
> 	movl	%eax, global(%rip)
> 	movl	global(%rip), %eax
> 	shrl	$8, %eax
> 	andl	$2147483647, %eax
> 	cmpl	$2147483647, %eax
> 	je	.L2
> 	call	abort
> .L2:
> 	movl	$0, %eax
> 	popq	%rbp
> 	.cfi_def_cfa 7, 8
> 	ret
> 	.cfi_endproc
>
> The patch (though I would still prefer my initial proposal) fixes this by
> skipping the strict alignment code path. That looks OK to me, because the
> structure is always packed if that happens, and it cannot be used to access
> "device registers", which are by definition always naturally aligned, at
> least to the machine word size.
>
> With the patch we get:
>
> main:
> .LFB0:
> 	.cfi_startproc
> 	pushq	%rbp
> 	.cfi_def_cfa_offset 16
> 	.cfi_offset 6, -16
> 	movq	%rsp, %rbp
> 	.cfi_def_cfa_register 6
> 	movzbl	global+1(%rip), %eax
> 	orl	$-1, %eax
> 	movb	%al, global+1(%rip)
> 	movzbl	global+2(%rip), %eax
> 	orl	$-1, %eax
> 	movb	%al, global+2(%rip)
> 	movzbl	global+3(%rip), %eax
> 	orl	$-1, %eax
> 	movb	%al, global+3(%rip)
> 	movzbl	global+4(%rip), %eax
> 	orl	$127, %eax
> 	movb	%al, global+4(%rip)
> 	movzbl	global+1(%rip), %eax
> 	movzbl	%al, %eax
> 	movzbl	global+2(%rip), %edx
> 	movzbl	%dl, %edx
> 	salq	$8, %rdx
> 	orq	%rax, %rdx
> 	movzbl	global+3(%rip), %eax
> 	movzbl	%al, %eax
> 	salq	$16, %rax
> 	orq	%rax, %rdx
> 	movzbl	global+4(%rip), %eax
> 	movzbl	%al, %eax
> 	andl	$127, %eax
> 	salq	$24, %rax
> 	orq	%rdx, %rax
> 	cmpl	$2147483647, %eax
> 	je	.L2
> 	call	abort
> .L2:
> 	movl	$0, %eax
> 	popq	%rbp
> 	.cfi_def_cfa 7, 8
> 	ret
> 	.cfi_endproc
>
> Every access is split into 4 QImode accesses, but that is as expected,
> because the structure is only byte-aligned.

No, it is not expected, because the CPU can handle unaligned SImode
reads/writes just fine (even if not as an atomic operation). The C++
memory model allows an SImode read of s.y (-fstrict-volatile-bitfields
would as well, but the CPU doesn't guarantee an atomic operation here).

Richard.

> Thanks
> Bernd.
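The single unaligned SImode read that Richard has in mind can be written portably as a 4-byte memcpy at byte offset 1. The sketch below assumes a little-endian layout and the packed struct from the testcase; on x86_64 GCC typically compiles the memcpy to a single movl, though the load is not guaranteed to be atomic:

#include <stdint.h>
#include <string.h>

/* Fetch the 31-bit field y (bits 8..38 of the packed struct) with one
   unaligned 32-bit load instead of four byte loads.  */
static uint32_t
read_y (const void *s)
{
  uint32_t w;
  memcpy (&w, (const unsigned char *) s + 1, sizeof w); /* bytes 1..4 */
  return w & 0x7fffffffu;                               /* keep bits 0..30 */
}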
--- expmed.c.jj	2015-01-16 11:20:40.000000000 +0100
+++ expmed.c	2015-03-05 11:50:09.400444416 +0100
@@ -472,9 +472,11 @@ strict_volatile_bitfield_p (rtx op0, uns
     return false;
 
   /* Check for cases of unaligned fields that must be split.  */
-  if (bitnum % BITS_PER_UNIT + bitsize > modesize
-      || (STRICT_ALIGNMENT
-	  && bitnum % GET_MODE_ALIGNMENT (fieldmode) + bitsize > modesize))
+  if (bitnum % modesize + bitsize > modesize)
+    return false;
+
+  /* Check that the memory is sufficiently aligned.  */
+  if (MEM_ALIGN (op0) < modesize)
     return false;
 
   /* Check for cases where the C++ memory model applies.  */
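Read on its own, the patched predicate reduces to two conditions. The following standalone restatement uses simplified names (mem_align for MEM_ALIGN (op0), modesize for GET_MODE_BITSIZE (fieldmode)) and is a sketch of the patch's logic, not GCC source:

#include <stdbool.h>

/* An access in fieldmode qualifies for strict-volatile-bitfield
   expansion only if the field fits within one mode-sized word and the
   MEM is at least mode-size aligned.  */
static bool
fits_single_word_access (unsigned mem_align, unsigned modesize,
                         unsigned bitnum, unsigned bitsize)
{
  /* The field must not cross a modesize-bit boundary.  */
  if (bitnum % modesize + bitsize > modesize)
    return false;

  /* The memory must be sufficiently aligned for a single access.  */
  if (mem_align < modesize)
    return false;

  return true;
}

With the testcase values (mem_align = 8, modesize = 32, bitnum = 8, bitsize = 31) both conditions fail, so the strict-volatile-bitfield path is skipped and the access is split, which is the behaviour shown in the assembly earlier in the thread.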