diff mbox

[expr.c] PR middle-end/71700: zero-extend sub-word value when widening constructor element

Message ID 57763554.2030807@foss.arm.com
State New
Headers show

Commit Message

Kyrill Tkachov July 1, 2016, 9:18 a.m. UTC
Hi all,

In this arm wrong-code PR the struct assignment goes wrong when expanding constructor elements to a register destination
when the constructor elements are signed bitfields less than a word wide.
In this testcase we're intialising a struct with a 16-bit signed bitfield to -1 followed by a 1-bit bitfield to 0.
Before it starts storing the elements it zeroes out the register.
  The code in store_constructor extends the first field to word size because it appears at the beginning of a word.
It sign-extends the -1 to word size. However, when it later tries to store the 0 to bitposition 16 it has some logic
to avoid redundant zeroing since the destination was originally cleared, so it doesn't emit the zero store.
But the previous sign-extended -1 took up the whole word, so the position of the second bitfield contains a set bit.

This patch fixes the problem by zeroing out the bits of the widened field that did not appear in the original value,
so that we can safely avoid storing the second zero in the constructor.

With this patch the testcase passes at all optimisation levels (it only passed at -O0 before).

The initialisation of the struct to {-1, 0} now emits the RTL:
(insn 5 2 6 2 (set (reg/v:SI 112 [ e ])
         (const_int 0 [0]))
      (nil))
(insn 6 5 7 2 (set (reg/v:SI 112 [ e ])
         (const_int 65535 [0xffff]))
      (nil))

whereas before it generated:
(insn 5 4 6 (set (reg/v:SI 112 [ e ])
         (const_int 0 [0]))
      (nil))

(insn 6 5 7 (set (reg/v:SI 112 [ e ])
         (const_int -1 [0xffffffffffffffff]))
      (nil))


Bootstrapped and tested on arm, aarch64, x86_64 though the codepath is gated on WORD_REGISTER_OPERATIONS I didn't
expect any effect on aarch64 and x86_64 anyway.

Ok for trunk?

This bug appears on all versions from 4.9 onwards so I'll be testing it on the branches as well.

Thanks,
Kyrill

2016-07-01  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR middle-end/71700
     * expr.c (store_constructor): Mask sign-extended bits when widening
     sub-word constructor element at the start of a word.

2016-07-01  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR middle-end/71700
     * gcc.c-torture/execute/pr71700.c: New test.

Comments

Bernd Schmidt July 4, 2016, 6:02 p.m. UTC | #1
On 07/01/2016 11:18 AM, Kyrill Tkachov wrote:
> In this arm wrong-code PR the struct assignment goes wrong when
> expanding constructor elements to a register destination
> when the constructor elements are signed bitfields less than a word wide.
> In this testcase we're intialising a struct with a 16-bit signed
> bitfield to -1 followed by a 1-bit bitfield to 0.
> Before it starts storing the elements it zeroes out the register.
>  The code in store_constructor extends the first field to word size
> because it appears at the beginning of a word.
> It sign-extends the -1 to word size. However, when it later tries to
> store the 0 to bitposition 16 it has some logic
> to avoid redundant zeroing since the destination was originally cleared,
> so it doesn't emit the zero store.
> But the previous sign-extended -1 took up the whole word, so the
> position of the second bitfield contains a set bit.
>
> This patch fixes the problem by zeroing out the bits of the widened
> field that did not appear in the original value,
> so that we can safely avoid storing the second zero in the constructor.
[...]

> Bootstrapped and tested on arm, aarch64, x86_64 though the codepath is
> gated on WORD_REGISTER_OPERATIONS I didn't
> expect any effect on aarch64 and x86_64 anyway.

So - that code path starts with this comment:

             /* If this initializes a field that is smaller than a
                word, at the start of a word, try to widen it to a full
                word.  This special case allows us to output C++ member
                function initializations in a form that the optimizers
                can understand.  */

Doesn't your patch completely defeat the purpose of this? Would you get 
better/identical code by just deleting this block? It seems unfortunate 
to have two different code generation approaches like this.

It would be interesting to know the effects of your patch, and the 
effects of removing this code entirely, on generated code. Try to find 
the motivating C++ member function example perhaps? Maybe another 
possibility is to ensure this doesn't happen if the value would be 
interpreted as signed.


Bernd
Kyrill Tkachov July 5, 2016, 12:57 p.m. UTC | #2
Hi Bernd,

On 04/07/16 19:02, Bernd Schmidt wrote:
> On 07/01/2016 11:18 AM, Kyrill Tkachov wrote:
>> In this arm wrong-code PR the struct assignment goes wrong when
>> expanding constructor elements to a register destination
>> when the constructor elements are signed bitfields less than a word wide.
>> In this testcase we're intialising a struct with a 16-bit signed
>> bitfield to -1 followed by a 1-bit bitfield to 0.
>> Before it starts storing the elements it zeroes out the register.
>>  The code in store_constructor extends the first field to word size
>> because it appears at the beginning of a word.
>> It sign-extends the -1 to word size. However, when it later tries to
>> store the 0 to bitposition 16 it has some logic
>> to avoid redundant zeroing since the destination was originally cleared,
>> so it doesn't emit the zero store.
>> But the previous sign-extended -1 took up the whole word, so the
>> position of the second bitfield contains a set bit.
>>
>> This patch fixes the problem by zeroing out the bits of the widened
>> field that did not appear in the original value,
>> so that we can safely avoid storing the second zero in the constructor.
> [...]
>
>> Bootstrapped and tested on arm, aarch64, x86_64 though the codepath is
>> gated on WORD_REGISTER_OPERATIONS I didn't
>> expect any effect on aarch64 and x86_64 anyway.
>
> So - that code path starts with this comment:
>
>             /* If this initializes a field that is smaller than a
>                word, at the start of a word, try to widen it to a full
>                word.  This special case allows us to output C++ member
>                function initializations in a form that the optimizers
>                can understand.  */
>
> Doesn't your patch completely defeat the purpose of this? Would you get better/identical code by just deleting this block? It seems unfortunate to have two different code generation approaches like this.
>
> It would be interesting to know the effects of your patch, and the effects of removing this code entirely, on generated code. Try to find the motivating C++ member function example perhaps? Maybe another possibility is to ensure this 
> doesn't happen if the value would be interpreted as signed.
>

Doing some archaeology shows this code was added back in 1998 with no testcase (r22567) so I'd have to do more digging.

My interpretation of that comment was that for WORD_REGISTER_OPERATIONS targets it's more beneficial to have word-size
operations, so the expansion code tries to emit as many of the operations in word_mode as it safely can.
With my patch it still emits a word_mode operation, it's just that the immediate that is moved in word_mode has it's
top bits zeroed out instead of sign-extended.

I'll build SPEC2006 on arm (a WORD_REGISTER_OPERATIONS target) and inspect the assembly to see if I see any interesting
effects, but I suspect there won't be much change.

Thanks for looking at this,
Kyrill


>
> Bernd
Kyrill Tkachov July 11, 2016, 2:52 p.m. UTC | #3
On 05/07/16 13:57, Kyrill Tkachov wrote:
> Hi Bernd,
>
> On 04/07/16 19:02, Bernd Schmidt wrote:
>> On 07/01/2016 11:18 AM, Kyrill Tkachov wrote:
>>> In this arm wrong-code PR the struct assignment goes wrong when
>>> expanding constructor elements to a register destination
>>> when the constructor elements are signed bitfields less than a word wide.
>>> In this testcase we're intialising a struct with a 16-bit signed
>>> bitfield to -1 followed by a 1-bit bitfield to 0.
>>> Before it starts storing the elements it zeroes out the register.
>>>  The code in store_constructor extends the first field to word size
>>> because it appears at the beginning of a word.
>>> It sign-extends the -1 to word size. However, when it later tries to
>>> store the 0 to bitposition 16 it has some logic
>>> to avoid redundant zeroing since the destination was originally cleared,
>>> so it doesn't emit the zero store.
>>> But the previous sign-extended -1 took up the whole word, so the
>>> position of the second bitfield contains a set bit.
>>>
>>> This patch fixes the problem by zeroing out the bits of the widened
>>> field that did not appear in the original value,
>>> so that we can safely avoid storing the second zero in the constructor.
>> [...]
>>
>>> Bootstrapped and tested on arm, aarch64, x86_64 though the codepath is
>>> gated on WORD_REGISTER_OPERATIONS I didn't
>>> expect any effect on aarch64 and x86_64 anyway.
>>
>> So - that code path starts with this comment:
>>
>>             /* If this initializes a field that is smaller than a
>>                word, at the start of a word, try to widen it to a full
>>                word.  This special case allows us to output C++ member
>>                function initializations in a form that the optimizers
>>                can understand.  */
>>
>> Doesn't your patch completely defeat the purpose of this? Would you get better/identical code by just deleting this block? It seems unfortunate to have two different code generation approaches like this.
>>
>> It would be interesting to know the effects of your patch, and the effects of removing this code entirely, on generated code. Try to find the motivating C++ member function example perhaps? Maybe another possibility is to ensure this 
>> doesn't happen if the value would be interpreted as signed.
>>
>
> Doing some archaeology shows this code was added back in 1998 with no testcase (r22567) so I'd have to do more digging.
>
> My interpretation of that comment was that for WORD_REGISTER_OPERATIONS targets it's more beneficial to have word-size
> operations, so the expansion code tries to emit as many of the operations in word_mode as it safely can.
> With my patch it still emits a word_mode operation, it's just that the immediate that is moved in word_mode has it's
> top bits zeroed out instead of sign-extended.
>
> I'll build SPEC2006 on arm (a WORD_REGISTER_OPERATIONS target) and inspect the assembly to see if I see any interesting
> effects, but I suspect there won't be much change.
>

Ok, I found a bit of time to investigate.
On SPEC2006 I didn't see a difference in codegen with or without that whole block
(or with and without this patch).
The differences in codegen can be observed on the testcase in the patch.
Currently (without this patch) we generate the wrong-code:
main:
         strd    r3, lr, [sp, #-8]!
         movw    r3, #:lower16:d
         mov     r2, #-1
         movt    r3, #:upper16:d
         str     r2, [r3]
         bl      abort


With this patch that performs the zero extension on the loaded immediate but still
widens the operation to word_mode we generate (the correct):
main:
     strd    r3, lr, [sp, #-8]!
     movw    r3, #:lower16:d
     movw    r2, #65535
     movt    r3, #:upper16:d
     mov    r0, #0
     str    r2, [r3]
     cbnz    r0, .L5
     pop    {r3, pc}
.L5:
     bl    abort

If I remove that whole block of code we end up emitting a short operation with a bitfield
move operation, so we generate:
main:
     strd    r3, lr, [sp, #-8]!
     mov    r2, #0
     mov    r1, #-1
     movw    r3, #:lower16:d
     bfi    r2, r1, #0, #16
     movt    r3, #:upper16:d
     mov    r0, #0
     str    r2, [r3]
     cbnz    r0, .L5
     pop    {r3, pc}
.L5:
     bl    abort

which is correct, but suboptimal compared to the codegen with this patch.
Based on that, I think that code block is a useful optimisation, we just need
to take care with immediates.

What do you think?

Thanks,
Kyrill
Bernd Schmidt July 11, 2016, 5:55 p.m. UTC | #4
On 07/11/2016 04:52 PM, Kyrill Tkachov wrote:
> Based on that, I think that code block is a useful optimisation, we just
> need
> to take care with immediates.
>
> What do you think?

Yeah, I think the patch is ok.


Bernd
Kyrill Tkachov Aug. 5, 2016, 2:20 p.m. UTC | #5
On 01/08/16 15:31, Kyrill Tkachov wrote:
>
> On 11/07/16 18:55, Bernd Schmidt wrote:
>> On 07/11/2016 04:52 PM, Kyrill Tkachov wrote:
>>> Based on that, I think that code block is a useful optimisation, we just
>>> need
>>> to take care with immediates.
>>>
>>> What do you think?
>>
>> Yeah, I think the patch is ok.
>>
>
> This patch (https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00017.html) has been in trunk
> for a month with no reported problems.
>
> I'd like to backport it to the GCC 5 and 6 branches.
> I've bootstrapped and tested it there on arm-none-linux-gnueabihf.
>

Sorry for the early ping but given that GCC 6.2 is due shortly and I'm away next week,
could I please backport this patch to the GCC 6 branch?

Thanks,
Kyrill

> Ok?
>
> Thanks,
> Kyrill
>
>>
>> Bernd
>>
>
Richard Biener Aug. 9, 2016, 8:32 a.m. UTC | #6
On Fri, 5 Aug 2016, Kyrill Tkachov wrote:

> 
> On 01/08/16 15:31, Kyrill Tkachov wrote:
> > 
> > On 11/07/16 18:55, Bernd Schmidt wrote:
> > > On 07/11/2016 04:52 PM, Kyrill Tkachov wrote:
> > > > Based on that, I think that code block is a useful optimisation, we just
> > > > need
> > > > to take care with immediates.
> > > > 
> > > > What do you think?
> > > 
> > > Yeah, I think the patch is ok.
> > > 
> > 
> > This patch (https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00017.html) has
> > been in trunk
> > for a month with no reported problems.
> > 
> > I'd like to backport it to the GCC 5 and 6 branches.
> > I've bootstrapped and tested it there on arm-none-linux-gnueabihf.
> > 
> 
> Sorry for the early ping but given that GCC 6.2 is due shortly and I'm away
> next week,
> could I please backport this patch to the GCC 6 branch?

Generally patches approved for the trunk that fix regressions can be
backported without further approval.

Richard.

> Thanks,
> Kyrill
> 
> > Ok?
> > 
> > Thanks,
> > Kyrill
> > 
> > > 
> > > Bernd
> > > 
> > 
> 
>
diff mbox

Patch

diff --git a/gcc/expr.c b/gcc/expr.c
index 9733401e09052457678a6a8817fe16f8a737927e..abb4416e41b60ab69f6f9d844e3a40853b0d1cde 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -6281,6 +6281,13 @@  store_constructor (tree exp, rtx target, int cleared, HOST_WIDE_INT size,
 		    type = lang_hooks.types.type_for_mode
 		      (word_mode, TYPE_UNSIGNED (type));
 		    value = fold_convert (type, value);
+		    /* Make sure the bits beyond the original bitsize are zero
+		       so that we can correctly avoid extra zeroing stores in
+		       later constructor elements.  */
+		    tree bitsize_mask
+		      = wide_int_to_tree (type, wi::mask (bitsize, false,
+							   BITS_PER_WORD));
+		    value = fold_build2 (BIT_AND_EXPR, type, value, bitsize_mask);
 		  }
 
 		if (BYTES_BIG_ENDIAN)
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr71700.c b/gcc/testsuite/gcc.c-torture/execute/pr71700.c
new file mode 100644
index 0000000000000000000000000000000000000000..80afd3809c3abd24135fb36b757ace9f41ba3112
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr71700.c
@@ -0,0 +1,19 @@ 
+struct S
+{
+  signed f0 : 16;
+  unsigned f1 : 1;
+};
+
+int b;
+static struct S c[] = {{-1, 0}, {-1, 0}};
+struct S d;
+
+int
+main ()
+{
+  struct S e = c[0];
+  d = e;
+  if (d.f1 != 0)
+    __builtin_abort ();
+  return 0;
+}