shift/extract SHIFT_COUNT_TRUNCATED combine bug

Message ID 8F47DDC3-F9FE-4E94-90F7-3A16A3FD47CE@comcast.net
State New

Commit Message

Mike Stump April 8, 2014, 8:07 p.m. UTC
Something broke in the compiler to cause combine to incorrectly optimize:

(insn 12 11 13 3 (set (reg:SI 604 [ D.6102 ])
        (lshiftrt:SI (subreg/s/u:SI (reg/v:DI 601 [ x ]) 0)
            (reg:SI 602 [ D.6103 ]))) t.c:47 4436 {lshrsi3}
     (expr_list:REG_DEAD (reg:SI 602 [ D.6103 ])
        (nil)))
(insn 13 12 14 3 (set (reg:SI 605)
        (and:SI (reg:SI 604 [ D.6102 ])
            (const_int 1 [0x1]))) t.c:47 3658 {andsi3}
     (expr_list:REG_DEAD (reg:SI 604 [ D.6102 ])
        (nil)))
(insn 14 13 15 3 (set (reg:DI 599 [ D.6102 ])
        (zero_extend:DI (reg:SI 605))) t.c:47 4616 {zero_extendsidi2}
     (expr_list:REG_DEAD (reg:SI 605)
        (nil)))

into:

(insn 11 10 12 3 (set (reg:SI 602 [ D.6103 ])
        (not:SI (subreg:SI (reg:DI 595 [ D.6102 ]) 0))) t.c:47 3732 {one_cmplsi2}
     (expr_list:REG_DEAD (reg:DI 595 [ D.6102 ])
        (nil)))
(note 12 11 13 3 NOTE_INSN_DELETED)
(note 13 12 14 3 NOTE_INSN_DELETED)
(insn 14 13 15 3 (set (reg:DI 599 [ D.6102 ])
        (zero_extract:DI (reg/v:DI 601 [ x ])
            (const_int 1 [0x1])
            (reg:SI 602 [ D.6103 ]))) t.c:47 4668 {c2_extzvdi}
     (expr_list:REG_DEAD (reg:SI 602 [ D.6103 ])
        (nil)))

This shows up in:

  FAIL: gcc.c-torture/execute/builtin-bitops-1.c execution,  -Og -g

for me.


The patch below is sufficient to never widen variable extracts on SHIFT_COUNT_TRUNCATED machines.  So, the question is, how did people expect this to work?  I didn’t spot what changed recently to cause the bad code-gen.  The optimization of the sub into a not is ok, despite how funny it looks, because it feeds into the extract, which we know by SHIFT_COUNT_TRUNCATED is safe.

Is the patch a reasonable way to fix this?
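
To see why the widened extract is wrong, here is a minimal sketch of the
arithmetic (an added illustration, not the actual testcase, assuming the
usual 5-bit/6-bit count truncation for SImode/DImode):

#include <stdio.h>

/* The extract position arrives as ~n, which equals 31 - n only when
   the count is truncated to 5 bits (an SImode shift); widened to a
   DImode extract, the count truncates to 6 bits and a different bit
   is selected.  */
int
main (void)
{
  unsigned int n = 3;
  printf ("5-bit truncation: %u\n", ~n & 31);  /* 28 == 31 - 3 */
  printf ("6-bit truncation: %u\n", ~n & 63);  /* 60 == 63 - 3 */
  return 0;
}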

Comments

Jeff Law Jan. 12, 2015, 10:12 p.m. UTC | #1
On 04/08/14 14:07, Mike Stump wrote:
> Something broke in the compiler to cause combine to incorrectly optimize:
>
> (insn 12 11 13 3 (set (reg:SI 604 [ D.6102 ])
>          (lshiftrt:SI (subreg/s/u:SI (reg/v:DI 601 [ x ]) 0)
>              (reg:SI 602 [ D.6103 ]))) t.c:47 4436 {lshrsi3}
>       (expr_list:REG_DEAD (reg:SI 602 [ D.6103 ])
>          (nil)))
> (insn 13 12 14 3 (set (reg:SI 605)
>          (and:SI (reg:SI 604 [ D.6102 ])
>              (const_int 1 [0x1]))) t.c:47 3658 {andsi3}
>       (expr_list:REG_DEAD (reg:SI 604 [ D.6102 ])
>          (nil)))
> (insn 14 13 15 3 (set (reg:DI 599 [ D.6102 ])
>          (zero_extend:DI (reg:SI 605))) t.c:47 4616 {zero_extendsidi2}
>       (expr_list:REG_DEAD (reg:SI 605)
>          (nil)))
>
> into:
>
> (insn 11 10 12 3 (set (reg:SI 602 [ D.6103 ])
>          (not:SI (subreg:SI (reg:DI 595 [ D.6102 ]) 0))) t.c:47 3732 {one_cmplsi2}
>       (expr_list:REG_DEAD (reg:DI 595 [ D.6102 ])
>          (nil)))
> (note 12 11 13 3 NOTE_INSN_DELETED)
> (note 13 12 14 3 NOTE_INSN_DELETED)
> (insn 14 13 15 3 (set (reg:DI 599 [ D.6102 ])
>          (zero_extract:DI (reg/v:DI 601 [ x ])
>              (const_int 1 [0x1])
>              (reg:SI 602 [ D.6103 ]))) t.c:47 4668 {c2_extzvdi}
>       (expr_list:REG_DEAD (reg:SI 602 [ D.6103 ])
>          (nil)))
>
> This shows up in:
>
>    FAIL: gcc.c-torture/execute/builtin-bitops-1.c execution,  -Og -g
>
> for me.
>
> diff --git a/gcc/combine.c b/gcc/combine.c
> index 708691f..c1f50ff 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -7245,6 +7245,18 @@ make_extraction (enum machine_mode mode, rtx inner, HOST_WIDE_INT pos,
>         extraction_mode = insn.field_mode;
>       }
>
> +  /* On a SHIFT_COUNT_TRUNCATED machine, we can't promote the mode of
> +     the extract to a larger size on a variable extract, as previously
> +     the position might have been optimized to change a bit of the
> +     index of the starting bit that would have been ignored before,
> +     but, with a larger mode, will then not be.  If we wanted to do
> +     this, we'd have to mask out those bits or prove that those bits
> +     are 0.  */
> +  if (SHIFT_COUNT_TRUNCATED
> +      && pos_rtx
> +      && GET_MODE_BITSIZE (extraction_mode) > GET_MODE_BITSIZE (mode))
> +    extraction_mode = mode;
> +
>     /* Never narrow an object, since that might not be safe.  */
>
>     if (mode != VOIDmode
>
> is sufficient to never widen variable extracts on SHIFT_COUNT_TRUNCATED machines.  So, the question is, how did people expect this to work?  I didn’t spot what changed recently to cause the bad code-gen.  The optimization of the sub into a not is ok, despite how funny it looks, because it feeds into the extract, which we know by SHIFT_COUNT_TRUNCATED is safe.
>
> Is the patch a reasonable way to fix this?
On a SHIFT_COUNT_TRUNCATED target, I don't think it's ever OK to widen a 
shift, variable or constant.

In the case of a variable shift, we could easily have eliminated the 
masking code before or during combine.  For a constant shift amount we 
could have adjusted the constant (see SHIFT_COUNT_TRUNCATED in cse.c).
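
For illustration, a hedged example of the variable case (invented code,
not from the testsuite):

/* On a 32-bit SHIFT_COUNT_TRUNCATED target the explicit mask is
   redundant with what the hardware does, so earlier passes may delete
   it; after that, nothing in the RTL says the count is < 32.  */
unsigned int
var_shift (unsigned int x, unsigned int n)
{
  return x << (n & 31);  /* may be simplified to x << n */
}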

I think it's just an oversight and it has simply never bitten us before.

jeff
Richard Biener Jan. 13, 2015, 9:51 a.m. UTC | #2
On Mon, Jan 12, 2015 at 11:12 PM, Jeff Law <law@redhat.com> wrote:
> On 04/08/14 14:07, Mike Stump wrote:
>>
>> Something broke in the compiler to cause combine to incorrectly optimize:
>>
>> (insn 12 11 13 3 (set (reg:SI 604 [ D.6102 ])
>>          (lshiftrt:SI (subreg/s/u:SI (reg/v:DI 601 [ x ]) 0)
>>              (reg:SI 602 [ D.6103 ]))) t.c:47 4436 {lshrsi3}
>>       (expr_list:REG_DEAD (reg:SI 602 [ D.6103 ])
>>          (nil)))
>> (insn 13 12 14 3 (set (reg:SI 605)
>>          (and:SI (reg:SI 604 [ D.6102 ])
>>              (const_int 1 [0x1]))) t.c:47 3658 {andsi3}
>>       (expr_list:REG_DEAD (reg:SI 604 [ D.6102 ])
>>          (nil)))
>> (insn 14 13 15 3 (set (reg:DI 599 [ D.6102 ])
>>          (zero_extend:DI (reg:SI 605))) t.c:47 4616 {zero_extendsidi2}
>>       (expr_list:REG_DEAD (reg:SI 605)
>>          (nil)))
>>
>> into:
>>
>> (insn 11 10 12 3 (set (reg:SI 602 [ D.6103 ])
>>          (not:SI (subreg:SI (reg:DI 595 [ D.6102 ]) 0))) t.c:47 3732
>> {one_cmplsi2}
>>       (expr_list:REG_DEAD (reg:DI 595 [ D.6102 ])
>>          (nil)))
>> (note 12 11 13 3 NOTE_INSN_DELETED)
>> (note 13 12 14 3 NOTE_INSN_DELETED)
>> (insn 14 13 15 3 (set (reg:DI 599 [ D.6102 ])
>>          (zero_extract:DI (reg/v:DI 601 [ x ])
>>              (const_int 1 [0x1])
>>              (reg:SI 602 [ D.6103 ]))) t.c:47 4668 {c2_extzvdi}
>>       (expr_list:REG_DEAD (reg:SI 602 [ D.6103 ])
>>          (nil)))
>>
>> This shows up in:
>>
>>    FAIL: gcc.c-torture/execute/builtin-bitops-1.c execution,  -Og -g
>>
>> for me.
>>
>> diff --git a/gcc/combine.c b/gcc/combine.c
>> index 708691f..c1f50ff 100644
>> --- a/gcc/combine.c
>> +++ b/gcc/combine.c
>> @@ -7245,6 +7245,18 @@ make_extraction (enum machine_mode mode, rtx inner,
>> HOST_WIDE_INT pos,
>>         extraction_mode = insn.field_mode;
>>       }
>>
>> +  /* On a SHIFT_COUNT_TRUNCATED machine, we can't promote the mode of
>> +     the extract to a larger size on a variable extract, as previously
>> +     the position might have been optimized to change a bit of the
>> +     index of the starting bit that would have been ignored before,
>> +     but, with a larger mode, will then not be.  If we wanted to do
>> +     this, we'd have to mask out those bits or prove that those bits
>> +     are 0.  */
>> +  if (SHIFT_COUNT_TRUNCATED
>> +      && pos_rtx
>> +      && GET_MODE_BITSIZE (extraction_mode) > GET_MODE_BITSIZE (mode))
>> +    extraction_mode = mode;
>> +
>>     /* Never narrow an object, since that might not be safe.  */
>>
>>     if (mode != VOIDmode
>>
>> is sufficient to never widen variable extracts on SHIFT_COUNT_TRUNCATED
>> machines.  So, the question is, how did people expect this to work?  I
>> didn’t spot what changed recently to cause the bad code-gen.  The
>> optimization of the sub into a not is ok, despite how funny it looks,
>> because it feeds into the extract, which we know by
>> SHIFT_COUNT_TRUNCATED is safe.
>>
>> Is the patch a reasonable way to fix this?
>
> On a SHIFT_COUNT_TRUNCATED target, I don't think it's ever OK to widen a
> shift, variable or constant.
>
> In the case of a variable shift, we could easily have eliminated the masking
> code before or during combine.  For a constant shift amount we could have
> adjusted the constant (see SHIFT_COUNT_TRUNCATED in cse.c).
>
> I think it's just an oversight and it has simply never bitten us before.

IMHO SHIFT_COUNT_TRUNCATED should be removed and instead
backends should provide shift patterns with an (and:QI ...) for the
shift amount which simply will omit that operation if suitable.
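
Presumably something along these lines (a hypothetical sketch with an
invented insn name and mnemonic, not taken from any particular port):

(define_insn "*lshrsi3_mask"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (lshiftrt:SI
          (match_operand:SI 1 "register_operand" "r")
          (and:QI (match_operand:QI 2 "register_operand" "r")
                  (const_int 31))))]
  ""
  "srl\t%0,%1,%2")

The (and:QI ... (const_int 31)) makes the truncation explicit in the
pattern instead of asserting it globally through SHIFT_COUNT_TRUNCATED.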

Richard.

> jeff
Jeff Law Jan. 13, 2015, 4:38 p.m. UTC | #3
On 01/13/15 02:51, Richard Biener wrote:
>> On a SHIFT_COUNT_TRUNCATED target, I don't think it's ever OK to widen a
>> shift, variable or constant.
>>
>> In the case of a variable shift, we could easily have eliminated the masking
>> code before or during combine.  For a constant shift amount we could have
>> adjusted the constant (see SHIFT_COUNT_TRUNCATED in cse.c).
>>
>> I think it's just an oversight and it has simply never bitten us before.
>
> IMHO SHIFT_COUNT_TRUNCATED should be removed and instead
> backends should provide shift patterns with an (and:QI ...) for the
> shift amount which simply will omit that operation if suitable.
Perhaps.  I'm certainly not wed to the concept of SHIFT_COUNT_TRUNCATED.
I don't see that getting addressed in the gcc-5 timeframe.



aarch64, alpha, epiphany, iq2000, lm32, m32r, mep, microblaze, mips, 
mn103, nds32, pa, sparc, stormy16, tilepro, v850 and xtensa are the 
current SHIFT_COUNT_TRUNCATED targets.


Jeff
Segher Boessenkool Jan. 13, 2015, 5:38 p.m. UTC | #4
On Tue, Jan 13, 2015 at 10:51:27AM +0100, Richard Biener wrote:
> IMHO SHIFT_COUNT_TRUNCATED should be removed and instead
> backends should provide shift patterns with an (and:QI ...) for the
> shift amount which simply will omit that operation if suitable.

Note that that catches less though, e.g. in

int f(int x, int n) { return x << ((2*n) & 31); }

without SHIFT_COUNT_TRUNCATED it will try to match an AND with 30,
not with 31.
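
Spelling that out (an added note): since 2*n always has its low bit
clear, (2*n) & 31 and (2*n) & 30 are the same value, so the mask gets
simplified to 30 and a pattern requiring (and ... (const_int 31)) no
longer matches.  A quick check that the two masks agree on every even
value:

#include <stdio.h>

/* Since 2*n has bit 0 clear, masking with 31 and with 30 give the same
   result, so the smaller mask constant is the canonical form.  */
int
main (void)
{
  for (unsigned int n = 0; n < 64; n++)
    if (((2 * n) & 31) != ((2 * n) & 30))
      printf ("mismatch at %u\n", n);  /* never triggers */
  return 0;
}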


Segher
Richard Biener Jan. 14, 2015, 9:10 a.m. UTC | #5
On Tue, Jan 13, 2015 at 6:38 PM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> On Tue, Jan 13, 2015 at 10:51:27AM +0100, Richard Biener wrote:
>> IMHO SHIFT_COUNT_TRUNCATED should be removed and instead
>> backends should provide shift patterns with an (and:QI ...) for the
>> shift amount which simply will omit that operation if suitable.
>
> Note that that catches less though, e.g. in
>
> int f(int x, int n) { return x << ((2*n) & 31); }
>
> without SHIFT_COUNT_TRUNCATED it will try to match an AND with 30,
> not with 31.

But even with SHIFT_COUNT_TRUNCATED you cannot omit the
AND, as it clears the LSB.  Only at a higher level might we be tempted
to drop the & 31 while it still persists in its original form (not sure
if fold does that - I don't see SHIFT_COUNT_TRUNCATED mentioned there).

Richard.

>
> Segher
Segher Boessenkool Jan. 14, 2015, 2:27 p.m. UTC | #6
On Wed, Jan 14, 2015 at 10:10:24AM +0100, Richard Biener wrote:
> On Tue, Jan 13, 2015 at 6:38 PM, Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> > On Tue, Jan 13, 2015 at 10:51:27AM +0100, Richard Biener wrote:
> >> IMHO SHIFT_COUNT_TRUNCATED should be removed and instead
> >> backends should provide shift patterns with an (and:QI ...) for the
> >> shift amount which simply will omit that operation if suitable.
> >
> > Note that that catches less though, e.g. in
> >
> > int f(int x, int n) { return x << ((2*n) & 31); }
> >
> > without SHIFT_COUNT_TRUNCATED it will try to match an AND with 30,
> > not with 31.
> 
> But even with SHIFT_COUNT_TRUNCATED you cannot omit the
> AND, as it clears the LSB.

The 2*n already does that.

Before combine, we have something like

	t1 = n << 1
	t2 = t1 & 30
	ret = x << t2

(it actually has some register copies to more temporaries), and on
SHIFT_COUNT_TRUNCATED targets where the first two insns don't combine,
e.g. m32r, currently combine ends up with

	t1 = n << 1
	ret = x << t1

while it doesn't without SHIFT_COUNT_TRUNCATED if you only have a
x << (n & 31)  pattern.

I'm all for eradicating SHIFT_COUNT_TRUNCATED; just pointing out that
it is not trivial to fully replace (just the important, obvious cases
are easy).


Segher

Patch

diff --git a/gcc/combine.c b/gcc/combine.c
index 708691f..c1f50ff 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7245,6 +7245,18 @@  make_extraction (enum machine_mode mode, rtx inner, HOST_WIDE_INT pos,
       extraction_mode = insn.field_mode;
     }
 
+  /* On a SHIFT_COUNT_TRUNCATED machine, we can't promote the mode of
+     the extract to a larger size on a variable extract, as previously
+     the position might have been optimized to change a bit of the
+     index of the starting bit that would have been ignored before,
+     but, with a larger mode, will then not be.  If we wanted to do
+     this, we'd have to mask out those bits or prove that those bits
+     are 0.  */
+  if (SHIFT_COUNT_TRUNCATED
+      && pos_rtx
+      && GET_MODE_BITSIZE (extraction_mode) > GET_MODE_BITSIZE (mode))
+    extraction_mode = mode;
+
   /* Never narrow an object, since that might not be safe.  */
 
   if (mode != VOIDmode