
PR63404, gcc 5 miscompiles linux block layer

Message ID 5429A102.4000602@arm.com
State New

Commit Message

Jiong Wang Sept. 29, 2014, 6:12 p.m. UTC
It's exposed when building the Linux kernel for x86.

The root cause is that the current "single_set" ignores CLOBBER & USE,
while we need to take them into account when handling shrink-wrapping.

This patch adds one parameter to single_set_2 so that it can return
NULL_RTX when we want to reject any side effect, and adds a new helper
function "single_set_no_clobber_use".

Passes x86-64 bootstrap & check-gcc/g++; I also manually checked that the issue
reported in PR 63404 has gone away.

There are also no regressions in the aarch64-none-elf regression tests.

Comments?

Thanks.

2014-09-26  Jiong Wang  <jiong.wang@arm.com>

         * rtl.h (single_set_no_clobber_use): New function.
         (single_set_2): New parameter "fail_on_clobber_use".
         (single_set): Update call to single_set_2.
         * config/ia64/ia64.c (ia64_single_set): Likewise.
         * rtlanal.c (single_set_2): Return NULL_RTX if fail_on_clobber_use is true.
         * shrink-wrap.c (move_insn_for_shrink_wrap): Use single_set_no_clobber_use.
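
(For illustration, a minimal sketch of the intended behavioural difference -- not
part of the patch itself; the PARALLEL shape mirrors the insns involved in PR 63404:)

  /* Hedged sketch: for an insn whose pattern is
       (parallel [(set (reg:QI r8) (lshiftrt:QI (reg:QI r8) (const_int 2)))
                  (clobber (reg:CC flags))])
     single_set () returns the inner SET and silently drops the CC clobber,
     while the new single_set_no_clobber_use () returns NULL_RTX, so a caller
     such as move_insn_for_shrink_wrap refuses to touch the insn.  */
  rtx set  = single_set (insn);                   /* the SET; clobber ignored   */
  rtx safe = single_set_no_clobber_use (insn);    /* NULL_RTX: pattern clobbers */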

Comments

Richard Henderson Sept. 29, 2014, 6:32 p.m. UTC | #1
On 09/29/2014 11:12 AM, Jiong Wang wrote:
> +inline rtx single_set_no_clobber_use (const rtx_insn *insn)
> +{
> +  if (!INSN_P (insn))
> +    return NULL_RTX;
> +
> +  if (GET_CODE (PATTERN (insn)) == SET)
> +    return PATTERN (insn);
> +
> +  /* Defer to the more expensive case, and return NULL_RTX if there is
> +     USE or CLOBBER.  */
> +  return single_set_2 (insn, PATTERN (insn), true);
>  }

What more expensive case?

If you're disallowing USE and CLOBBER, then single_set is just GET_CODE == SET.

I think this function is somewhat useless, and should not be added.

An adjustment to move_insn_for_shrink_wrap may be reasonable though.  I haven't
tried to understand the miscompilation yet.  I can imagine that this would
disable quite a bit of shrink wrapping for x86 though.  Can we do better in
understanding when the clobbered register is live at the location to which we'd
like to move the insns?
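
(Taken literally, the "just GET_CODE == SET" suggestion above would collapse the
proposed helper to a single pattern test -- a hypothetical simplification which, as
later replies note, sets aside PARALLELs whose extra SETs carry REG_UNUSED notes:)

  inline rtx
  single_set_no_clobber_use (const rtx_insn *insn)
  {
    /* Hypothetical: with USE and CLOBBER disallowed, the common case needs
       no call into single_set_2 at all.  */
    if (INSN_P (insn) && GET_CODE (PATTERN (insn)) == SET)
      return PATTERN (insn);
    return NULL_RTX;
  }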


r~
Jiong Wang Sept. 29, 2014, 7:24 p.m. UTC | #2
On 29/09/14 19:32, Richard Henderson wrote:
> On 09/29/2014 11:12 AM, Jiong Wang wrote:
>> +inline rtx single_set_no_clobber_use (const rtx_insn *insn)
>> +{
>> +  if (!INSN_P (insn))
>> +    return NULL_RTX;
>> +
>> +  if (GET_CODE (PATTERN (insn)) == SET)
>> +    return PATTERN (insn);
>> +
>> +  /* Defer to the more expensive case, and return NULL_RTX if there is
>> +     USE or CLOBBER.  */
>> +  return single_set_2 (insn, PATTERN (insn), true);
>>   }

Richard,

   thanks for review.

> What more expensive case?

single_set_no_clobber_use is just a clone of single_set; I copied the comments with
only minor modifications.

I think the "more expensive case" here means the case where there is a PARALLEL
whose inner rtxes we need to check.

>
> If you're disallowing USE and CLOBBER, then single_set is just GET_CODE == SET.
>
> I think this function is somewhat useless, and should not be added.
>
> An adjustment to move_insn_for_shrink_wrap may be reasonable though. I haven't
> tried to understand the miscompilation yet.  I can imagine that this would
> disable quite a bit of shrink wrapping for x86 though.

I don't think so. From the x86-64 bootstrap, there is no regression in the number
of functions shrink-wrapped. Actually, previously only a single "mov dest, src" was
handled, so disallowing USE/CLOBBER will not reject any shrink-wrap opportunity that
was allowed previously.

And I am afraid that if we don't reuse single_set_2, there will need to be another
loop to check all those inner rtxes, which single_set_2 already does.

So, IMHO, just modifying single_set_2 is more efficient.

> Can we do better in
> understanding when the clobbered register is live at the location to which we'd
> like to move then insns?

Currently, the generic code in move_insn_for_shrink_wrap only handles a single-register
dest/src, so if there is a CLOBBER or USE we might need to check multiple registers,
which would require a few more modifications. I think that is better done after all the
single dest/src issues are fixed.

--
Jiong


>
>
> r~
>
>
>
David Malcolm Sept. 29, 2014, 8:34 p.m. UTC | #3
On Mon, 2014-09-29 at 20:24 +0100, Jiong Wang wrote:
> On 29/09/14 19:32, Richard Henderson wrote:
> > On 09/29/2014 11:12 AM, Jiong Wang wrote:
> >> +inline rtx single_set_no_clobber_use (const rtx_insn *insn)
> >> +{
> >> +  if (!INSN_P (insn))
> >> +    return NULL_RTX;
> >> +
> >> +  if (GET_CODE (PATTERN (insn)) == SET)
> >> +    return PATTERN (insn);
> >> +
> >> +  /* Defer to the more expensive case, and return NULL_RTX if there is
> >> +     USE or CLOBBER.  */
> >> +  return single_set_2 (insn, PATTERN (insn), true);
> >>   }
> 
> Richard,
> 
>    thanks for review.
> 
> > What more expensive case?
> 
> single_set_no_clobber_use is just a clone of single_set, I copied the comments with
> only minor modifications.

I introduced that comment to single_set, in r215089, when making
single_set into an inline function (so that it could check that it
received an rtx_insn *, rather than an rtx).

> I think the "more expensive case" here means the case where there are PARALLEL that
> we need to check the inner rtx.

My comment may have been misleading, sorry.  IIRC, what I was thinking was
that the old implementation had split single_set into a macro and a
function.  This was done by Honza (CCed), 14 years ago to the day, back in
r36664 (on 2000-09-29):
https://gcc.gnu.org/ml/gcc-patches/2000-09/msg00893.html


/* Single set is implemented as macro for performance reasons.  */
#define single_set(I) (INSN_P (I) \
                      ? (GET_CODE (PATTERN (I)) == SET \
                         ? PATTERN (I) : single_set_1 (I)) \
                      : NULL_RTX)

I think by "the more expensive case" I meant having to make a function
call to handle the less-common cases (which indeed covers the PARALLEL
case), rather than having logic inline; preserving that inlined vs
not-inlined split was one of my aims for r215089.

Perhaps it should be rewritten to "Defer to a function call to handle
the less common cases", or somesuch?

[...snip rest of post...]

Dave
Jeff Law Sept. 30, 2014, 4:21 a.m. UTC | #4
On 09/29/14 13:24, Jiong Wang wrote:
>
> I don't think so. from the x86-64 bootstrap, there is no regression
> on the number of functions shrink-wrapped. actually speaking,
> previously only single mov dest, src handled, so the disallowing
> USE/CLOBBER will not disallow shrink-wrap opportunity which was
> allowed previously.
This is the key, of course.  Shrink-wrapping is very restrictive in its 
ability to sink insns.  The only forms it'll currently sink are simple 
moves.  Arithmetic, logicals, etc. are left alone.  Thus disallowing 
USE/CLOBBER does not impact the x86 port in any significant way.

I do agree with Richard that it would be useful to see the insns that 
are incorrectly sunk and the surrounding context.
Jeff
Richard Earnshaw Sept. 30, 2014, 2:15 p.m. UTC | #5
On 29/09/14 19:32, Richard Henderson wrote:
> On 09/29/2014 11:12 AM, Jiong Wang wrote:
>> +inline rtx single_set_no_clobber_use (const rtx_insn *insn)
>> +{
>> +  if (!INSN_P (insn))
>> +    return NULL_RTX;
>> +
>> +  if (GET_CODE (PATTERN (insn)) == SET)
>> +    return PATTERN (insn);
>> +
>> +  /* Defer to the more expensive case, and return NULL_RTX if there is
>> +     USE or CLOBBER.  */
>> +  return single_set_2 (insn, PATTERN (insn), true);
>>  }
> 
> What more expensive case?
> 
> If you're disallowing USE and CLOBBER, then single_set is just GET_CODE == SET.
> 
> I think this function is somewhat useless, and should not be added.
> 
> An adjustment to move_insn_for_shrink_wrap may be reasonable though.  I haven't
> tried to understand the miscompilation yet.  I can imagine that this would
> disable quite a bit of shrink wrapping for x86 though.  Can we do better in
> understanding when the clobbered register is live at the location to which we'd
> like to move then insns?
> 
> 
> r~
> 

I think part of the problem is in the naming of single_set().  From the
name it's not entirely obvious to users that this includes insns that
clobber registers or which write other registers that are unused after
that point.  I've previously had to fix a bug where this assumption was
made (e.g. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300)

Most uses of single_set prior to register allocation are probably safe;
but later uses are fraught with potential problems of this nature and
may well be bugs waiting to happen.

R.
Jiong Wang Sept. 30, 2014, 2:37 p.m. UTC | #6
On 30/09/14 05:21, Jeff Law wrote:
> On 09/29/14 13:24, Jiong Wang wrote:
>> I don't think so. from the x86-64 bootstrap, there is no regression
>> on the number of functions shrink-wrapped. actually speaking,
>> previously only single mov dest, src handled, so the disallowing
>> USE/CLOBBER will not disallow shrink-wrap opportunity which was
>> allowed previously.
> This is the key, of course.  shrink-wrapping is very restrictive in its
> ability to sink insns.  The only forms it'll currently shrink are simple
> moves.  Arithmetic, logicals, etc are left alone.  Thus disallowing
> USE/CLOBBER does not impact the x86 port in any significant way.

Yes, and we get +1.22% more functions shrink-wrapped (2567 compared with 2536) once we
sink insns beyond the simple "mov dest, src" on the x86-64 bootstrap, and I remember a
similar percentage on a glibc build.

On aarch64, the overall number of functions shrink-wrapped increased by +25% on some
programs; maybe we could gain the same on other RISC backends.

>
> I do agree with Richard that it would be useful to see the insns that
> are incorrectly sunk and the surrounding context.

Insns 14 and 182 are sunk incorrectly; the details are below.

(insn 14 173 174 2 (parallel [
            (set (reg:QI 37 r8 [orig:86 D.32480 ] [86])
                (lshiftrt:QI (reg:QI 37 r8 [orig:86 D.32480 ] [86])
                    (const_int 2 [0x2])))
            (clobber (reg:CC 17 flags))
        ]) /home/andi/lsrc/linux/block/blk-flush2.c:50 547 {*lshrqi3_1}
     (expr_list:REG_EQUAL (lshiftrt:QI (mem:QI (plus:DI (reg/v/f:DI 43 r14 [orig:85 q ] [85])
                    (const_int 1612 [0x64c])) [20 *q_7+1612 S1 A32])
            (const_int 2 [0x2]))
        (nil)))

(insn 174 14 182 2 (set (reg:QI 44 r15 [orig:86 D.32480 ] [86])
        (reg:QI 37 r8 [orig:86 D.32480 ] [86])) /home/andi/lsrc/linux/block/blk-flush2.c:50 93 {*movqi_internal}
     (nil))

(insn 182 174 16 2 (parallel [
            (set (reg:SI 44 r15 [orig:86 D.32480 ] [86])
                (and:SI (reg:SI 44 r15 [orig:86 D.32480 ] [86])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) /home/andi/lsrc/linux/block/blk-flush2.c:50 376 {*andsi_1}
     (nil))

> Jeff
>
>
>
Jeff Law Sept. 30, 2014, 4:30 p.m. UTC | #7
On 09/30/14 08:37, Jiong Wang wrote:
>
> On 30/09/14 05:21, Jeff Law wrote:
>> On 09/29/14 13:24, Jiong Wang wrote:
>>> I don't think so. from the x86-64 bootstrap, there is no regression
>>> on the number of functions shrink-wrapped. actually speaking,
>>> previously only single mov dest, src handled, so the disallowing
>>> USE/CLOBBER will not disallow shrink-wrap opportunity which was
>>> allowed previously.
>> This is the key, of course.  shrink-wrapping is very restrictive in its
>> ability to sink insns.  The only forms it'll currently shrink are simple
>> moves.  Arithmetic, logicals, etc are left alone.  Thus disallowing
>> USE/CLOBBER does not impact the x86 port in any significant way.
>
> yes, and we could get +1.22% (2567 compared with 2536) functions
> shrink-wrapped
> after we sinking more insn except simple "mov dest, src"  on x86-64
> bootstrap. and
> I remember the similar percentage on glibc build.
>
> while on aarch64, the overall functions shrink-wrapped increased +25% on
> some
> programs. maybe we could gain the same on other RISC backend.
>
>>
>> I do agree with Richard that it would be useful to see the insns that
>> are incorrectly sunk and the surrounding context.
So I must be missing something.  I thought the shrink-wrapping code 
wouldn't sink arithmetic/logical insns like we see with insn 14 and insn 
182.  I thought it was limited to reg-reg copies and constant 
initializations.

Jeff
Jeff Law Sept. 30, 2014, 4:45 p.m. UTC | #8
On 09/30/14 08:15, Richard Earnshaw wrote:
>
> I think part of the problem is in the naming of single_set().  From the
> name it's not entirely obvious to users that this includes insns that
> clobber registers or which write other registers that are unused after
> that point.  I've previously had to fix a bug where this assumption was
> made (eg https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300)
>
> Most uses of single_set prior to register allocation are probably safe;
> but later uses are fraught with potential problems of this nature and
> may well be bugs waiting to happen.
Very possibly.  There's a bit of a natural tension here in that often we 
don't much care about the additional CLOBBERS, but when we get it wrong, 
obviously it's bad.

I haven't done any research, but I suspect the change to ignore clobbers 
in single_set came in as part of exposing the CC register and avoiding 
regressions all over the place as a result.

I wonder what would happen if we ignored them prior to register allocation, 
then rejected insns with those CLOBBERs once register allocation started.
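
(A sketch of that idea, purely hypothetical: inside the single_set_2 loop over
the PARALLEL elements -- see the rtlanal.c hunk below -- the USE/CLOBBER cases
could be keyed off whether register allocation has finished:)

	    case USE:
	    case CLOBBER:
	      /* Hypothetical: keep ignoring USE/CLOBBER before register
		 allocation, but reject them once it has run.  */
	      if (reload_completed)
		return NULL_RTX;
	      break;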

Jeff
Richard Earnshaw Sept. 30, 2014, 4:51 p.m. UTC | #9
On 30/09/14 17:45, Jeff Law wrote:
> On 09/30/14 08:15, Richard Earnshaw wrote:
>>
>> I think part of the problem is in the naming of single_set().  From the
>> name it's not entirely obvious to users that this includes insns that
>> clobber registers or which write other registers that are unused after
>> that point.  I've previously had to fix a bug where this assumption was
>> made (eg https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300)
>>
>> Most uses of single_set prior to register allocation are probably safe;
>> but later uses are fraught with potential problems of this nature and
>> may well be bugs waiting to happen.
> Very possibly.  There's a bit of a natural tension here in that often we 
> don't much care about the additional CLOBBERS, but when we get it wrong, 
> obviously it's bad.
> 
> I haven't done any research, but I suspect the change it ignore clobbers 
> in single_set came in as part of exposing the CC register and avoiding 
> regressions all over the place as a result.

It's not just clobbers; it ignores patterns like

(parallel
 [(set (a) (...))
  (set (b) (...))])
[(reg_note (reg_unused (b)))]

Which is probably fine before register allocation but definitely
something you have to think about afterwards.

> 
> I wonder what would happen if we ignored prior to register allocation, 
> then rejected insns with those CLOBBERs once register allocation started.
> 

Might work; but it might miss some useful cases as well...

R.
Wang Jiong Sept. 30, 2014, 6:36 p.m. UTC | #10
2014-09-30 17:30 GMT+01:00 Jeff Law <law@redhat.com>:
> On 09/30/14 08:37, Jiong Wang wrote:
>>
>>
>> On 30/09/14 05:21, Jeff Law wrote:
>>>
>>> On 09/29/14 13:24, Jiong Wang wrote:
>>>>
>>>> I don't think so. from the x86-64 bootstrap, there is no regression
>>>> on the number of functions shrink-wrapped. actually speaking,
>>>> previously only single mov dest, src handled, so the disallowing
>>>> USE/CLOBBER will not disallow shrink-wrap opportunity which was
>>>> allowed previously.
>>>
>>> This is the key, of course.  shrink-wrapping is very restrictive in its
>>> ability to sink insns.  The only forms it'll currently shrink are simple
>>> moves.  Arithmetic, logicals, etc are left alone.  Thus disallowing
>>> USE/CLOBBER does not impact the x86 port in any significant way.
>>
>>
>> yes, and we could get +1.22% (2567 compared with 2536) functions
>> shrink-wrapped
>> after we sinking more insn except simple "mov dest, src"  on x86-64
>> bootstrap. and
>> I remember the similar percentage on glibc build.
>>
>> while on aarch64, the overall functions shrink-wrapped increased +25% on
>> some
>> programs. maybe we could gain the same on other RISC backend.
>>
>>>
>>> I do agree with Richard that it would be useful to see the insns that
>>> are incorrectly sunk and the surrounding context.
>
> So I must be missing something.  I thought the shrink-wrapping code wouldn't
> sink arithmetic/logical insns like we see with insn 14 and insn 182.  I
> thought it was limited to reg-reg copies and constant initializations.

Yes, it was limited to reg-reg copies, and my previous sink improvement aimed to
sink any insn that:

  A: is a single_set, and
  B: has a src operand that is any combination of no more than one register
     and no non-constant objects.

Some operators like shift may have side effects, but IMHO all side effects are
reflected in the RTX, so together with this fail_on_clobber_use modification,
the rtx returned by single_set_no_clobber_use should be safe to sink if it meets
limit B above and passes the later register use/def check in
move_insn_for_shrink_wrap?
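
(A rough sketch of the combined condition being described -- hypothetical helper
name, and a simplified version of condition B; the real move_insn_for_shrink_wrap
still performs the register use/def check against the target block afterwards:)

  /* Hedged sketch, not the actual GCC code.  */
  static bool
  sink_candidate_p (rtx_insn *insn)
  {
    /* Condition A plus the new fail_on_clobber_use behaviour:
       any USE or CLOBBER makes the insn ineligible.  */
    rtx set = single_set_no_clobber_use (insn);
    if (!set)
      return false;

    rtx dest = SET_DEST (set);
    rtx src = SET_SRC (set);

    if (!REG_P (dest))
      return false;

    /* Simplified condition B: a plain register or constant source; the
       real test allows any combination of one register and constants.  */
    return REG_P (src) || CONSTANT_P (src);
  }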

Regards,
Jiong

>
> Jeff
>
>
Steven Bosscher Sept. 30, 2014, 8:17 p.m. UTC | #11
On Tue, Sep 30, 2014 at 6:51 PM, Richard Earnshaw wrote:
> It's not just clobbers; it ignores patterns like
>
> (parallel
>  [(set (a) (...)
>   (set (b) (...)])
> [(reg_note (reg_unused(b))]
>
> Which is probably fine before register allocation but definitely
> something you have to think about afterwards.

Even before RA this isn't always fine. We have checks for
!multiple_sets for this.
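
(For context, a minimal sketch of such a guard, using the existing multiple_sets ()
from rtlanal.c -- illustrative fragment only:)

  /* A transformation that cannot tolerate extra SETs, even ones whose results
     are unused, checks multiple_sets () before trusting the SET returned by
     single_set ().  */
  if (multiple_sets (insn))
    return false;
  rtx set = single_set (insn);
  if (!set)
    return false;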

Ciao!
Steven

Patch

diff --git a/gcc/config/ia64/ia64.c b/gcc/config/ia64/ia64.c
index 9337be1..09d3c4a 100644
--- a/gcc/config/ia64/ia64.c
+++ b/gcc/config/ia64/ia64.c
@@ -7172,7 +7172,7 @@  ia64_single_set (rtx_insn *insn)
       break;
 
     default:
-      ret = single_set_2 (insn, x);
+      ret = single_set_2 (insn, x, false);
       break;
     }
 
diff --git a/gcc/rtl.h b/gcc/rtl.h
index e73f731..7c40d5a 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2797,7 +2797,7 @@  extern void set_insn_deleted (rtx);
 
 /* Functions in rtlanal.c */
 
-extern rtx single_set_2 (const rtx_insn *, const_rtx);
+extern rtx single_set_2 (const rtx_insn *, const_rtx, bool fail_on_clobber_use);
 
 /* Handle the cheap and common cases inline for performance.  */
 
@@ -2810,7 +2810,20 @@  inline rtx single_set (const rtx_insn *insn)
     return PATTERN (insn);
 
   /* Defer to the more expensive case.  */
-  return single_set_2 (insn, PATTERN (insn));
+  return single_set_2 (insn, PATTERN (insn), false);
+}
+
+inline rtx single_set_no_clobber_use (const rtx_insn *insn)
+{
+  if (!INSN_P (insn))
+    return NULL_RTX;
+
+  if (GET_CODE (PATTERN (insn)) == SET)
+    return PATTERN (insn);
+
+  /* Defer to the more expensive case, and return NULL_RTX if there is
+     USE or CLOBBER.  */
+  return single_set_2 (insn, PATTERN (insn), true);
 }
 
 extern enum machine_mode get_address_mode (rtx mem);
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 3063458..cb5e36a 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1182,7 +1182,7 @@  record_hard_reg_uses (rtx *px, void *data)
    will not be used, which we ignore.  */
 
 rtx
-single_set_2 (const rtx_insn *insn, const_rtx pat)
+single_set_2 (const rtx_insn *insn, const_rtx pat, bool fail_on_clobber_use)
 {
   rtx set = NULL;
   int set_verified = 1;
@@ -1197,6 +1197,8 @@  single_set_2 (const rtx_insn *insn, const_rtx pat)
 	    {
 	    case USE:
 	    case CLOBBER:
+	      if (fail_on_clobber_use)
+		return NULL_RTX;
 	      break;
 
 	    case SET:
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index b1ff8a2..5624ef7 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -177,7 +177,7 @@  move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
   edge live_edge;
 
   /* Look for a simple register copy.  */
-  set = single_set (insn);
+  set = single_set_no_clobber_use (insn);
   if (!set)
     return false;
   src = SET_SRC (set);