Patchwork [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

login
register
mail settings
Submitter Hans-Peter Nilsson
Date Nov. 19, 2012, 3:54 a.m.
Message ID <alpine.BSF.2.02.1211182231310.47660@dair.pair.com>
Download mbox | patch
Permalink /patch/199919/
State New
Headers show

Comments

Hans-Peter Nilsson - Nov. 19, 2012, 3:54 a.m.
On Mon, 12 Nov 2012, Eric Botcazou wrote:
> > This is a target-specific blockage insn, but with the general form
> > found in all targets defining it.  The default blockage is an empty
> > asm-volatile, which is what cse_insn recognized.  The blockage insn is
> > there to "prevent scheduling" of the critical insns and register
> > values.  It's almost obvious that an unspec_volatile should have that
> > effect "too"; at least that's intended by the code in
> > expand_builtin_setjmp_receiver.  Luckily (or unluckily regarding the
> > presence of the bug) *this* cse code is not run post-frame-layout
> > (post-reload-cse does not use this code), or it seems people would
> > soon notice register values used from the wrong side of the blockage,
> > considering its critical use at the restore of the stack-pointer.
> > As mentioned in the previous patch,
> > <http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01172.html>, clobbering
> > the frame-pointer (and then using it) does not seem the logical
> > alternative to the patch below; the blockage insn should just do its job.
>
> Agreed.
>
> > I updated comments and documentation so it's more apparent that
> > register uses should also not be moved across the blockage; see the
> > patched comments.
> >
> > Tested native x86_64-unknown-linux-gnu w./wo. -m32 and before/after
> > 192677.  Ok to commit?
> >
> > gcc:
> > 	PR middle-end/55030
> > 	* builtins.c (expand_builtin_setjmp_receiver): Update comment
> > 	regarding purpose of blockage.
> > 	* emit-rtl.c [!HAVE_blockage] (gen_blockage): Similarly for
> > 	the head comment.
> > 	* doc/md.texi (blockage): Update similarly.  Change wording to
> > 	require one of two forms, rather than implying a wider choice.
> > 	* cse.c (cse_insn): Where checking for blocking insns, treat
> > 	UNSPEC_VOLATILE as blocking, besides volatile ASM.
>
> That's fine on principle, but there is a predicate for this (volatile_insn_p)
> so I think we should use it here.  Moreover, cselib_process_insn has the same
> check so we should adjust it as well, which in turn means that dse.c:scan_insn
> should probably be adjusted too.  OK with these changes, thanks.

Doh, I should have looked for that construct in other places.
Thanks for the review.  Here's an updated patch with that.
Unfortunately, it causes regressions; read on for a very brief
analysis.

For x86_64-linux (base multilib):

Running /home/hp/gcctop/tmp/pr55030-2n/gcc/gcc/testsuite/gcc.dg/guality/guality.exp ...
...
FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 12 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 14 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -O2  line 12 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -O2  line 14 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -fomit-frame-pointer  line 12 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -fomit-frame-pointer  line 14 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -g  line 12 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -g  line 14 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -Os  line 12 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -Os  line 14 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 12 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 14 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 12 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg7 == 30

And for -m32:
FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 12 arg2 == 2
FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 12 arg3 == 3
FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 12 arg4 == 4
FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 12 arg5 == 5
FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 12 arg6 == 6
FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 12 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 14 arg2 == 2
FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 14 arg3 == 3
FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 14 arg4 == 4
FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 14 arg5 == 5
FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 14 arg6 == 6
FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 14 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -O2  line 12 arg2 == 2
FAIL: gcc.dg/guality/pr36728-1.c  -O2  line 12 arg3 == 3
FAIL: gcc.dg/guality/pr36728-1.c  -O2  line 12 arg4 == 4
FAIL: gcc.dg/guality/pr36728-1.c  -O2  line 12 arg5 == 5
FAIL: gcc.dg/guality/pr36728-1.c  -O2  line 12 arg6 == 6
FAIL: gcc.dg/guality/pr36728-1.c  -O2  line 12 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -O2  line 14 arg2 == 2
FAIL: gcc.dg/guality/pr36728-1.c  -O2  line 14 arg3 == 3
FAIL: gcc.dg/guality/pr36728-1.c  -O2  line 14 arg4 == 4
FAIL: gcc.dg/guality/pr36728-1.c  -O2  line 14 arg5 == 5
FAIL: gcc.dg/guality/pr36728-1.c  -O2  line 14 arg6 == 6
FAIL: gcc.dg/guality/pr36728-1.c  -O2  line 14 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -fomit-frame-pointer  line 12 arg2 == 2
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -fomit-frame-pointer  line 12 arg3 == 3
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -fomit-frame-pointer  line 12 arg4 == 4
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -fomit-frame-pointer  line 12 arg5 == 5
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -fomit-frame-pointer  line 12 arg6 == 6
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -fomit-frame-pointer  line 12 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -fomit-frame-pointer  line 14 arg2 == 2
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -fomit-frame-pointer  line 14 arg3 == 3
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -fomit-frame-pointer  line 14 arg4 == 4
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -fomit-frame-pointer  line 14 arg5 == 5
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -fomit-frame-pointer  line 14 arg6 == 6
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -fomit-frame-pointer  line 14 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -g  line 12 arg2 == 2
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -g  line 12 arg3 == 3
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -g  line 12 arg4 == 4
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -g  line 12 arg5 == 5
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -g  line 12 arg6 == 6
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -g  line 12 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -g  line 14 arg2 == 2
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -g  line 14 arg3 == 3
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -g  line 14 arg4 == 4
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -g  line 14 arg5 == 5
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -g  line 14 arg6 == 6
FAIL: gcc.dg/guality/pr36728-1.c  -O3 -g  line 14 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -Os  line 12 arg2 == 2
FAIL: gcc.dg/guality/pr36728-1.c  -Os  line 12 arg3 == 3
FAIL: gcc.dg/guality/pr36728-1.c  -Os  line 12 arg4 == 4
FAIL: gcc.dg/guality/pr36728-1.c  -Os  line 12 arg5 == 5
FAIL: gcc.dg/guality/pr36728-1.c  -Os  line 12 arg6 == 6
FAIL: gcc.dg/guality/pr36728-1.c  -Os  line 12 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -Os  line 14 arg2 == 2
FAIL: gcc.dg/guality/pr36728-1.c  -Os  line 14 arg3 == 3
FAIL: gcc.dg/guality/pr36728-1.c  -Os  line 14 arg4 == 4
FAIL: gcc.dg/guality/pr36728-1.c  -Os  line 14 arg5 == 5
FAIL: gcc.dg/guality/pr36728-1.c  -Os  line 14 arg6 == 6
FAIL: gcc.dg/guality/pr36728-1.c  -Os  line 14 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 12 arg2 == 2
FAIL: gcc.dg/guality/pr36728-1.c  -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 12 arg3 == 3
FAIL: gcc.dg/guality/pr36728-1.c  -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 12 arg4 == 4
FAIL: gcc.dg/guality/pr36728-1.c  -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 12 arg5 == 5
FAIL: gcc.dg/guality/pr36728-1.c  -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 12 arg6 == 6
FAIL: gcc.dg/guality/pr36728-1.c  -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 12 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 14 arg2 == 2
FAIL: gcc.dg/guality/pr36728-1.c  -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 14 arg3 == 3
FAIL: gcc.dg/guality/pr36728-1.c  -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 14 arg4 == 4
FAIL: gcc.dg/guality/pr36728-1.c  -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 14 arg5 == 5
FAIL: gcc.dg/guality/pr36728-1.c  -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 14 arg6 == 6
FAIL: gcc.dg/guality/pr36728-1.c  -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 14 arg7 == 30
FAIL: gcc.dg/guality/pr36728-1.c  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 12 arg1 == 1
FAIL: gcc.dg/guality/pr36728-1.c  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg1 == 1

I looked into gcc.dg/guality/pr36728-1.c at base -O1.

The generated assembly code modulo debug info, is the same.  The
value of arg7 is optimized out, says gdb in gcc.log.  The
problem doesn't seem to be any md-generated
frame-related-barrier as was my first guess, but the volatile
asms in the source(!).  The first one looks like this (and the
second one similar) in pr36728-1.c.168r.cse1 (r193583):

(insn 26 25 27 2 (parallel [
            (set (mem/c:SI (plus:DI (reg/f:DI 20 frame)
                        (const_int -32 [0xffffffffffffffe0])) [0 y+0 S4 A256])
                (asm_operands/v:SI ("") ("=m") 0 [
                        (mem/c:SI (plus:DI (reg/f:DI 20 frame)
                                (const_int -32 [0xffffffffffffffe0])) [0 y+0 S4 A256])
                    ]
                     [
                        (asm_input:SI ("m") (null):0)
                    ]
                     [] /home/hp/gcctop/tmp/nbasf/gcc/gcc/testsuite/gcc.dg/guality/pr36728-1.c:12))
            (clobber (reg:QI 18 fpsr))
            (clobber (reg:QI 17 flags))
        ]) /home/hp/gcctop/tmp/nbasf/gcc/gcc/testsuite/gcc.dg/guality/pr36728-1.c:12 -1
     (nil))

It's not caught by the previous test:
-      && GET_CODE (PATTERN (insn)) == ASM_OPERANDS
-      && MEM_VOLATILE_P (PATTERN (insn)))

...but since it is a volatile asm (as seen in the source and by
the /v on the asm_operands) volatile_insn_p does catch it (as
expected and intended) and down the road for some reason, gcc
can't recover the arg7 contents.  Somewhat expected, but this
volatile asm is also more volatile than intended; a clobber list
for example as seen above inserted by the md, is now redundant.

I'm not sure what to do with this.  Try changing volatile_insn_p
adding a parameter to optionally allow volatile asms with
outputs to pass?  But then, when *should* that distinction be
done, to let such volatile asms be allowed to pass as
not-really-as-volatile-as-we-look-for?  I'd think "never" for
any of the the patched code, or maybe "only when looking at
effects on memory".

Maybe there's some known gotcha in debug handling.  I'm CC:ing
Alexandre, in case he happens to have input to this off the top
of his head (or has copious spare time :)

PS. without the dse.c and cselib.c changes, it passed testing.

gcc:
	PR middle-end/55030
	* builtins.c (expand_builtin_setjmp_receiver): Update comment
	regarding purpose of blockage.
	* emit-rtl.c [!HAVE_blockage] (gen_blockage): Similarly for
	the head comment.
	* rtlanal.c (volatile_insn_p): Ditto.
	* doc/md.texi (blockage): Update similarly.  Change wording to
	require one of two forms, rather than implying a wider choice.
	* cse.c (cse_insn): Where checking for blocking insns, use
	volatile_insn_p instead of manual check for volatile ASM.
	* dse.c (scan_insn): Ditto.
	* cselib.c (cselib_process_insn): Ditto.


brgds, H-P

Patch

Index: gcc/dse.c
===================================================================
--- gcc/dse.c	(revision 192677)
+++ gcc/dse.c	(working copy)
@@ -2522,8 +2522,7 @@  scan_insn (bb_info_t bb_info, rtx insn)
   /* Cselib clears the table for this case, so we have to essentially
      do the same.  */
   if (NONJUMP_INSN_P (insn)
-      && GET_CODE (PATTERN (insn)) == ASM_OPERANDS
-      && MEM_VOLATILE_P (PATTERN (insn)))
+      && volatile_insn_p (PATTERN (insn)))
     {
       add_wild_read (bb_info);
       insn_info->cannot_delete = true;
Index: gcc/rtlanal.c
===================================================================
--- gcc/rtlanal.c	(revision 192677)
+++ gcc/rtlanal.c	(working copy)
@@ -2083,8 +2083,8 @@  remove_node_from_expr_list (const_rtx no

 /* Nonzero if X contains any volatile instructions.  These are instructions
    which may cause unpredictable machine state instructions, and thus no
-   instructions should be moved or combined across them.  This includes
-   only volatile asms and UNSPEC_VOLATILE instructions.  */
+   instructions or register uses should be moved or combined across them.
+   This includes only volatile asms and UNSPEC_VOLATILE instructions.  */

 int
 volatile_insn_p (const_rtx x)
Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c	(revision 192677)
+++ gcc/builtins.c	(working copy)
@@ -963,7 +963,8 @@  expand_builtin_setjmp_receiver (rtx rece

   /* We must not allow the code we just generated to be reordered by
      scheduling.  Specifically, the update of the frame pointer must
-     happen immediately, not later.  */
+     happen immediately, not later.  Similarly, we must block
+     (frame-related) register values to be used across this code.  */
   emit_insn (gen_blockage ());
 }

Index: gcc/cselib.c
===================================================================
--- gcc/cselib.c	(revision 192677)
+++ gcc/cselib.c	(working copy)
@@ -2607,13 +2607,12 @@  cselib_process_insn (rtx insn)

   cselib_current_insn = insn;

-  /* Forget everything at a CODE_LABEL, a volatile asm, or a setjmp.  */
+  /* Forget everything at a CODE_LABEL, a volatile insn, or a setjmp.  */
   if (LABEL_P (insn)
       || (CALL_P (insn)
 	  && find_reg_note (insn, REG_SETJMP, NULL))
       || (NONJUMP_INSN_P (insn)
-	  && GET_CODE (PATTERN (insn)) == ASM_OPERANDS
-	  && MEM_VOLATILE_P (PATTERN (insn))))
+	  && volatile_insn_p (PATTERN (insn))))
     {
       cselib_reset_table (next_uid);
       cselib_current_insn = NULL_RTX;
Index: gcc/cse.c
===================================================================
--- gcc/cse.c	(revision 192677)
+++ gcc/cse.c	(working copy)
@@ -5660,10 +5660,9 @@  cse_insn (rtx insn)
 	  invalidate (XEXP (dest, 0), GET_MODE (dest));
       }

-  /* A volatile ASM invalidates everything.  */
+  /* A volatile ASM or an UNSPEC_VOLATILE invalidates everything.  */
   if (NONJUMP_INSN_P (insn)
-      && GET_CODE (PATTERN (insn)) == ASM_OPERANDS
-      && MEM_VOLATILE_P (PATTERN (insn)))
+      && volatile_insn_p (PATTERN (insn)))
     flush_hash_table ();

   /* Don't cse over a call to setjmp; on some machines (eg VAX)
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	(revision 192677)
+++ gcc/emit-rtl.c	(working copy)
@@ -364,8 +364,8 @@  get_reg_attrs (tree decl, int offset)


 #if !HAVE_blockage
-/* Generate an empty ASM_INPUT, which is used to block attempts to schedule
-   across this insn. */
+/* Generate an empty ASM_INPUT, which is used to block attempts to schedule,
+   and to block register equivalences to be seen across this insn.  */

 rtx
 gen_blockage (void)
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	(revision 192677)
+++ gcc/doc/md.texi	(working copy)
@@ -5832,8 +5832,9 @@  the values of operands 1 and 2.
 @item @samp{blockage}

 This pattern defines a pseudo insn that prevents the instruction
-scheduler from moving instructions across the boundary defined by the
-blockage insn.  Normally an UNSPEC_VOLATILE pattern.
+scheduler and other passes from moving instructions and using register
+equivalences across the boundary defined by the blockage insn.
+This needs to be an UNSPEC_VOLATILE pattern or a volatile ASM.

 @cindex @code{memory_barrier} instruction pattern
 @item @samp{memory_barrier}