diff mbox

Fix ssa coalescing with inline asm (PR middle-end/70593)

Message ID 20160408145422.GU19207@tucnak.redhat.com
State New
Headers show

Commit Message

Jakub Jelinek April 8, 2016, 2:54 p.m. UTC
Hi!

xen is miscompiled since Alex' SSA coalescing changes.
The problem is that we have there inline asm that sets more than one
SSA_NAME, one of them is dead though (has zero uses) and because of
the zero uses coalescing doesn't see any conflicts and puts both
the SSA_NAMEs in the two GIMPLE_ASM outputs into the same partition.
During expansion we then emit the ASM_OPERANDS followed by copying of the
two outputs into DECL_RTL of their SSA_NAMEs.  But as both are in the same
partition, that is the same pseudo.  The first one is what we want,
and the second one is for the zero uses SSA_NAME, which then overwrites
the right one with unrelated value.
(insn 9 6 7 2 (parallel [
            (set (reg:DI 89 [ c ])
                (asm_operands/v:DI ("xorl	%k1, %k1
	movl	$7, %k0") ("=c") 0 [
                        (reg/v:DI 87 [ <retval> ])
                        (reg/v:DI 87 [ <retval> ])
                    ]
                     [
                        (asm_input:DI ("0") pr70593.c:9)
                        (asm_input:DI ("1") pr70593.c:9)
                    ]
                     [] pr70593.c:9))
            (set (reg:DI 90 [ a ])
                (asm_operands/v:DI ("xorl	%k1, %k1
	movl	$7, %k0") ("=a") 1 [
                        (reg/v:DI 87 [ <retval> ])
                        (reg/v:DI 87 [ <retval> ])
                    ]
                     [
                        (asm_input:DI ("0") pr70593.c:9)
                        (asm_input:DI ("1") pr70593.c:9)
                    ]
                     [] pr70593.c:9))
            (clobber (mem:BLK (scratch) [0  A8]))
            (clobber (reg:CCFP 18 fpsr))
            (clobber (reg:CC 17 flags))
        ]) pr70593.c:9 -1
     (nil))
(insn 7 9 8 2 (set (reg/v:DI 87 [ <retval> ])
        (reg:DI 89 [ c ])) pr70593.c:9 -1
     (nil))
(insn 8 7 13 2 (set (reg/v:DI 87 [ <retval> ])
        (reg:DI 90 [ a ])) pr70593.c:9 -1
     (nil))
The following patch handles this by making sure we record conflicts
between multiple SSA_NAME outputs of the GIMPLE_ASM, even when some of them
are not live, by emulating what really happens during expansion - that
they are live in the moves after the asm insn.  The
live_track_process_def calls later on remove it again from the live
partitions and thus undo the live_track_process_use effect, so all
the patch changes is that during the processing of e.g. the first
SSA_NAME output it sees the second (and perhaps third ...) SSA_NAME output
as live and adds conflict, similarly when processing the second one
if there are more than two, it again sees the third one and adds conflict
etc.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-04-08  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/70593
	* tree-ssa-coalesce.c (build_ssa_conflict_graph): For GIMPLE_ASM
	with multiple SSA_NAME defs, force the outputs other than first
	to be live before calling live_track_process_def on each output.

	* gcc.target/i386/pr70593.c: New test.


	Jakub

Comments

Richard Biener April 8, 2016, 4:04 p.m. UTC | #1
On April 8, 2016 4:54:22 PM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>Hi!
>
>xen is miscompiled since Alex' SSA coalescing changes.
>The problem is that we have there inline asm that sets more than one
>SSA_NAME, one of them is dead though (has zero uses) and because of
>the zero uses coalescing doesn't see any conflicts and puts both
>the SSA_NAMEs in the two GIMPLE_ASM outputs into the same partition.
>During expansion we then emit the ASM_OPERANDS followed by copying of
>the
>two outputs into DECL_RTL of their SSA_NAMEs.  But as both are in the
>same
>partition, that is the same pseudo.  The first one is what we want,
>and the second one is for the zero uses SSA_NAME, which then overwrites
>the right one with unrelated value.
>(insn 9 6 7 2 (parallel [
>            (set (reg:DI 89 [ c ])
>                (asm_operands/v:DI ("xorl	%k1, %k1
>	movl	$7, %k0") ("=c") 0 [
>                        (reg/v:DI 87 [ <retval> ])
>                        (reg/v:DI 87 [ <retval> ])
>                    ]
>                     [
>                        (asm_input:DI ("0") pr70593.c:9)
>                        (asm_input:DI ("1") pr70593.c:9)
>                    ]
>                     [] pr70593.c:9))
>            (set (reg:DI 90 [ a ])
>                (asm_operands/v:DI ("xorl	%k1, %k1
>	movl	$7, %k0") ("=a") 1 [
>                        (reg/v:DI 87 [ <retval> ])
>                        (reg/v:DI 87 [ <retval> ])
>                    ]
>                     [
>                        (asm_input:DI ("0") pr70593.c:9)
>                        (asm_input:DI ("1") pr70593.c:9)
>                    ]
>                     [] pr70593.c:9))
>            (clobber (mem:BLK (scratch) [0  A8]))
>            (clobber (reg:CCFP 18 fpsr))
>            (clobber (reg:CC 17 flags))
>        ]) pr70593.c:9 -1
>     (nil))
>(insn 7 9 8 2 (set (reg/v:DI 87 [ <retval> ])
>        (reg:DI 89 [ c ])) pr70593.c:9 -1
>     (nil))
>(insn 8 7 13 2 (set (reg/v:DI 87 [ <retval> ])
>        (reg:DI 90 [ a ])) pr70593.c:9 -1
>     (nil))
>The following patch handles this by making sure we record conflicts
>between multiple SSA_NAME outputs of the GIMPLE_ASM, even when some of
>them
>are not live, by emulating what really happens during expansion - that
>they are live in the moves after the asm insn.  The
>live_track_process_def calls later on remove it again from the live
>partitions and thus undo the live_track_process_use effect, so all
>the patch changes is that during the processing of e.g. the first
>SSA_NAME output it sees the second (and perhaps third ...) SSA_NAME
>output
>as live and adds conflict, similarly when processing the second one
>if there are more than two, it again sees the third one and adds
>conflict
>etc.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Hmm, don't we simply want to do this for all stmts (OK, only asm have multiple defs...)?

Otherwise OK.thanks,
Richard.

>2016-04-08  Jakub Jelinek  <jakub@redhat.com>
>
>	PR middle-end/70593
>	* tree-ssa-coalesce.c (build_ssa_conflict_graph): For GIMPLE_ASM
>	with multiple SSA_NAME defs, force the outputs other than first
>	to be live before calling live_track_process_def on each output.
>
>	* gcc.target/i386/pr70593.c: New test.
>
>--- gcc/tree-ssa-coalesce.c.jj	2016-03-30 16:00:17.000000000 +0200
>+++ gcc/tree-ssa-coalesce.c	2016-04-08 12:36:26.409403204 +0200
>@@ -905,6 +905,27 @@ build_ssa_conflict_graph (tree_live_info
> 	    }
> 	  else if (is_gimple_debug (stmt))
> 	    continue;
>+	  else if (gimple_code (stmt) == GIMPLE_ASM
>+		   && gimple_asm_noutputs (as_a <gasm *> (stmt)) > 1)
>+	    {
>+	      /* For GIMPLE_ASM as the only statement which can have
>+		 more than one SSA_NAME definition, pretend all the
>+		 SSA_NAME outputs but the first one are live at this point,
>+		 so that conflicts are added in between all those even
>+		 when they are actually not really live after the asm,
>+		 because expansion might copy those into pseudos after
>+		 the asm and if multiple outputs share the same partition,
>+		 it might overwrite those that should be live.  E.g.
>+		 asm volatile (".." : "=r" (a) : "=r" (b) : "0" (a), "1" (a));
>+		 return a;
>+		 See PR70593.  */
>+	      bool first = true;
>+	      FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_DEF)
>+		if (first)
>+		  first = false;
>+		else
>+		  live_track_process_use (live, var);
>+	    }
> 
> 	  FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_DEF)
> 	    live_track_process_def (live, var, graph);
>--- gcc/testsuite/gcc.target/i386/pr70593.c.jj	2016-04-08
>12:59:27.352278471 +0200
>+++ gcc/testsuite/gcc.target/i386/pr70593.c	2016-04-08
>12:48:27.000000000 +0200
>@@ -0,0 +1,19 @@
>+/* PR middle-end/70593 */
>+/* { dg-do run } */
>+/* { dg-options "-O2" } */
>+
>+__attribute__((noinline, noclone)) unsigned long
>+foo (unsigned x)
>+{
>+  unsigned long a, c = x;
>+  asm volatile ("xorl\t%k1, %k1\n\tmovl\t$7, %k0" : "=c" (c), "=a" (a)
>: "0" (c), "1" (c) : "memory");
>+  return c;
>+}
>+
>+int
>+main ()
>+{
>+  if (foo (3) != 7)
>+    __builtin_abort ();
>+  return 0;
>+}
>
>	Jakub
Jakub Jelinek April 8, 2016, 4:10 p.m. UTC | #2
On Fri, Apr 08, 2016 at 06:04:38PM +0200, Richard Biener wrote:
> Hmm, don't we simply want to do this for all stmts (OK, only asm have multiple defs...)?

For all stmts that have multiple defs (which is only GIMPLE_ASM right now).
Though, of course, if you want, unconditionally doing:
	  bool first = true;
	  FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_DEF)
	    if (first)
	      first = false;
	    else
	      live_track_process_use (live, var);
would work too and would be prepared for future hypothetical stmts
with multiple defs.  The reason why I'm not calling live_track_process_use
on the first SSA_OP_DEF operand is that it is completely useless,
because the first live_track_process_def will undo it immediately.
For the above, I guess I'd just slightly adjust the comment, instead of
	For GIMPLE_ASM as the only statement which can have
	more than one SSA_NAME definition, ...
say
	For stmts with more than one SSA_NAME definition, ...

	Jakub
Richard Biener April 8, 2016, 5:08 p.m. UTC | #3
On April 8, 2016 6:10:05 PM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>On Fri, Apr 08, 2016 at 06:04:38PM +0200, Richard Biener wrote:
>> Hmm, don't we simply want to do this for all stmts (OK, only asm have
>multiple defs...)?
>
>For all stmts that have multiple defs (which is only GIMPLE_ASM right
>now).
>Though, of course, if you want, unconditionally doing:
>	  bool first = true;
>	  FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_DEF)
>	    if (first)
>	      first = false;
>	    else
>	      live_track_process_use (live, var);
>would work too and would be prepared for future hypothetical stmts
>with multiple defs.

Yes, that's what I am suggesting.

  The reason why I'm not calling
>live_track_process_use
>on the first SSA_OP_DEF operand is that it is completely useless,
>because the first live_track_process_def will undo it immediately.
>For the above, I guess I'd just slightly adjust the comment, instead of
>	For GIMPLE_ASM as the only statement which can have
>	more than one SSA_NAME definition, ...
>say
>	For stmts with more than one SSA_NAME definition, ...

Sounds good.

Richard.

>	Jakub
Jeff Law April 11, 2016, 8:48 p.m. UTC | #4
On 04/08/2016 10:10 AM, Jakub Jelinek wrote:
> On Fri, Apr 08, 2016 at 06:04:38PM +0200, Richard Biener wrote:
>> Hmm, don't we simply want to do this for all stmts (OK, only asm have multiple defs...)?
>
> For all stmts that have multiple defs (which is only GIMPLE_ASM right now).
I thought the atomic support added support for statements with multiple 
outputs?

Jeff
Richard Biener April 12, 2016, 8:35 a.m. UTC | #5
On Mon, 11 Apr 2016, Jeff Law wrote:

> On 04/08/2016 10:10 AM, Jakub Jelinek wrote:
> > On Fri, Apr 08, 2016 at 06:04:38PM +0200, Richard Biener wrote:
> > > Hmm, don't we simply want to do this for all stmts (OK, only asm have
> > > multiple defs...)?
> > 
> > For all stmts that have multiple defs (which is only GIMPLE_ASM right now).
> I thought the atomic support added support for statements with multiple
> outputs?

Maybe on a branch.  Historically all gimples had the ability to have
multiple DEFs but this was removed as having excessive memory requirement.
Later DEFs handling was simplified and we no longer keep def "operands"
explicit but the iterators know how to get at all possible defs.  Which
makes it "currently only ASMs can have multiple (non-virtual) defs".

See ssa-iterators.h:op_iter_init and op_iter_next_def.

Richard.
Andrew MacLeod April 12, 2016, 1:01 p.m. UTC | #6
On 04/12/2016 04:35 AM, Richard Biener wrote:
> On Mon, 11 Apr 2016, Jeff Law wrote:
>
>> On 04/08/2016 10:10 AM, Jakub Jelinek wrote:
>>> On Fri, Apr 08, 2016 at 06:04:38PM +0200, Richard Biener wrote:
>>>> Hmm, don't we simply want to do this for all stmts (OK, only asm have
>>>> multiple defs...)?
>>> For all stmts that have multiple defs (which is only GIMPLE_ASM right now).
>> I thought the atomic support added support for statements with multiple
>> outputs?
> Maybe on a branch.  Historically all gimples had the ability to have
> multiple DEFs but this was removed as having excessive memory requirement.
> Later DEFs handling was simplified and we no longer keep def "operands"
> explicit but the iterators know how to get at all possible defs.  Which
> makes it "currently only ASMs can have multiple (non-virtual) defs".
>
> See ssa-iterators.h:op_iter_init and op_iter_next_def.
>
> Richard.
I poked around for a while and found a patch from late 2012 where I was 
attempting to implement atomics as a gimple primitive (GIMPLE_ATOMIC).  
That work did indeed add the ability of the CAS primitive to return 2 
values to op_iter_init and friends.  I recall there were maybe a half 
dozen places in the compiler that needed to be updated to deal with 2 
results for a gimple statement.

It never went anywhere because I ran into a bunch of other problems that 
frustrated me and led me down the frontend/backend separation path :-P

Today, we still just let RTL patterns optimize the conditional result 
and expected-value address parameter from a compare_exchange... so we 
simply bypass the 2 results issue for a CAS and push it to the target :-).

Andrew
Jakub Jelinek April 12, 2016, 1:06 p.m. UTC | #7
On Tue, Apr 12, 2016 at 09:01:58AM -0400, Andrew MacLeod wrote:
> I poked around for a while and found a patch from late 2012 where I was
> attempting to implement atomics as a gimple primitive (GIMPLE_ATOMIC).  That
> work did indeed add the ability of the CAS primitive to return 2 values to
> op_iter_init and friends.  I recall there were maybe a half dozen places in
> the compiler that needed to be updated to deal with 2 results for a gimple
> statement.
> 
> It never went anywhere because I ran into a bunch of other problems that
> frustrated me and led me down the frontend/backend separation path :-P
> 
> Today, we still just let RTL patterns optimize the conditional result and
> expected-value address parameter from a compare_exchange... so we simply
> bypass the 2 results issue for a CAS and push it to the target :-).

For the __builtin_*_overflow builtins, we lower them into internal functions
that return complex integer that contains both the values we want (result
and overflow flag) and then just extract the real and imag part of that.
Ugly, but it works without major changes to all optimizers.

	Jakub
Andrew MacLeod April 12, 2016, 1:19 p.m. UTC | #8
On 04/12/2016 09:06 AM, Jakub Jelinek wrote:
> On Tue, Apr 12, 2016 at 09:01:58AM -0400, Andrew MacLeod wrote:
>> I poked around for a while and found a patch from late 2012 where I was
>> attempting to implement atomics as a gimple primitive (GIMPLE_ATOMIC).  That
>> work did indeed add the ability of the CAS primitive to return 2 values to
>> op_iter_init and friends.  I recall there were maybe a half dozen places in
>> the compiler that needed to be updated to deal with 2 results for a gimple
>> statement.
>>
>> It never went anywhere because I ran into a bunch of other problems that
>> frustrated me and led me down the frontend/backend separation path :-P
>>
>> Today, we still just let RTL patterns optimize the conditional result and
>> expected-value address parameter from a compare_exchange... so we simply
>> bypass the 2 results issue for a CAS and push it to the target :-).
> For the __builtin_*_overflow builtins, we lower them into internal functions
> that return complex integer that contains both the values we want (result
> and overflow flag) and then just extract the real and imag part of that.
> Ugly, but it works without major changes to all optimizers.
>
> 	Jakub
I suppose it shouldn't be too hard to abstract it a bit so that we 
aren't special casing GIMPLE_ASM, but rather check a flag on the 
statement kind and use a call back to handle the multiple defs that need 
handling as an exception.. And do it in some way so it doesn't slow down 
the normal case.

I expect everything required is buried somewhere in my patch.  Maybe 
someday when I'm looking for a something smallish and different to do 
I'll have a look and see if there is anything worth pursuing.  It does 
seem to come up every year or two.

Andrew
diff mbox

Patch

--- gcc/tree-ssa-coalesce.c.jj	2016-03-30 16:00:17.000000000 +0200
+++ gcc/tree-ssa-coalesce.c	2016-04-08 12:36:26.409403204 +0200
@@ -905,6 +905,27 @@  build_ssa_conflict_graph (tree_live_info
 	    }
 	  else if (is_gimple_debug (stmt))
 	    continue;
+	  else if (gimple_code (stmt) == GIMPLE_ASM
+		   && gimple_asm_noutputs (as_a <gasm *> (stmt)) > 1)
+	    {
+	      /* For GIMPLE_ASM as the only statement which can have
+		 more than one SSA_NAME definition, pretend all the
+		 SSA_NAME outputs but the first one are live at this point,
+		 so that conflicts are added in between all those even
+		 when they are actually not really live after the asm,
+		 because expansion might copy those into pseudos after
+		 the asm and if multiple outputs share the same partition,
+		 it might overwrite those that should be live.  E.g.
+		 asm volatile (".." : "=r" (a) : "=r" (b) : "0" (a), "1" (a));
+		 return a;
+		 See PR70593.  */
+	      bool first = true;
+	      FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_DEF)
+		if (first)
+		  first = false;
+		else
+		  live_track_process_use (live, var);
+	    }
 
 	  FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_DEF)
 	    live_track_process_def (live, var, graph);
--- gcc/testsuite/gcc.target/i386/pr70593.c.jj	2016-04-08 12:59:27.352278471 +0200
+++ gcc/testsuite/gcc.target/i386/pr70593.c	2016-04-08 12:48:27.000000000 +0200
@@ -0,0 +1,19 @@ 
+/* PR middle-end/70593 */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+__attribute__((noinline, noclone)) unsigned long
+foo (unsigned x)
+{
+  unsigned long a, c = x;
+  asm volatile ("xorl\t%k1, %k1\n\tmovl\t$7, %k0" : "=c" (c), "=a" (a) : "0" (c), "1" (c) : "memory");
+  return c;
+}
+
+int
+main ()
+{
+  if (foo (3) != 7)
+    __builtin_abort ();
+  return 0;
+}