Patchwork [dataflow] : Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper

Submitter Uros Bizjak
Date Jan. 6, 2013, 3:48 p.m.
Message ID <CAFULd4asYbD30GODKEYOUyoCH3mKRtm+zbXEW0+nK8O=6wZ7hw@mail.gmail.com>
Permalink /patch/209778/
State New

Comments

Uros Bizjak - Jan. 6, 2013, 3:48 p.m.
Hello!

Attached patch fixes the runtime comparison failure of 454.calculix due to
wrong movement of vzeroupper in the jump2 pass. It turns out that the
can_move_insns_across function does not special-case
unspec_volatiles, so vzeroupper is allowed to move past various 256-bit AVX
instructions.

The patch rejects moves of unspec_volatile insns in can_move_insns_across.
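
To illustrate the intended effect, here is a minimal toy model in C. It is not GCC code: `toy_kind` and `toy_movable_prefix` are hypothetical names invented for this sketch, and the real function works on RTL insns rather than an array. It only shows the idea that a forward scan stops at the first unspec_volatile-like entry, so that entry and everything after it stay where they are.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction kinds for this sketch; not GCC's RTL codes.  */
enum toy_kind { TOY_PLAIN, TOY_UNSPEC_VOLATILE };

/* Return how many leading entries of SEQ (of length N) may be moved.
   The scan stops at the first unspec_volatile-like entry, so that
   entry and everything after it are left in place.  */
size_t
toy_movable_prefix (const enum toy_kind *seq, size_t n)
{
  size_t i;

  assert (seq != NULL || n == 0);
  for (i = 0; i < n; i++)
    if (seq[i] == TOY_UNSPEC_VOLATILE)
      break;
  return i;
}
```

In this model a vzeroupper-like entry in the middle of a sequence limits the movable part to the entries that precede it.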

2013-01-06  Uros Bizjak  <ubizjak@gmail.com>

	PR rtl-optimization/55845
	* df-problems.c (can_move_insns_across): Stop scanning at
	unspec_volatile source instruction.

2013-01-06  Uros Bizjak  <ubizjak@gmail.com>
	    Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>

	PR rtl-optimization/55845
	* gcc.target/i386/pr55845.c: New test.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32} AVX target.

OK for mainline and 4.7 branch?

Uros.

Jakub Jelinek - Jan. 6, 2013, 4:22 p.m.
On Sun, Jan 06, 2013 at 04:48:03PM +0100, Uros Bizjak wrote:
> --- df-problems.c	(revision 194945)
> +++ df-problems.c	(working copy)
> @@ -3916,6 +3916,10 @@ can_move_insns_across (rtx from, rtx to, rtx acros
>  	break;
>        if (NONDEBUG_INSN_P (insn))
>  	{
> +	  /* Do not move unspec_volatile insns.  */
> +	  if (GET_CODE (PATTERN (insn)) == UNSPEC_VOLATILE)
> +	    break;
> +

Shouldn't UNSPEC_VOLATILE be handled similarly in the across_from ..
across_to loop?  Both UNSPEC_VOLATILE and volatile asm are handled there
just with
	trapping_insns_in_across |= may_trap_p (PATTERN (insn));
but your new change doesn't prevent moving just trapping insns across
UNSPEC_VOLATILE, but any insns whatsoever.  So supposedly for UNSPEC_VOLATILE
the first loop should just return false; (or fail = 1; ?).
For asm volatile I guess the code is fine as is, it must always describe
what exactly it modifies, so supposedly non-trapping insns can be moved
across asm volatile.

>  	  if (may_trap_or_fault_p (PATTERN (insn))
>  	      && (trapping_insns_in_across || other_branch_live != NULL))
>  	    break;

You could do the check only for may_trap_or_fault_p, all UNSPEC_VOLATILE
may trap.

BTW, can't UNSPEC_VOLATILE be embedded deeply in the pattern?
So volatile_insn_p (insn) && asm_noperands (PATTERN (insn)) == -1?
But perhaps you want to treat that way only UNSPEC_VOLATILE directly in the
pattern and all other UNSPEC_VOLATILE insns must describe in detail what
exactly they are changing?  This really needs to be better documented.

	Jakub
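
The difference between checking only the top-level code of a pattern and searching the whole pattern can be sketched with a toy expression tree. This is only an illustration under stated assumptions: `toy_rtx`, `toy_code` and both predicates are hypothetical names, not GCC's RTL types or functions, and the sketch says nothing about which behaviour is right for can_move_insns_across.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression codes for this sketch; not GCC's RTL codes.  */
enum toy_code { TOY_SET, TOY_PARALLEL, TOY_REG, TOY_UNSPEC_VOLATILE };

struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *op[2];	/* Operands; NULL when unused.  */
};

/* Shallow check: look only at the outermost code of X.  */
int
toy_top_level_volatile_p (const struct toy_rtx *x)
{
  assert (x != NULL);
  return x->code == TOY_UNSPEC_VOLATILE;
}

/* Deep check: return nonzero if X or any of its operands, at any
   depth, has the unspec_volatile-like code.  */
int
toy_any_volatile_p (const struct toy_rtx *x)
{
  if (x == NULL)
    return 0;
  if (x->code == TOY_UNSPEC_VOLATILE)
    return 1;
  return toy_any_volatile_p (x->op[0]) || toy_any_volatile_p (x->op[1]);
}
```

For a set whose source operand is the volatile node, the shallow check returns 0 while the deep check returns 1, which is the case the question about deeply embedded UNSPEC_VOLATILE is concerned with.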

Uros Bizjak - Jan. 7, 2013, 4:52 p.m.
On Sun, Jan 6, 2013 at 5:22 PM, Jakub Jelinek <jakub@redhat.com> wrote:

>> --- df-problems.c     (revision 194945)
>> +++ df-problems.c     (working copy)
>> @@ -3916,6 +3916,10 @@ can_move_insns_across (rtx from, rtx to, rtx acros
>>       break;
>>        if (NONDEBUG_INSN_P (insn))
>>       {
>> +       /* Do not move unspec_volatile insns.  */
>> +       if (GET_CODE (PATTERN (insn)) == UNSPEC_VOLATILE)
>> +         break;
>> +
>
> Shouldn't UNSPEC_VOLATILE be handled similarly in the across_from ..
> across_to loop?  Both UNSPEC_VOLATILE and volatile asm are handled there
> just with
>         trapping_insns_in_across |= may_trap_p (PATTERN (insn));
> but your new change doesn't prevent moving just trapping insns across
> UNSPEC_VOLATILE, but any insns whatsoever.  So supposedly for UNSPEC_VOLATILE
> the first loop should just return false; (or fail = 1; ?).
> For asm volatile I guess the code is fine as is, it must always describe
> what exactly it modifies, so supposedly non-trapping insns can be moved
> across asm volatile.
>
>>         if (may_trap_or_fault_p (PATTERN (insn))
>>             && (trapping_insns_in_across || other_branch_live != NULL))
>>           break;
>
> You could do the check only for may_trap_or_fault_p, all UNSPEC_VOLATILE
> may trap.
>
> BTW, can't UNSPEC_VOLATILE be embedded deeply in the pattern?
> So volatile_insn_p (insn) && asm_noperands (PATTERN (insn)) == -1?
> But perhaps you want to treat that way only UNSPEC_VOLATILE directly in the
> pattern and all other UNSPEC_VOLATILE insns must describe in detail what
> exactly they are changing?  This really needs to be better documented.

TBH, I'm not familiar enough with the RTL infrastructure to
answer these questions. While I could spend some time on this problem,
and probably waste quite some reviewers' time, the problem is not as
trivial as I hoped it would be, so I would kindly ask someone with a better
understanding of this part of the compiler for the proper solution.

Uros.

Patch

Index: df-problems.c
===================================================================
--- df-problems.c	(revision 194945)
+++ df-problems.c	(working copy)
@@ -3916,6 +3916,10 @@  can_move_insns_across (rtx from, rtx to, rtx acros
 	break;
       if (NONDEBUG_INSN_P (insn))
 	{
+	  /* Do not move unspec_volatile insns.  */
+	  if (GET_CODE (PATTERN (insn)) == UNSPEC_VOLATILE)
+	    break;
+
 	  if (may_trap_or_fault_p (PATTERN (insn))
 	      && (trapping_insns_in_across || other_branch_live != NULL))
 	    break;
Index: testsuite/gcc.target/i386/pr55845.c
===================================================================
--- testsuite/gcc.target/i386/pr55845.c	(revision 0)
+++ testsuite/gcc.target/i386/pr55845.c	(working copy)
@@ -0,0 +1,39 @@ 
+/* { dg-do run } */
+/* { dg-require-effective-target avx } */
+/* { dg-options "-O3 -ffast-math -fschedule-insns -mavx -mvzeroupper" } */
+
+#include "avx-check.h"
+
+#define N 100
+
+double
+__attribute__((noinline))
+foo (int size, double y[], double x[])
+{
+  double sum = 0.0;
+  int i;
+  for (i = 0, sum = 0.; i < size; i++)
+    sum += y[i] * x[i];
+  return (sum);
+}
+
+static void
+__attribute__ ((noinline))
+avx_test ()
+{
+  double x[N];
+  double y[N];
+  double s;
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      x[i] = i;
+      y[i] = i;
+    }
+
+  s = foo (N, y, x);
+
+  if (s != 328350.0)
+    abort ();
+}
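
The expected value 328350.0 in the test above follows from its inputs: with x[i] = y[i] = i for i = 0..99, the dot product is the sum of squares 0^2 + 1^2 + ... + 99^2 = 99 * 100 * 199 / 6 = 328350. A small standalone check, separate from the patch and using the hypothetical helper name `sum_of_squares`, could look like this:

```c
#include <assert.h>

/* Return 0^2 + 1^2 + ... + (N-1)^2 as a double.  For N = 100 every
   partial sum is a small integer, so the result is exact in double
   precision.  */
double
sum_of_squares (int n)
{
  double sum = 0.0;
  int i;

  assert (n >= 0);
  for (i = 0; i < n; i++)
    sum += (double) i * (double) i;
  return sum;
}
```

Because the exact result is representable, a comparison against 328350.0 is a reasonable way for the test to detect the miscompare.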