Patchwork PATCH: Properly check the end of basic block

login
register
mail settings
Submitter H.J. Lu
Date Nov. 17, 2010, 1:33 p.m.
Message ID <AANLkTikP-oy1DqUS8EmbghVF0-f35yi1bmPU0aj-1kXk@mail.gmail.com>
Download mbox | patch
Permalink /patch/71567/
State New
Headers show

Comments

H.J. Lu - Nov. 17, 2010, 1:33 p.m.
On Tue, Nov 16, 2010 at 11:51 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Wed, Nov 17, 2010 at 7:44 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>
>> insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb))
>>
>> We should check NEXT_INSN (insn) != NEXT_INSN (BB_END (bb)) in
>> move_or_delete_vzeroupper_2.  This patch does it.
>
> Huh? The loop does simple linear scan of all insns in the bb, so it
> can't miss BB_END. IIUC, in your case the bb does not have BB_END
> (bb), but it has NEXT_INSN (BB_END (bb))?

It has BB_END, but it won't be visited by NEXT_INSN starting from
BB_HEAD. insn != NEXT_INSN (BB_END (bb)) is used to check the
end of the BB everywhere in gcc.

> Can you please provide a test case that illustrates this?
>

I am enclosing a work in progress.  We noticed that we are
missing a few vzerouppers at -O3 on SPEC CPU 2K/2006.
One isssue is we may have

foo:

       call bar <<<<< Missing vzeroupper

       256bit vectorized insn
       goto foo

We miss vzeroupper before call bar.  We don't have a small testcase.
But this patch fixes this case by inspection. We are checking other
cases.

Patch

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 11820cf..704a67d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -108,7 +108,7 @@  check_avx256_stores (rtx dest, const_rtx set, void *data)
 static void
 move_or_delete_vzeroupper_2 (basic_block bb, bool upper_128bits_set)
 {
-  rtx insn;
+  rtx insn, last;
   rtx vzeroupper_insn = NULL_RTX;
   rtx pat;
   int avx256;
@@ -118,9 +118,12 @@  move_or_delete_vzeroupper_2 (basic_block bb, bool upper_128bits_set)
 	     bb->index, upper_128bits_set);
 
   insn = BB_HEAD (bb);
+  last = NEXT_INSN (BB_END (bb));
   while (insn != BB_END (bb))
     {
       insn = NEXT_INSN (insn);
+      if (insn == last)
+	break;
 
       if (!NONDEBUG_INSN_P (insn))
 	continue;
@@ -10970,7 +10973,7 @@  ix86_expand_epilogue (int style)
 
   /* Emit vzeroupper if needed.  */
   if (TARGET_VZEROUPPER
-      && cfun->machine->use_avx256_p
+      && (cfun->machine->use_avx256_p || flag_tree_vectorize)
       && !cfun->machine->caller_return_avx256_p)
     {
       cfun->machine->use_vzeroupper_p = 1;
@@ -21661,7 +21664,8 @@  ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
     }
 
   /* Add UNSPEC_CALL_NEEDS_VZEROUPPER decoration.  */
-  if (TARGET_VZEROUPPER && cfun->machine->use_avx256_p)
+  if (TARGET_VZEROUPPER
+      && (cfun->machine->use_avx256_p || flag_tree_vectorize))
     {
       rtx unspec;
       int avx256;
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
new file mode 100644
index 0000000..3301083
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx -mtune=generic -dp" } */
+
+extern void free (void *);
+void
+bar (void *ncstrp)
+{
+  if(ncstrp==((void *)0))
+    return;
+  free(ncstrp);
+}
+
+/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */