Patchwork PATCH: Properly check the end of basic block

login
register
mail settings
Submitter Uros Bizjak
Date Nov. 17, 2010, 7:23 p.m.
Message ID <AANLkTikKrZ9kODo051O+-wQZ6UOnrK0Civp6Cq_28Ly4@mail.gmail.com>
Download mbox | patch
Permalink /patch/71612/
State New
Headers show

Comments

Uros Bizjak - Nov. 17, 2010, 7:23 p.m.
On Wed, Nov 17, 2010 at 2:33 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Nov 16, 2010 at 11:51 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Wed, Nov 17, 2010 at 7:44 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>>
>>> insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb))
>>>
>>> We should check NEXT_INSN (insn) != NEXT_INSN (BB_END (bb)) in
>>> move_or_delete_vzeroupper_2.  This patch does it.
>>
>> Huh? The loop does simple linear scan of all insns in the bb, so it
>> can't miss BB_END. IIUC, in your case the bb does not have BB_END
>> (bb), but it has NEXT_INSN (BB_END (bb))?
>
> It has BB_END, but it won't be visited by NEXT_INSN starting from
> BB_HEAD. insn != NEXT_INSN (BB_END (bb)) is used to check the
> end of the BB everywhere in gcc.
>
>> Can you please provide a test case that illustrates this?
>>
>
> I am enclosing a work in progress.  We noticed that we are
> missing a few vzerouppers at -O3 on SPEC CPU 2K/2006.
> One isssue is we may have
>
> foo:
>
>       call bar <<<<< Missing vzeroupper
>
>       256bit vectorized insn
>       goto foo
>
> We miss vzeroupper before call bar.  We don't have a small testcase.
> But this patch fixes this case by inspection. We are checking other
> cases.

@@ -118,9 +118,12 @@ move_or_delete_vzeroupper_2 (basic_block bb, bool
upper_128bits_set)
 	     bb->index, upper_128bits_set);

   insn = BB_HEAD (bb);
+  last = NEXT_INSN (BB_END (bb));
   while (insn != BB_END (bb))
     {
       insn = NEXT_INSN (insn);
+      if (insn == last)
+	break;

       if (!NONDEBUG_INSN_P (insn))
 	continue;

The change above is not needed. The new check is never triggered - the
loop terminates when "insn == BB_END (bb)" at "while", so I fail to
see why additional termination for "NEXT_INSN (insn) == NEXT_INSN
(BB_END (bb))" is needed.

(The BB_HEAD (bb) is either a NOTE or CODE_LABEL so it can be skipped
with NEXT_INSN.)

@@ -10970,7 +10973,7 @@ ix86_expand_epilogue (int style)

   /* Emit vzeroupper if needed.  */
   if (TARGET_VZEROUPPER
-      && cfun->machine->use_avx256_p
+      && (cfun->machine->use_avx256_p || flag_tree_vectorize)
       && !cfun->machine->caller_return_avx256_p)
     {
       cfun->machine->use_vzeroupper_p = 1;
@@ -21661,7 +21664,8 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
     }

   /* Add UNSPEC_CALL_NEEDS_VZEROUPPER decoration.  */
-  if (TARGET_VZEROUPPER && cfun->machine->use_avx256_p)
+  if (TARGET_VZEROUPPER
+      && (cfun->machine->use_avx256_p || flag_tree_vectorize))

Decorate *ALL* calls with CALL_NEEDS_VZEROUPPER with
-ftree-vectorize?! It looks that parts (or state machine) that set
...->use_avx256_p flag should be fixed.

     {
       rtx unspec;
       int avx256;

Hm, this testcase doesn't go together with the above change. There is
no vectorization involved, and the scan checks that vzeroupper is NOT
emitted.

Uros.
H.J. Lu - Nov. 17, 2010, 11:36 p.m.
On Wed, Nov 17, 2010 at 11:23 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Wed, Nov 17, 2010 at 2:33 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, Nov 16, 2010 at 11:51 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> On Wed, Nov 17, 2010 at 7:44 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>>>
>>>> insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb))
>>>>
>>>> We should check NEXT_INSN (insn) != NEXT_INSN (BB_END (bb)) in
>>>> move_or_delete_vzeroupper_2.  This patch does it.
>>>
>>> Huh? The loop does simple linear scan of all insns in the bb, so it
>>> can't miss BB_END. IIUC, in your case the bb does not have BB_END
>>> (bb), but it has NEXT_INSN (BB_END (bb))?
>>
>> It has BB_END, but it won't be visited by NEXT_INSN starting from
>> BB_HEAD. insn != NEXT_INSN (BB_END (bb)) is used to check the
>> end of the BB everywhere in gcc.
>>
>>> Can you please provide a test case that illustrates this?
>>>
>>
>> I am enclosing a work in progress.  We noticed that we are
>> missing a few vzerouppers at -O3 on SPEC CPU 2K/2006.
>> One isssue is we may have
>>
>> foo:
>>
>>       call bar <<<<< Missing vzeroupper
>>
>>       256bit vectorized insn
>>       goto foo
>>
>> We miss vzeroupper before call bar.  We don't have a small testcase.
>> But this patch fixes this case by inspection. We are checking other
>> cases.
>
> @@ -118,9 +118,12 @@ move_or_delete_vzeroupper_2 (basic_block bb, bool
> upper_128bits_set)
>             bb->index, upper_128bits_set);
>
>   insn = BB_HEAD (bb);
> +  last = NEXT_INSN (BB_END (bb));
>   while (insn != BB_END (bb))
>     {
>       insn = NEXT_INSN (insn);
> +      if (insn == last)
> +       break;
>
>       if (!NONDEBUG_INSN_P (insn))
>        continue;
>
> The change above is not needed. The new check is never triggered - the
> loop terminates when "insn == BB_END (bb)" at "while", so I fail to
> see why additional termination for "NEXT_INSN (insn) == NEXT_INSN
> (BB_END (bb))" is needed.

Here is the patch for

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46519

We have 2 blocks pointing to each others. This patch first scans
all blocks without moving vzeroupper so that we can have accurate
information about upper 128bits at block entry.

> (The BB_HEAD (bb) is either a NOTE or CODE_LABEL so it can be skipped
> with NEXT_INSN.)

Please try gcc.target/i386/avx-vzeroupper-20.c.  It will
trigger this condition.

> @@ -10970,7 +10973,7 @@ ix86_expand_epilogue (int style)
>
>   /* Emit vzeroupper if needed.  */
>   if (TARGET_VZEROUPPER
> -      && cfun->machine->use_avx256_p
> +      && (cfun->machine->use_avx256_p || flag_tree_vectorize)
>       && !cfun->machine->caller_return_avx256_p)
>     {
>       cfun->machine->use_vzeroupper_p = 1;
> @@ -21661,7 +21664,8 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
>     }
>
>   /* Add UNSPEC_CALL_NEEDS_VZEROUPPER decoration.  */
> -  if (TARGET_VZEROUPPER && cfun->machine->use_avx256_p)
> +  if (TARGET_VZEROUPPER
> +      && (cfun->machine->use_avx256_p || flag_tree_vectorize))
>
> Decorate *ALL* calls with CALL_NEEDS_VZEROUPPER with
> -ftree-vectorize?! It looks that parts (or state machine) that set
> ...->use_avx256_p flag should be fixed.

There are:

foo:

      call bar <<<<< Missing vzeroupper

      256bit vectorized insn
      goto foo

I couldn't find a hook to set use_avx256_p before RTL expansion
starts.

>     {
>       rtx unspec;
>       int avx256;
> diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
> b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
> new file mode 100644
> index 0000000..3301083
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx -mtune=generic -dp" } */
> +
> +extern void free (void *);
> +void
> +bar (void *ncstrp)
> +{
> +  if(ncstrp==((void *)0))
> +    return;
> +  free(ncstrp);
> +}
> +
> +/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
>
> Hm, this testcase doesn't go together with the above change. There is
> no vectorization involved, and the scan checks that vzeroupper is NOT
> emitted.
>

This testcase is for

insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb))
Uros Bizjak - Nov. 18, 2010, 8:12 a.m.
On Thu, Nov 18, 2010 at 12:36 AM, H.J. Lu <hjl.tools@gmail.com> wrote:

>> @@ -118,9 +118,12 @@ move_or_delete_vzeroupper_2 (basic_block bb, bool
>> upper_128bits_set)
>>             bb->index, upper_128bits_set);
>>
>>   insn = BB_HEAD (bb);
>> +  last = NEXT_INSN (BB_END (bb));
>>   while (insn != BB_END (bb))
>>     {
>>       insn = NEXT_INSN (insn);
>> +      if (insn == last)
>> +       break;
>>
>>       if (!NONDEBUG_INSN_P (insn))
>>        continue;
>>
>> The change above is not needed. The new check is never triggered - the
>> loop terminates when "insn == BB_END (bb)" at "while", so I fail to
>> see why additional termination for "NEXT_INSN (insn) == NEXT_INSN
>> (BB_END (bb))" is needed.
>
> Here is the patch for
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46519
>
> We have 2 blocks pointing to each others. This patch first scans
> all blocks without moving vzeroupper so that we can have accurate
> information about upper 128bits at block entry.
>
>> (The BB_HEAD (bb) is either a NOTE or CODE_LABEL so it can be skipped
>> with NEXT_INSN.)
>
> Please try gcc.target/i386/avx-vzeroupper-20.c.  It will
> trigger this condition.

It doesn't even trigger move_or_delete_vzeroupper.

OK, I have hacked gcc a bit to blindly trigger the function in order
to check the loop iself. For your testcase we process following insn
stream:

 2 bb 2  [10000]
 3 bb 3  [7836]
 4 bb 4  [2164]
    1 NOTE_INSN_DELETED
    4 NOTE_INSN_BASIC_BLOCK
   17 NOTE_INSN_PROLOGUE_END
    3 NOTE_INSN_FUNCTION_BEG
    6 flags:CCZ=cmp(di:DI,0)
    7 pc={(flags:CCZ==0)?L14:pc}
      REG_DEAD: flags:CCZ
      REG_BR_PROB: 0x874
    8 NOTE_INSN_BASIC_BLOCK
   20 NOTE_INSN_EPILOGUE_BEG
   10 call <...>
      REG_DEAD: di:DI
      REG_EH_REGION: 0
i  11: barrier
L14:
   15 NOTE_INSN_BASIC_BLOCK
   19 return
i  18: barrier
   16 NOTE_INSN_DELETED

And the loop processed:

Assembling functions:
 bar
======
    4 NOTE_INSN_BASIC_BLOCK
++++++++
    6 flags:CCZ=cmp(di:DI,0)
++++++++
    7 pc={(flags:CCZ==0)?L14:pc}
      REG_DEAD: flags:CCZ
      REG_BR_PROB: 0x874
======
    8 NOTE_INSN_BASIC_BLOCK
++++++++
   10 call <...>
      REG_DEAD: di:DI
      REG_EH_REGION: 0
======
L14:
++++++++
   23 {return;unspec{0;};}

Where "===" precedes instruction, detected as BB_START and "+++"
precedes instructions that passed !NONDEBUG_INSN_P filter.

Please, can you show me the exact insn that was missed in this
particular testcase. I have used "-O3 -mavx -mtune=generic" and I have
to short-circuit the check for uses_vzeroupper_p at the end of
ix86_reorg.

Uros.
Uros Bizjak - Nov. 18, 2010, 9:11 a.m.
On Thu, Nov 18, 2010 at 12:36 AM, H.J. Lu <hjl.tools@gmail.com> wrote:

> Here is the patch for
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46519
>
> We have 2 blocks pointing to each others. This patch first scans
> all blocks without moving vzeroupper so that we can have accurate
> information about upper 128bits at block entry.

This introduces another insn scanning pass, almost the same as
existing vzeroupper pass (modulo CALL_INSN/JUMP_INSN handling).

So, if I understand correctly:
- The patch removes the detection if the function ever touches AVX registers.
- Due to this, all call_insn RTXes have to be decorated with
CALL_NEEDS_VZEROUPPER.
- A new pre-pass is required that scans all functions in order to
detect functions with live AVX registers at exit, and at the same time
marks the functions that *do not* use AVX registers.
- Existing pass then re-scans everything to again detect functions
with live AVX registers at exit and handles vzeroupper emission.

I don't think this approach is acceptable. Maybe a LCM infrastructure
can be used to handle this case?

Uros.

Patch

diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
new file mode 100644
index 0000000..3301083
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx -mtune=generic -dp" } */
+
+extern void free (void *);
+void
+bar (void *ncstrp)
+{
+  if(ncstrp==((void *)0))
+    return;
+  free(ncstrp);
+}
+
+/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */