Message ID | AANLkTikKrZ9kODo051O+-wQZ6UOnrK0Civp6Cq_28Ly4@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Wed, Nov 17, 2010 at 11:23 AM, Uros Bizjak <ubizjak@gmail.com> wrote: > On Wed, Nov 17, 2010 at 2:33 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >> On Tue, Nov 16, 2010 at 11:51 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >>> On Wed, Nov 17, 2010 at 7:44 AM, H.J. Lu <hongjiu.lu@intel.com> wrote: >>> >>>> insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb)) >>>> >>>> We should check NEXT_INSN (insn) != NEXT_INSN (BB_END (bb)) in >>>> move_or_delete_vzeroupper_2. This patch does it. >>> >>> Huh? The loop does simple linear scan of all insns in the bb, so it >>> can't miss BB_END. IIUC, in your case the bb does not have BB_END >>> (bb), but it has NEXT_INSN (BB_END (bb))? >> >> It has BB_END, but it won't be visited by NEXT_INSN starting from >> BB_HEAD. insn != NEXT_INSN (BB_END (bb)) is used to check the >> end of the BB everywhere in gcc. >> >>> Can you please provide a test case that illustrates this? >>> >> >> I am enclosing a work in progress. We noticed that we are >> missing a few vzerouppers at -O3 on SPEC CPU 2K/2006. >> One isssue is we may have >> >> foo: >> >> call bar <<<<< Missing vzeroupper >> >> 256bit vectorized insn >> goto foo >> >> We miss vzeroupper before call bar. We don't have a small testcase. >> But this patch fixes this case by inspection. We are checking other >> cases. > > @@ -118,9 +118,12 @@ move_or_delete_vzeroupper_2 (basic_block bb, bool > upper_128bits_set) > bb->index, upper_128bits_set); > > insn = BB_HEAD (bb); > + last = NEXT_INSN (BB_END (bb)); > while (insn != BB_END (bb)) > { > insn = NEXT_INSN (insn); > + if (insn == last) > + break; > > if (!NONDEBUG_INSN_P (insn)) > continue; > > The change above is not needed. The new check is never triggered - the > loop terminates when "insn == BB_END (bb)" at "while", so I fail to > see why additional termination for "NEXT_INSN (insn) == NEXT_INSN > (BB_END (bb))" is needed. Here is the patch for http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46519 We have 2 blocks pointing to each others. This patch first scans all blocks without moving vzeroupper so that we can have accurate information about upper 128bits at block entry. > (The BB_HEAD (bb) is either a NOTE or CODE_LABEL so it can be skipped > with NEXT_INSN.) Please try gcc.target/i386/avx-vzeroupper-20.c. It will trigger this condition. > @@ -10970,7 +10973,7 @@ ix86_expand_epilogue (int style) > > /* Emit vzeroupper if needed. */ > if (TARGET_VZEROUPPER > - && cfun->machine->use_avx256_p > + && (cfun->machine->use_avx256_p || flag_tree_vectorize) > && !cfun->machine->caller_return_avx256_p) > { > cfun->machine->use_vzeroupper_p = 1; > @@ -21661,7 +21664,8 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1, > } > > /* Add UNSPEC_CALL_NEEDS_VZEROUPPER decoration. */ > - if (TARGET_VZEROUPPER && cfun->machine->use_avx256_p) > + if (TARGET_VZEROUPPER > + && (cfun->machine->use_avx256_p || flag_tree_vectorize)) > > Decorate *ALL* calls with CALL_NEEDS_VZEROUPPER with > -ftree-vectorize?! It looks that parts (or state machine) that set > ...->use_avx256_p flag should be fixed. There are: foo: call bar <<<<< Missing vzeroupper 256bit vectorized insn goto foo I couldn't find a hook to set use_avx256_p before RTL expansion starts. > { > rtx unspec; > int avx256; > diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c > b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c > new file mode 100644 > index 0000000..3301083 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -mavx -mtune=generic -dp" } */ > + > +extern void free (void *); > +void > +bar (void *ncstrp) > +{ > + if(ncstrp==((void *)0)) > + return; > + free(ncstrp); > +} > + > +/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */ > > Hm, this testcase doesn't go together with the above change. There is > no vectorization involved, and the scan checks that vzeroupper is NOT > emitted. > This testcase is for insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb))
On Thu, Nov 18, 2010 at 12:36 AM, H.J. Lu <hjl.tools@gmail.com> wrote: >> @@ -118,9 +118,12 @@ move_or_delete_vzeroupper_2 (basic_block bb, bool >> upper_128bits_set) >> bb->index, upper_128bits_set); >> >> insn = BB_HEAD (bb); >> + last = NEXT_INSN (BB_END (bb)); >> while (insn != BB_END (bb)) >> { >> insn = NEXT_INSN (insn); >> + if (insn == last) >> + break; >> >> if (!NONDEBUG_INSN_P (insn)) >> continue; >> >> The change above is not needed. The new check is never triggered - the >> loop terminates when "insn == BB_END (bb)" at "while", so I fail to >> see why additional termination for "NEXT_INSN (insn) == NEXT_INSN >> (BB_END (bb))" is needed. > > Here is the patch for > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46519 > > We have 2 blocks pointing to each others. This patch first scans > all blocks without moving vzeroupper so that we can have accurate > information about upper 128bits at block entry. > >> (The BB_HEAD (bb) is either a NOTE or CODE_LABEL so it can be skipped >> with NEXT_INSN.) > > Please try gcc.target/i386/avx-vzeroupper-20.c. It will > trigger this condition. It doesn't even trigger move_or_delete_vzeroupper. OK, I have hacked gcc a bit to blindly trigger the function in order to check the loop iself. For your testcase we process following insn stream: 2 bb 2 [10000] 3 bb 3 [7836] 4 bb 4 [2164] 1 NOTE_INSN_DELETED 4 NOTE_INSN_BASIC_BLOCK 17 NOTE_INSN_PROLOGUE_END 3 NOTE_INSN_FUNCTION_BEG 6 flags:CCZ=cmp(di:DI,0) 7 pc={(flags:CCZ==0)?L14:pc} REG_DEAD: flags:CCZ REG_BR_PROB: 0x874 8 NOTE_INSN_BASIC_BLOCK 20 NOTE_INSN_EPILOGUE_BEG 10 call <...> REG_DEAD: di:DI REG_EH_REGION: 0 i 11: barrier L14: 15 NOTE_INSN_BASIC_BLOCK 19 return i 18: barrier 16 NOTE_INSN_DELETED And the loop processed: Assembling functions: bar ====== 4 NOTE_INSN_BASIC_BLOCK ++++++++ 6 flags:CCZ=cmp(di:DI,0) ++++++++ 7 pc={(flags:CCZ==0)?L14:pc} REG_DEAD: flags:CCZ REG_BR_PROB: 0x874 ====== 8 NOTE_INSN_BASIC_BLOCK ++++++++ 10 call <...> REG_DEAD: di:DI REG_EH_REGION: 0 ====== L14: ++++++++ 23 {return;unspec{0;};} Where "===" precedes instruction, detected as BB_START and "+++" precedes instructions that passed !NONDEBUG_INSN_P filter. Please, can you show me the exact insn that was missed in this particular testcase. I have used "-O3 -mavx -mtune=generic" and I have to short-circuit the check for uses_vzeroupper_p at the end of ix86_reorg. Uros.
On Thu, Nov 18, 2010 at 12:36 AM, H.J. Lu <hjl.tools@gmail.com> wrote: > Here is the patch for > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46519 > > We have 2 blocks pointing to each others. This patch first scans > all blocks without moving vzeroupper so that we can have accurate > information about upper 128bits at block entry. This introduces another insn scanning pass, almost the same as existing vzeroupper pass (modulo CALL_INSN/JUMP_INSN handling). So, if I understand correctly: - The patch removes the detection if the function ever touches AVX registers. - Due to this, all call_insn RTXes have to be decorated with CALL_NEEDS_VZEROUPPER. - A new pre-pass is required that scans all functions in order to detect functions with live AVX registers at exit, and at the same time marks the functions that *do not* use AVX registers. - Existing pass then re-scans everything to again detect functions with live AVX registers at exit and handles vzeroupper emission. I don't think this approach is acceptable. Maybe a LCM infrastructure can be used to handle this case? Uros.
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c new file mode 100644 index 0000000..3301083 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -mavx -mtune=generic -dp" } */ + +extern void free (void *); +void +bar (void *ncstrp) +{ + if(ncstrp==((void *)0)) + return; + free(ncstrp); +} + +/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */