diff mbox

[RFC] Masked load/store vectorization (take 5)

Message ID alpine.LNX.2.00.1311141618430.4261@zhemvz.fhfr.qr
State New
Headers show

Commit Message

Richard Biener Nov. 14, 2013, 3:21 p.m. UTC
On Thu, 14 Nov 2013, Jakub Jelinek wrote:

> On Wed, Nov 13, 2013 at 01:21:03PM +0100, Jakub Jelinek wrote:
> > Sergey has kindly tested the patch on SPEC2k6, but on 4 tests it revealed
> > an ICE.  Here is an incremental fix for that, for now it just punts on
> > those.  In theory the invariant conditional loads could be handled e.g.
> > using gather, or perhaps V{,P}MOVMSKP{S,D,B} on the mask followed by
> > conditional scalar load of the value if any bits in the mask are set,
> > then broadcasting it.
> 
> And another issue, ICE in predcom.  The problem is that the patch creates
> DR_READ data ref for the MASK_LOAD internal call and !DR_READ data ref
> for the MASK_STORE, which is handled properly in the vectorizer, but
> apparently other data ref consumers aren't prepared for it.
> 
> So, in the spirit of the recommended change to do some dataref processing
> privately in tree-vect-data-refs.c, the first patch (bootstrapped/regtested
> on x86_64-linux and i686-linux) handles MASK_LOAD and MASK_STORE solely in
> the vectorizer and leaves those as failing the data ref analysis otherwise.
> 
> Or, alternatively, we would need to handle those or punt in the other
> consumers.  Untested second patch does the punting in predcom and phiopt,
> but there are others still unverified.  Apparently e.g. predcom isn't
> prepared to handle data refs on any calls at all, which can happen already
> before my patches, and apparently other passes aren't either.
> E.g. I've tried to create a testcase for pcom (which asserts that DR_STMT
> is either gimple assign or PHI):
> 
> struct S { int i; };
> __attribute__((const, noinline, noclone))
> struct S foo (int x)
> {
>   struct S s;
>   s.i = x;
>   return s;
> }
> 
> int a[2048], b[2048], c[2048], d[2048];
> struct S e[2048];
> 
> __attribute__((noinline, noclone)) void
> bar (void)
> {
>   int i;
>   for (i = 0; i < 1024; i++)
>     {
>       e[i] = foo (i);
>       a[i+2] = a[i] + a[i+1];
>       b[10] = b[10] + i;
>       c[i] = c[2047 - i];
>       d[i] = d[i + 1];
>     }
> }
> 
> int
> main ()
> {
>   int i;
>   bar ();
>   for (i = 0; i < 1024; i++)
>     if (e[i].i != i)
>       __builtin_abort ();
>   return 0;
> }
> 
> but to my surprise pcom didn't fail on it (because the data reference
> doesn't have a gimple type and thus isn't suitable_reference_p),
> but e.g. ldist miscompiled it (completely ignored it as if there wasn't
> e[i] = foo (i); and let it out of the loop, so now it fails at runtime).

Ah, yeah - that's because I removed the "rest of the stores" handling
and rely on all stores be requested to distribute.  Fixed by


> Richard, any preferences on whether you'd prefer MASK_LOAD/MASK_STORE
> to be handled as data references only for the vectorization, or also
> other passes?  Even in the former case, we need to investigate the
> non-vectorizer data ref users how they handle calls and fix the ldist
> bug.

Well certainly calls with "known" data references should be handled
in the passes (I remember improving it at some point but never got
around committing it because nobody made use of the improvement).

So if you can generate valid DRs for MASK_LOAD/MASK_STORE then do
so.  Valid as in "conservative correct" - that should be enough
for calls.

Richard.
diff mbox

Patch

Index: tree-loop-distribution.c
===================================================================
--- tree-loop-distribution.c    (revision 204787)
+++ tree-loop-distribution.c    (working copy)
@@ -1723,8 +1723,7 @@  tree_loop_distribution (void)
              if (stmt_has_scalar_dependences_outside_loop (loop, stmt))
                ;
              /* Otherwise only distribute stores for now.  */
-             else if (!gimple_assign_single_p (stmt)
-                      || is_gimple_reg (gimple_assign_lhs (stmt)))
+             else if (gimple_vdef (stmt))
                continue;
 
              work_list.safe_push (stmt);