Message ID | CAOvf_xwh02FHmgQDe+7Qj0Dec0c8LuqBjP5UaMkc4TwrdP11nA@mail.gmail.com |
---|---|
State | New |
Headers | show |
On 06/05/2014 08:29 AM, Evgeny Stupachenko wrote: > + /* Figure out where permutation elements stay not in their > + respective lanes. */ > + for (i = 0, which = 0; i < nelt; ++i) > + { > + unsigned e = d->perm[i]; > + if (e != i) > + which |= (e < nelt ? 1 : 2); > + } > + /* We can pblend the part where elements stay not in their > + respective lanes only when these elements are all in one > + half of a permutation. > + {0 1 8 3 4 5 9 7} is ok as 8, 9 are not at their respective > + lanes, but both 8 and 9 >= 8 > + {0 1 8 3 4 5 2 7} is not ok as 2 and 8 are not at their > + respective lanes and 8 >= 8, but 2 not. */ > + if (which != 1 && which != 2) > + return false; I was about to suggest that you'd get more success by putting the blend first, and do the shuffle second. But I suppose it does cover a few cases that the other way would miss, e.g. { 0 4 7 3 } because we can't blend 0 and 4 (or 3 and 7) into the same vector. Whereas the direction you're trying can't handle { 0 6 6 1 } But that can be implemented with { 0 1 2 3 } { 4 5 6 7 } ----------- { 0 1 6 3 } (pblend) ----------- { 0 6 6 1 } (pshufb) So I guess we should cover these two cases in successive patches. > + if (!expand_vec_perm_blend (&dcopy1)) > + return false; > + > + return true; You should avoid doing any work in this function if the ISA isn't enabled. Don't wait until the last test for blend to fail. Separate that out from the start of expand_vec_perm_blend as a subroutine, perhaps. We should be able to prove that we've got a valid blend as input here, so I'd be more inclined to write ok = expand_vec_perm_blend (&dcopy1); gcc_assert (ok); return true; > + if (!expand_vec_perm_1 (&dcopy)) > + return false; If we know we have pblend, then we know we have pshufb, so again I don't see how expand_vec_perm_1 can fail. Another assert would be good. There is a point, earlier in the function, where we know whether we're going to succeed or not. I believe just after > + if (which != 1 && which != 2) > + return false; You should add a if (d->testing_p) return true; at that point. r~
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 8827256..e1c8126 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -43185,6 +43185,76 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d) return ok; } +/* A subroutine of ix86_expand_vec_perm_const_1. Try to simplify + the permutation using the SSE4_1 pblendv instruction. Potentially + reduces permutaion from 2 pshufb and or to 1 pshufb and pblendv. */ + +static bool +expand_vec_perm_pblendv (struct expand_vec_perm_d *d) +{ + unsigned i, which, nelt = d->nelt; + struct expand_vec_perm_d dcopy, dcopy1; + + /* Figure out where permutation elements stay not in their + respective lanes. */ + for (i = 0, which = 0; i < nelt; ++i) + { + unsigned e = d->perm[i]; + if (e != i) + which |= (e < nelt ? 1 : 2); + } + /* We can pblend the part where elements stay not in their + respective lanes only when these elements are all in one + half of a permutation. + {0 1 8 3 4 5 9 7} is ok as 8, 9 are not at their respective + lanes, but both 8 and 9 >= 8 + {0 1 8 3 4 5 2 7} is not ok as 2 and 8 are not at their + respective lanes and 8 >= 8, but 2 not. */ + if (which != 1 && which != 2) + return false; + + /* First we apply one operand permutation to the part where + elements stay not in their respective lanes. */ + dcopy = *d; + if (which == 2) + dcopy.op0 = dcopy.op1 = d->op1; + else + dcopy.op0 = dcopy.op1 = d->op0; + dcopy.one_operand_p = true; + + for (i = 0; i < nelt; ++i) + { + unsigned e = d->perm[i]; + if (which == 2) + dcopy.perm[i] = ((e >= nelt) ? (e - nelt) : e); + } + + if (!expand_vec_perm_1 (&dcopy)) + return false; + + /* Next we put permuted elements into thier positions. */ + dcopy1 = *d; + if (which == 2) + dcopy1.op1 = dcopy.target; + else + dcopy1.op0 = dcopy.target; + + for (i = 0; i < nelt; ++i) + { + unsigned e = d->perm[i]; + if (which == 2) + dcopy1.perm[i] = ((e >= nelt) ? (nelt + i) : e); + else + dcopy1.perm[i] = ((e < nelt) ? i : e); + } + + if (!expand_vec_perm_blend (&dcopy1)) + return false; + + return true; +} + static bool expand_vec_perm_interleave3 (struct expand_vec_perm_d *d); /* A subroutine of ix86_expand_vec_perm_builtin_1. Try to simplify @@ -44557,6 +44627,9 @@ ix86_expand_vec_perm_const_1 (struct