Patchwork [RFC,PR48941,/,51980] Rewrite arm_neon.h to use __builtin_shuffle

login
register
mail settings
Submitter Richard Guenther
Date June 12, 2012, 12:17 p.m.
Message ID <CAFiYyc2mzJEev0xjc6vKux5O+dMazGFEoPwTC+HYsMhW6UNAmQ@mail.gmail.com>
Download mbox | patch
Permalink /patch/164402/
State New
Headers show

Comments

Richard Guenther - June 12, 2012, 12:17 p.m.
On Tue, Jun 12, 2012 at 2:12 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, Jun 12, 2012 at 1:07 PM, Ramana Radhakrishnan
> <ramana.radhakrishnan@linaro.org> wrote:
>> On 12 June 2012 11:46, Richard Guenther <richard.guenther@gmail.com> wrote:
>>> On Tue, Jun 12, 2012 at 11:22 AM, Julian Brown <julian@codesourcery.com> wrote:
>>>> On Mon, 11 Jun 2012 16:46:27 +0100
>>>> Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I don't like the ML bits of the patch as it stands today and before
>>>>> committing I would like to clean up the ML bits quite a bit further
>>>>> especially in areas where I've put FIXMEs [...]
>>>>
>>>> I had a go at this, see attached. Untested. Note there are some
>>>> semantic differences in output:
>>>>
>>>>  vzipq_p8 (poly8x16_t __a, poly8x16_t __b)
>>>>  {
>>>>   poly8x16x2_t __rv;
>>>> -  uint8x16_t __mask1 = {0, 2};
>>>> -  uint8x16_t __mask2 = {1, 3};
>>>> -  __rv.val[0] = (poly8x16_t)__builtin_shuffle (__a, __b, __mask1);
>>>> -  __rv.val[1] = (poly8x16_t)__builtin_shuffle (__a, __b, __mask2);
>>>> +  uint8x16_t __mask1 = { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6,
>>>>  22, 7, 23 };
>>>> +  uint8x16_t __mask2 = { 8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29,
>>>>  14, 30, 15, 31 };
>>>> +  __rv.val[0] = (poly8x16_t) __builtin_shuffle (__a, __b, __mask1);
>>>> +  __rv.val[1] = (poly8x16_t) __builtin_shuffle (__a, __b, __mask2);
>>>
>>> You should get better code at -O0 when not using a temporary __mask1/__mask2
>>> but directly pasting the constant in the builtin call.
>>
>> I tried that yesterday but it didn't seem to help - from a quick peek
>> at the dumps it looks like we could do with some limited const prop
>> just for the vec_perm expand cases.
>>
>>      D.14032 = { 0, 8, 1, 9, 2, 10, 3, 11 };
>>      D.14044 = VEC_PERM_EXPR <__a, __b, D.14032>;
>>
>> That's what I see from the dumps and from a quick skim of the sources
>> - my suspicion is that lower_vec_perm in tree-vect-generic.c is where
>> we could try doing a limited constant propagation in this case. ? Is
>> that where one should attempt to fix this ?
>>
>> Consider the following testcase at O0 rewritten with just
>> __builtin_shuffle so that you can see it on other platforms as well
>> that have vec_perm_const defined for doing the interleave style
>> operations. and look at what you get for O1. On arm-linux-gnueabi with
>> -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a9 at O0 you'd see it use
>> the generic permute operations and at O1 you'd see a vzip.32
>> instruction
>>
>>
>> typedef int v4si __attribute__ ((vector_size (16)));
>>
>> v4si vs (v4si a, v4si b)
>> {
>>  return __builtin_shuffle (a, b, (v4si) {0, 4, 1, 5});
>> }
>
> Ok, I see the C frontend hands us this as
>
>  return  VEC_PERM_EXPR < a , b , <<< Unknown tree: compound_literal_expr
>    v4si D.1712 = { 0, 4, 1, 5 }; >>> > ;
>
> and gimplification in some way fails to gimplify it to { 0, 4, 1, 5 }.  Yes,
> tree-vect-generic.c could just lookup the SSA def stmt in this case - it
> does so in most cases already.

Like with


pre-approved if it passes bootstrap & regtest.

Richard.

> Richard.
>
>>
>> regards
>> Ramana
>>
>>>
>>>>   return __rv;
>>>>  }
>>>>
>>>> I wasn't quite sure which version was correct -- but your version
>>>> doesn't seem to have enough elements for these cases?
>>>>
>>>> HTH,
>>>>
>>>> Julian
Ramana Radhakrishnan - June 12, 2012, 12:23 p.m.
>
> +  if (TREE_CODE (mask) == SSA_NAME)
> +    {
> +      gimple def_stmt = SSA_NAME_DEF_STMT (mask);
> +      if (is_gimple_assign (def_stmt)
> +         && gimple_assign_rhs_code (def_stmt) == VECTOR_CST)
> +       mask = gimple_assign_rhs1 (def_stmt);
> +    }
> +
>   if (TREE_CODE (mask) == VECTOR_CST)
>     {
>       unsigned char *sel_int = XALLOCAVEC (unsigned char, elements);
>
> pre-approved if it passes bootstrap & regtest.

Thanks that's similar to what I was wiring up. I'm happy to bootstrap
and regtest that along with my other changes.

Ramana

Patch

Index: tree-vect-generic.c
===================================================================
--- tree-vect-generic.c (revision 188428)
+++ tree-vect-generic.c (working copy)
@@ -628,6 +628,14 @@  lower_vec_perm (gimple_stmt_iterator *gs
   location_t loc = gimple_location (gsi_stmt (*gsi));
   unsigned i;

+  if (TREE_CODE (mask) == SSA_NAME)
+    {
+      gimple def_stmt = SSA_NAME_DEF_STMT (mask);
+      if (is_gimple_assign (def_stmt)
+         && gimple_assign_rhs_code (def_stmt) == VECTOR_CST)
+       mask = gimple_assign_rhs1 (def_stmt);
+    }
+
   if (TREE_CODE (mask) == VECTOR_CST)
     {
       unsigned char *sel_int = XALLOCAVEC (unsigned char, elements);