diff mbox

[ARM,AARCH64] target/PR68674: relayout vector_types in expand_expr

Message ID 569E5281.2050009@st.com
State New
Headers show

Commit Message

Christian Bruel Jan. 19, 2016, 3:13 p.m. UTC
On 01/19/2016 04:01 PM, Christian Bruel wrote:
> Hi Richard,
>
> thanks for your input,
>
> On 01/18/2016 12:36 PM, Richard Biener wrote:
>> On Fri, Jan 8, 2016 at 2:29 PM, Christian Bruel <christian.bruel@st.com> wrote:
>>> When compiling code with attribute targets on arm or aarch64,
>>> vector_type_mode returns different results (eg Vmode or BLKmode) depending
>>> on the current simd flags that are not set between functions.
>>>
>>> for example the following code:
>>>
>>> #include <arm_neon.h>
>>>
>>> extern int8x8_t a;
>>> extern int8x8_t b;
>>>
>>> int16x8_t
>>> __attribute__ ((target("fpu=neon")))
>>> foo(void)
>>> {
>>>      return vaddl_s8 (a, b);
>>> }
>>>
>>> Triggers gcc_asserts in copy_to_mode_regs while expanding NEON builtins ,
>>> because the mismatch and DECL_MODE current's TYPE_MODE used in
>>> expand_builtin for global variables.
>>>
>>> but the best explanation is in the vector_type_mode:
>>> /* Vector types need to re-check the target flags each time we report
>>>       the machine mode.  We need to do this because attribute target can
>>>       change the result of vector_mode_supported_p and have_regs_of_mode
>>>       on a per-function basis.  Thus the TYPE_MODE of a VECTOR_TYPE can
>>>       change on a per-function basis.  */
>>>
>>> I first tried to hack the 2 machine descriptions to insert convert_to_mode
>>> or relayout_decls here and there, but I found this very fragile. Instead a
>>> more central relayout the of type while expanding gave good results, as
>>> proposed here.
>>>
>>> bootstraped and tested with no regression for arm, aarch64 and i586.
>>>
>>> Does this look to be the right approach ?
>>>
>>> nb: for testing this patch is complementary with
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00332.html
>>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00248.html
>>>
>>> thanks for your comments.
>> A x86 specific testcase that ICEs as well:
>>
>> typedef int v8si __attribute__((vector_size(32)));
>> v8si a;
>> v8si __attribute__((target("avx"))) foo()
>> {
>>     return a;
>> }
>>
>> in your patch not using the shared DECL_RTL of the global var
>> "fixes" this so I think a conceptually better fix would be to
>> "adjust" DECL_RTL from globals via a adjust_address (or so).
>>
>> Also given that we do
>>
>>         /* ... fall through ...  */
>>
>>       case FUNCTION_DECL:
>>       case RESULT_DECL:
>>         decl_rtl = DECL_RTL (exp);
>>       expand_decl_rtl:
>>         gcc_assert (decl_rtl);
>>         decl_rtl = copy_rtx (decl_rtl);
>>
>> thus always "unshare" DECL_RTL anyway it might be not so
>> bad to simply do
>>
>>        decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0);
>>
>> instead of that to avoid one copy.
>>
>> Index: expr.c
>> ===================================================================
>> --- expr.c      (revision 232496)
>> +++ expr.c      (working copy)
>> @@ -9597,7 +9597,10 @@ expand_expr_real_1 (tree exp, rtx target
>>          decl_rtl = DECL_RTL (exp);
>>        expand_decl_rtl:
>>          gcc_assert (decl_rtl);
>> -      decl_rtl = copy_rtx (decl_rtl);
>> +      if (MEM_P (decl_rtl))
>> +       decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0);
>> +      else
>> +       decl_rtl = copy_rtx (decl_rtl);
>>          /* Record writes to register variables.  */
>>          if (modifier == EXPAND_WRITE
>>             && REG_P (decl_rtl)
>>
>> untested apart from on the x86_64 testcase (which it fixes).  One could guard
>> this further to only apply on vector typed decls with mismatched mode of course.
>>
>> I think that re-layouting globals is not very good design.
>>
>> Richard.
> A few other ICEs with this implementation, for instance if the context
> is not in a function, such as
>
> typedef __simd64_int8_t int8x8_t;
>
> extern int8x8_t b;
> int8x8_t *a = &b;
>
> So, to avoid a var re-layout and a copy_rtx (implied by adjust_address
> btw). What about just calling 'change_address' ? like: (very lightly tested)
>
> Index: expr.c
> ===================================================================
> --- expr.c    (revision 232564)
> +++ expr.c    (working copy)
> @@ -9392,7 +9392,8 @@
>                enum expand_modifier modifier, rtx *alt_rtl,
>                bool inner_reference_p)
>    {
> -  rtx op0, op1, temp, decl_rtl;
> +  rtx op0, op1, temp;
> +  rtx decl_rtl = NULL_RTX;
>      tree type;
>      int unsignedp;
>      machine_mode mode, dmode;
> @@ -9590,11 +9591,22 @@
>          && (TREE_STATIC (exp) || DECL_EXTERNAL (exp)))
>        layout_decl (exp, 0);
>
> +      decl_rtl = DECL_RTL (exp);
> +
> +      if (MEM_P (decl_rtl)
> +      && (VECTOR_TYPE_P (type) && DECL_MODE (exp) != mode))
> +    {
> +      if (current_function_decl
> +          && (! reload_completed && !reload_in_progress))
> +        decl_rtl = change_address (decl_rtl, TYPE_MODE (type), 0);
> +    }
> +
>          /* ... fall through ...  */
>
>        case FUNCTION_DECL:
>        case RESULT_DECL:
> -      decl_rtl = DECL_RTL (exp);
> +      if (! decl_rtl)
> +    decl_rtl = DECL_RTL (exp);
>        expand_decl_rtl:
>          gcc_assert (decl_rtl);
>          decl_rtl = copy_rtx (decl_rtl);
>
> I'm not sure that moving the code in the 'expand_decl_rtl' label is
> best, as we'd need to test for exp and the case should only happen for
> global vars (not functions or results)

Here is the alternative implementation, shorter after all. testing in 
progress.


>
> thanks,
>

Comments

Richard Biener Jan. 19, 2016, 3:18 p.m. UTC | #1
On Tue, Jan 19, 2016 at 4:13 PM, Christian Bruel <christian.bruel@st.com> wrote:
>
>
> On 01/19/2016 04:01 PM, Christian Bruel wrote:
>>
>> Hi Richard,
>>
>> thanks for your input,
>>
>> On 01/18/2016 12:36 PM, Richard Biener wrote:
>>>
>>> On Fri, Jan 8, 2016 at 2:29 PM, Christian Bruel <christian.bruel@st.com>
>>> wrote:
>>>>
>>>> When compiling code with attribute targets on arm or aarch64,
>>>> vector_type_mode returns different results (eg Vmode or BLKmode)
>>>> depending
>>>> on the current simd flags that are not set between functions.
>>>>
>>>> for example the following code:
>>>>
>>>> #include <arm_neon.h>
>>>>
>>>> extern int8x8_t a;
>>>> extern int8x8_t b;
>>>>
>>>> int16x8_t
>>>> __attribute__ ((target("fpu=neon")))
>>>> foo(void)
>>>> {
>>>>      return vaddl_s8 (a, b);
>>>> }
>>>>
>>>> Triggers gcc_asserts in copy_to_mode_regs while expanding NEON builtins
>>>> ,
>>>> because the mismatch and DECL_MODE current's TYPE_MODE used in
>>>> expand_builtin for global variables.
>>>>
>>>> but the best explanation is in the vector_type_mode:
>>>> /* Vector types need to re-check the target flags each time we report
>>>>       the machine mode.  We need to do this because attribute target can
>>>>       change the result of vector_mode_supported_p and have_regs_of_mode
>>>>       on a per-function basis.  Thus the TYPE_MODE of a VECTOR_TYPE can
>>>>       change on a per-function basis.  */
>>>>
>>>> I first tried to hack the 2 machine descriptions to insert
>>>> convert_to_mode
>>>> or relayout_decls here and there, but I found this very fragile. Instead
>>>> a
>>>> more central relayout the of type while expanding gave good results, as
>>>> proposed here.
>>>>
>>>> bootstraped and tested with no regression for arm, aarch64 and i586.
>>>>
>>>> Does this look to be the right approach ?
>>>>
>>>> nb: for testing this patch is complementary with
>>>>
>>>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00332.html
>>>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00248.html
>>>>
>>>> thanks for your comments.
>>>
>>> A x86 specific testcase that ICEs as well:
>>>
>>> typedef int v8si __attribute__((vector_size(32)));
>>> v8si a;
>>> v8si __attribute__((target("avx"))) foo()
>>> {
>>>     return a;
>>> }
>>>
>>> in your patch not using the shared DECL_RTL of the global var
>>> "fixes" this so I think a conceptually better fix would be to
>>> "adjust" DECL_RTL from globals via a adjust_address (or so).
>>>
>>> Also given that we do
>>>
>>>         /* ... fall through ...  */
>>>
>>>       case FUNCTION_DECL:
>>>       case RESULT_DECL:
>>>         decl_rtl = DECL_RTL (exp);
>>>       expand_decl_rtl:
>>>         gcc_assert (decl_rtl);
>>>         decl_rtl = copy_rtx (decl_rtl);
>>>
>>> thus always "unshare" DECL_RTL anyway it might be not so
>>> bad to simply do
>>>
>>>        decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0);
>>>
>>> instead of that to avoid one copy.
>>>
>>> Index: expr.c
>>> ===================================================================
>>> --- expr.c      (revision 232496)
>>> +++ expr.c      (working copy)
>>> @@ -9597,7 +9597,10 @@ expand_expr_real_1 (tree exp, rtx target
>>>          decl_rtl = DECL_RTL (exp);
>>>        expand_decl_rtl:
>>>          gcc_assert (decl_rtl);
>>> -      decl_rtl = copy_rtx (decl_rtl);
>>> +      if (MEM_P (decl_rtl))
>>> +       decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0);
>>> +      else
>>> +       decl_rtl = copy_rtx (decl_rtl);
>>>          /* Record writes to register variables.  */
>>>          if (modifier == EXPAND_WRITE
>>>             && REG_P (decl_rtl)
>>>
>>> untested apart from on the x86_64 testcase (which it fixes).  One could
>>> guard
>>> this further to only apply on vector typed decls with mismatched mode of
>>> course.
>>>
>>> I think that re-layouting globals is not very good design.
>>>
>>> Richard.
>>
>> A few other ICEs with this implementation, for instance if the context
>> is not in a function, such as
>>
>> typedef __simd64_int8_t int8x8_t;
>>
>> extern int8x8_t b;
>> int8x8_t *a = &b;
>>
>> So, to avoid a var re-layout and a copy_rtx (implied by adjust_address
>> btw). What about just calling 'change_address' ? like: (very lightly
>> tested)
>>
>> Index: expr.c
>> ===================================================================
>> --- expr.c    (revision 232564)
>> +++ expr.c    (working copy)
>> @@ -9392,7 +9392,8 @@
>>                enum expand_modifier modifier, rtx *alt_rtl,
>>                bool inner_reference_p)
>>    {
>> -  rtx op0, op1, temp, decl_rtl;
>> +  rtx op0, op1, temp;
>> +  rtx decl_rtl = NULL_RTX;
>>      tree type;
>>      int unsignedp;
>>      machine_mode mode, dmode;
>> @@ -9590,11 +9591,22 @@
>>          && (TREE_STATIC (exp) || DECL_EXTERNAL (exp)))
>>        layout_decl (exp, 0);
>>
>> +      decl_rtl = DECL_RTL (exp);
>> +
>> +      if (MEM_P (decl_rtl)
>> +      && (VECTOR_TYPE_P (type) && DECL_MODE (exp) != mode))
>> +    {
>> +      if (current_function_decl
>> +          && (! reload_completed && !reload_in_progress))
>> +        decl_rtl = change_address (decl_rtl, TYPE_MODE (type), 0);
>> +    }
>> +
>>          /* ... fall through ...  */
>>
>>        case FUNCTION_DECL:
>>        case RESULT_DECL:
>> -      decl_rtl = DECL_RTL (exp);
>> +      if (! decl_rtl)
>> +    decl_rtl = DECL_RTL (exp);
>>        expand_decl_rtl:
>>          gcc_assert (decl_rtl);
>>          decl_rtl = copy_rtx (decl_rtl);
>>
>> I'm not sure that moving the code in the 'expand_decl_rtl' label is
>> best, as we'd need to test for exp and the case should only happen for
>> global vars (not functions or results)
>
>
> Here is the alternative implementation, shorter after all. testing in
> progress.
>
> Index: expr.c
> ===================================================================
> --- expr.c    (revision 232570)
> +++ expr.c    (working copy)
> @@ -9597,6 +9597,15 @@
>        decl_rtl = DECL_RTL (exp);
>      expand_decl_rtl:
>        gcc_assert (decl_rtl);
> +
> +      if (exp && code == VAR_DECL && MEM_P (decl_rtl)
> +      && (VECTOR_TYPE_P (type) && DECL_MODE (exp) != mode))
> +    {
> +      if (current_function_decl
> +          && (! reload_completed && !reload_in_progress))

maybe just if (currently_expanding_to_rtl)?

But yes, this looks like a safe variant of the fix.

Richard.

> +        decl_rtl = change_address (decl_rtl, TYPE_MODE (type), 0);
> +    }
> +
>        decl_rtl = copy_rtx (decl_rtl);
>        /* Record writes to register variables.  */
>        if (modifier == EXPAND_WRITE
>
>>
>> thanks,
>>
>
diff mbox

Patch

Index: expr.c
===================================================================
--- expr.c    (revision 232570)
+++ expr.c    (working copy)
@@ -9597,6 +9597,15 @@ 
        decl_rtl = DECL_RTL (exp);
      expand_decl_rtl:
        gcc_assert (decl_rtl);
+
+      if (exp && code == VAR_DECL && MEM_P (decl_rtl)
+      && (VECTOR_TYPE_P (type) && DECL_MODE (exp) != mode))
+    {
+      if (current_function_decl
+          && (! reload_completed && !reload_in_progress))
+        decl_rtl = change_address (decl_rtl, TYPE_MODE (type), 0);
+    }
+
        decl_rtl = copy_rtx (decl_rtl);
        /* Record writes to register variables.  */
        if (modifier == EXPAND_WRITE