diff mbox

[ARM,AARCH64] target/PR68674: relayout vector_types in expand_expr

Message ID 569E4FCA.3070704@st.com
State New
Headers show

Commit Message

Christian Bruel Jan. 19, 2016, 3:01 p.m. UTC
Hi Richard,

thanks for your input,

On 01/18/2016 12:36 PM, Richard Biener wrote:
> On Fri, Jan 8, 2016 at 2:29 PM, Christian Bruel <christian.bruel@st.com> wrote:
>> When compiling code with attribute targets on arm or aarch64,
>> vector_type_mode returns different results (eg Vmode or BLKmode) depending
>> on the current simd flags that are not set between functions.
>>
>> for example the following code:
>>
>> #include <arm_neon.h>
>>
>> extern int8x8_t a;
>> extern int8x8_t b;
>>
>> int16x8_t
>> __attribute__ ((target("fpu=neon")))
>> foo(void)
>> {
>>     return vaddl_s8 (a, b);
>> }
>>
>> Triggers gcc_asserts in copy_to_mode_regs while expanding NEON builtins ,
>> because the mismatch and DECL_MODE current's TYPE_MODE used in
>> expand_builtin for global variables.
>>
>> but the best explanation is in the vector_type_mode:
>> /* Vector types need to re-check the target flags each time we report
>>      the machine mode.  We need to do this because attribute target can
>>      change the result of vector_mode_supported_p and have_regs_of_mode
>>      on a per-function basis.  Thus the TYPE_MODE of a VECTOR_TYPE can
>>      change on a per-function basis.  */
>>
>> I first tried to hack the 2 machine descriptions to insert convert_to_mode
>> or relayout_decls here and there, but I found this very fragile. Instead a
>> more central relayout the of type while expanding gave good results, as
>> proposed here.
>>
>> bootstraped and tested with no regression for arm, aarch64 and i586.
>>
>> Does this look to be the right approach ?
>>
>> nb: for testing this patch is complementary with
>>
>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00332.html
>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00248.html
>>
>> thanks for your comments.
> A x86 specific testcase that ICEs as well:
>
> typedef int v8si __attribute__((vector_size(32)));
> v8si a;
> v8si __attribute__((target("avx"))) foo()
> {
>    return a;
> }
>
> in your patch not using the shared DECL_RTL of the global var
> "fixes" this so I think a conceptually better fix would be to
> "adjust" DECL_RTL from globals via a adjust_address (or so).
>
> Also given that we do
>
>        /* ... fall through ...  */
>
>      case FUNCTION_DECL:
>      case RESULT_DECL:
>        decl_rtl = DECL_RTL (exp);
>      expand_decl_rtl:
>        gcc_assert (decl_rtl);
>        decl_rtl = copy_rtx (decl_rtl);
>
> thus always "unshare" DECL_RTL anyway it might be not so
> bad to simply do
>
>       decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0);
>
> instead of that to avoid one copy.
>
> Index: expr.c
> ===================================================================
> --- expr.c      (revision 232496)
> +++ expr.c      (working copy)
> @@ -9597,7 +9597,10 @@ expand_expr_real_1 (tree exp, rtx target
>         decl_rtl = DECL_RTL (exp);
>       expand_decl_rtl:
>         gcc_assert (decl_rtl);
> -      decl_rtl = copy_rtx (decl_rtl);
> +      if (MEM_P (decl_rtl))
> +       decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0);
> +      else
> +       decl_rtl = copy_rtx (decl_rtl);
>         /* Record writes to register variables.  */
>         if (modifier == EXPAND_WRITE
>            && REG_P (decl_rtl)
>
> untested apart from on the x86_64 testcase (which it fixes).  One could guard
> this further to only apply on vector typed decls with mismatched mode of course.
>
> I think that re-layouting globals is not very good design.
>
> Richard.

A few other ICEs with this implementation, for instance if the context 
is not in a function, such as

typedef __simd64_int8_t int8x8_t;

extern int8x8_t b;
int8x8_t *a = &b;

So, to avoid a var re-layout and a copy_rtx (implied by adjust_address 
btw). What about just calling 'change_address' ? like: (very lightly tested)


I'm not sure that moving the code in the 'expand_decl_rtl' label is 
best, as we'd need to test for exp and the case should only happen for 
global vars (not functions or results)

thanks,
diff mbox

Patch

Index: expr.c
===================================================================
--- expr.c    (revision 232564)
+++ expr.c    (working copy)
@@ -9392,7 +9392,8 @@ 
              enum expand_modifier modifier, rtx *alt_rtl,
              bool inner_reference_p)
  {
-  rtx op0, op1, temp, decl_rtl;
+  rtx op0, op1, temp;
+  rtx decl_rtl = NULL_RTX;
    tree type;
    int unsignedp;
    machine_mode mode, dmode;
@@ -9590,11 +9591,22 @@ 
        && (TREE_STATIC (exp) || DECL_EXTERNAL (exp)))
      layout_decl (exp, 0);

+      decl_rtl = DECL_RTL (exp);
+
+      if (MEM_P (decl_rtl)
+      && (VECTOR_TYPE_P (type) && DECL_MODE (exp) != mode))
+    {
+      if (current_function_decl
+          && (! reload_completed && !reload_in_progress))
+        decl_rtl = change_address (decl_rtl, TYPE_MODE (type), 0);
+    }
+
        /* ... fall through ...  */

      case FUNCTION_DECL:
      case RESULT_DECL:
-      decl_rtl = DECL_RTL (exp);
+      if (! decl_rtl)
+    decl_rtl = DECL_RTL (exp);
      expand_decl_rtl:
        gcc_assert (decl_rtl);
        decl_rtl = copy_rtx (decl_rtl);