[ARM,AARCH64] target/PR68674: relayout vector_types in expand_expr
diff mbox

Message ID CAFiYyc1SNL14XMPY7WR6Sw4LvUsUmBXos2Gq_9nj87ka9Sxjyw@mail.gmail.com
State New
Headers show

Commit Message

Richard Biener Jan. 18, 2016, 11:36 a.m. UTC
On Fri, Jan 8, 2016 at 2:29 PM, Christian Bruel <christian.bruel@st.com> wrote:
> When compiling code with attribute targets on arm or aarch64,
> vector_type_mode returns different results (eg Vmode or BLKmode) depending
> on the current simd flags that are not set between functions.
> for example the following code:
> #include <arm_neon.h>
> extern int8x8_t a;
> extern int8x8_t b;
> int16x8_t
> __attribute__ ((target("fpu=neon")))
> foo(void)
> {
>    return vaddl_s8 (a, b);
> }
> Triggers gcc_asserts in copy_to_mode_regs while expanding NEON builtins ,
> because the mismatch and DECL_MODE current's TYPE_MODE used in
> expand_builtin for global variables.
> but the best explanation is in the vector_type_mode:
> /* Vector types need to re-check the target flags each time we report
>     the machine mode.  We need to do this because attribute target can
>     change the result of vector_mode_supported_p and have_regs_of_mode
>     on a per-function basis.  Thus the TYPE_MODE of a VECTOR_TYPE can
>     change on a per-function basis.  */
> I first tried to hack the 2 machine descriptions to insert convert_to_mode
> or relayout_decls here and there, but I found this very fragile. Instead a
> more central relayout the of type while expanding gave good results, as
> proposed here.
> bootstraped and tested with no regression for arm, aarch64 and i586.
> Does this look to be the right approach ?
> nb: for testing this patch is complementary with
> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00332.html
> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00248.html
> thanks for your comments.

A x86 specific testcase that ICEs as well:

typedef int v8si __attribute__((vector_size(32)));
v8si a;
v8si __attribute__((target("avx"))) foo()
  return a;

in your patch not using the shared DECL_RTL of the global var
"fixes" this so I think a conceptually better fix would be to
"adjust" DECL_RTL from globals via a adjust_address (or so).

Also given that we do

      /* ... fall through ...  */

    case RESULT_DECL:
      decl_rtl = DECL_RTL (exp);
      gcc_assert (decl_rtl);
      decl_rtl = copy_rtx (decl_rtl);

thus always "unshare" DECL_RTL anyway it might be not so
bad to simply do

     decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0);

instead of that to avoid one copy.

untested apart from on the x86_64 testcase (which it fixes).  One could guard
this further to only apply on vector typed decls with mismatched mode of course.

I think that re-layouting globals is not very good design.



diff mbox

Index: expr.c
--- expr.c      (revision 232496)
+++ expr.c      (working copy)
@@ -9597,7 +9597,10 @@  expand_expr_real_1 (tree exp, rtx target
       decl_rtl = DECL_RTL (exp);
       gcc_assert (decl_rtl);
-      decl_rtl = copy_rtx (decl_rtl);
+      if (MEM_P (decl_rtl))
+       decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0);
+      else
+       decl_rtl = copy_rtx (decl_rtl);
       /* Record writes to register variables.  */
       if (modifier == EXPAND_WRITE
          && REG_P (decl_rtl)