From patchwork Sun Nov 4 09:29:55 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [RFH] subreg of a vector without going through memory
Date: Sat, 03 Nov 2012 23:29:55 -0000
From: Marc Glisse
X-Patchwork-Id: 196984
Message-Id:
To: gcc-patches@gcc.gnu.org

Hello,

trying to make some progress on PR 53101, I wrote the attached patch (it
might be completely wrong for big endian, I don't know) (it is also
missing a check that it isn't a paradoxical subreg)

	* simplify-rtx.c (simplify_subreg): For vectors, create a VEC_SELECT.

However, when I compile this code on x86_64:

typedef double v4 __attribute__((vector_size(32)));
typedef double v2 __attribute__((vector_size(16)));
v2 f(v4 x){
  return *(v2*)&x;
}

I see in the *.combine dump:

[...]
Trying 6 -> 7:
Successfully matched this instruction:
(set (reg:V2DF 60 [ ])
    (vec_select:V2DF (reg/v:V4DF 61 [ x ])
        (parallel [
                (const_int 0 [0])
                (const_int 1 [0x1])
            ])))
rejecting combination of insns 6 and 7
original costs 4 + 16 = 20
replacement cost 32
[...]
(note 4 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

(insn 2 4 3 2 (set (reg/v:V4DF 61 [ x ])
        (reg:V4DF 21 xmm0 [ x ])) v.cc:3 1123 {*movv4df_internal}
     (expr_list:REG_DEAD (reg:V4DF 21 xmm0 [ x ])
        (nil)))

(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)

(insn 6 3 7 2 (set (reg:OI 63 [ x ])
        (subreg:OI (reg/v:V4DF 61 [ x ]) 0)) v.cc:4 60 {*movoi_internal_avx}
     (expr_list:REG_DEAD (reg/v:V4DF 61 [ x ])
        (nil)))

(insn 7 6 11 2 (set (reg:V2DF 60 [ ])
        (subreg:V2DF (reg:OI 63 [ x ]) 0)) v.cc:4 1124 {*movv2df_internal}
     (expr_list:REG_DEAD (reg:OI 63 [ x ])
        (nil)))

(insn 11 7 14 2 (set (reg/i:V2DF 21 xmm0)
        (reg:V2DF 60 [ ])) v.cc:5 1124 {*movv2df_internal}
     (expr_list:REG_DEAD (reg:V2DF 60 [ ])
        (nil)))

(insn 14 11 0 2 (use (reg/i:V2DF 21 xmm0)) v.cc:5 -1
     (nil))

I am surprised by that high replacement cost that prevents the change. Is
my approach wrong? Is there an issue with the evaluation of costs?

The approach was suggested by Richard B:
http://gcc.gnu.org/ml/gcc-patches/2012-05/msg00197.html

Index: simplify-rtx.c
===================================================================
--- simplify-rtx.c	(revision 193127)
+++ simplify-rtx.c	(working copy)
@@ -5884,20 +5884,35 @@ simplify_subreg (enum machine_mode outer
   if (SCALAR_INT_MODE_P (outermode)
       && SCALAR_INT_MODE_P (innermode)
       && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode)
       && byte == subreg_lowpart_offset (outermode, innermode))
     {
       rtx tem = simplify_truncation (outermode, op, innermode);
       if (tem)
         return tem;
     }
 
+  if (VECTOR_MODE_P (innermode)
+      && GET_MODE_INNER (innermode) == (VECTOR_MODE_P (outermode)
+                                         ? GET_MODE_INNER (outermode)
+                                         : outermode))
+    {
+      unsigned elem_size = GET_MODE_SIZE (GET_MODE_INNER (innermode));
+      unsigned n = GET_MODE_SIZE (outermode) / elem_size;
+      unsigned start = byte / elem_size;
+      rtvec vec = rtvec_alloc (n);
+      for (unsigned i = 0; i < n; i++)
+        RTVEC_ELT (vec, i) = GEN_INT (start + i);
+      return simplify_gen_binary (VEC_SELECT, outermode, op,
+                                  gen_rtx_PARALLEL (VOIDmode, vec));
+    }
+
   return NULL_RTX;
 }
 
 /* Make a SUBREG operation or equivalent if it folds.  */
 
 rtx
 simplify_gen_subreg (enum machine_mode outermode, rtx op,
                      enum machine_mode innermode, unsigned int byte)
 {
   rtx newx;
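
For illustration only, the index arithmetic in the added hunk can be
sketched outside of GCC. The helper name subreg_select_indices and its
parameters are hypothetical and do not exist in simplify-rtx.c. The
divisibility assertions are an added assumption that the patch itself does
not check, and element numbering is assumed to be little-endian, matching
the big-endian caveat in the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC.  Fill IDX with the element
   indices a VEC_SELECT would need to express a subreg that starts at
   byte offset BYTE, where the outer mode is OUTER_SIZE bytes wide and
   the vector elements are ELEM_SIZE bytes each.  Returns the number of
   indices written.  Assumes little-endian element numbering.  */
size_t
subreg_select_indices (unsigned outer_size, unsigned elem_size,
                       unsigned byte, unsigned *idx, size_t idx_len)
{
  /* These checks are assumptions of this sketch; the patch has none.  */
  assert (elem_size != 0);
  assert (byte % elem_size == 0);
  assert (outer_size % elem_size == 0);

  unsigned n = outer_size / elem_size;   /* elements in the outer mode */
  unsigned start = byte / elem_size;     /* first selected element */
  assert (n <= idx_len);

  for (unsigned i = 0; i < n; i++)
    idx[i] = start + i;
  return n;
}
```

For the test case in this message (V4DF inner mode, V2DF outer mode, 8-byte
elements, byte offset 0), calling subreg_select_indices (16, 8, 0, idx, 4)
writes the indices 0 and 1, which matches the parallel shown in the combine
dump. A byte offset of 16 would select elements 2 and 3.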