Message ID | 593669EA.9090005@foss.arm.com |
---|---|
State | New |
On 06/06/2017 02:38 AM, Kyrill Tkachov wrote:
> Hi all,
>
> Another vec_merge simplification that's missing from simplify-rtx.c is
> transforming a vec_merge of two vec_duplicates. For example:
> (set (reg:V2DF 80)
>   (vec_merge:V2DF (vec_duplicate:V2DF (reg:DF 84))
>     (vec_duplicate:V2DF (reg:DF 81))
>     (const_int 2)))
>
> Can be transformed into the simpler:
> (set (reg:V2DF 80)
>   (vec_concat:V2DF (reg:DF 81)
>     (reg:DF 84)))
>
> I believe this should always be beneficial.
> I'm still looking into finding a small testcase demonstrating this, but
> on aarch64 SPEC I've seen this eliminate some really bizarre codegen
> where GCC was generating nonsense like:
> ldr q18, [sp, 448]
> ins v18.d[0], v23.d[0]
> ins v18.d[1], v22.d[0]
>
> With q18 being pushed and popped off the stack in the prologue and
> epilogue of the function!
> These are large files from SPEC that I haven't been able to analyse yet
> as to why GCC even attempts to do that, but with this patch it doesn't
> try to load a register and overwrite all its lanes.
> This patch shaves off about 5k of code size from zeusmp on aarch64 at
> -O3, so I believe it's a good thing to do.
>
> Ok?
>
> Thanks,
> Kyrill
>
> 2017-06-06 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge
> of two vec_duplicates into a vec_concat.

OK. Though I'd really like to see a testcase to exercise the simplification.

jeff
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 0727ca690e9d7f2c14907e3888e67da31ecb1ed6..ac7c4131c2ffef44e66cdc95f09b7bf4d4ce5192 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -5760,6 +5760,24 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
               if (!side_effects_p (otherop))
                 return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
             }
+
+          /* Replace (vec_merge (vec_duplicate x) (vec_duplicate y)
+             (const_int n))
+             with (vec_concat x y) or (vec_concat y x) depending on value
+             of N.  */
+          if (GET_CODE (op0) == VEC_DUPLICATE
+              && GET_CODE (op1) == VEC_DUPLICATE
+              && GET_MODE_NUNITS (GET_MODE (op0)) == 2
+              && GET_MODE_NUNITS (GET_MODE (op1)) == 2
+              && IN_RANGE (sel, 1, 2))
+            {
+              rtx newop0 = XEXP (op0, 0);
+              rtx newop1 = XEXP (op1, 0);
+              if (sel == 2)
+                std::swap (newop0, newop1);
+
+              return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
+            }
         }
 
       if (rtx_equal_p (op0, op1)
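
The selector logic in the hunk above can be checked outside the compiler with a small stand-alone model. The sketch below is not GCC code: `V2`, `vec_duplicate`, `vec_concat`, `vec_merge`, `simplify_merge_of_dups` and `rewrite_matches_reference` are hypothetical helpers written for illustration. It assumes the usual vec_merge reading, where lane i is taken from the first operand when bit i of the selector is set and from the second operand otherwise, and it models only two-lane vectors of doubles.

```cpp
#include <array>
#include <cassert>
#include <utility>

// Hypothetical two-lane vector model for illustration; not a GCC type.
using V2 = std::array<double, 2>;

// Models (vec_duplicate x): every lane holds x.
inline V2 vec_duplicate(double x) { return V2{{x, x}}; }

// Models (vec_concat lo hi): lane 0 is lo, lane 1 is hi.
inline V2 vec_concat(double lo, double hi) { return V2{{lo, hi}}; }

// Models (vec_merge a b sel), assuming lane i comes from a when bit i
// of sel is set and from b otherwise.
inline V2 vec_merge(const V2 &a, const V2 &b, unsigned sel) {
  return V2{{(sel & 1u) ? a[0] : b[0], (sel & 2u) ? a[1] : b[1]}};
}

// Mirrors the shape of the rewrite in the patch: it applies only when
// sel is 1 or 2, and swaps the operands when sel is 2. Returns false
// (leaving out untouched) when the rewrite does not apply.
inline bool simplify_merge_of_dups(double x, double y, unsigned sel, V2 &out) {
  if (sel < 1 || sel > 2)
    return false;
  if (sel == 2)
    std::swap(x, y);
  out = vec_concat(x, y);
  return true;
}

// Compares the rewrite against the reference vec_merge model for both
// selectors that the patch accepts.
inline bool rewrite_matches_reference(double x, double y) {
  for (unsigned sel = 1; sel <= 2; ++sel) {
    V2 simplified{};
    if (!simplify_merge_of_dups(x, y, sel, simplified))
      return false;
    if (simplified != vec_merge(vec_duplicate(x), vec_duplicate(y), sel))
      return false;
  }
  return true;
}
```

With the values from the example in the email (x standing in for reg 84, y for reg 81, selector 2), the model yields lane 0 from y and lane 1 from x, which corresponds to `(vec_concat (reg:DF 81) (reg:DF 84))`. Selectors 0 and 3 take both lanes from a single operand, so the model, like the patch, leaves them alone.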