Message ID | 01a001d7be7d$a7eaeea0$f7c0cbe0$@nextmovesoftware.com |
---|---|
State | New |
Headers | show |
Series | x86_64: Some SUBREG related optimization tweaks to i386 backend. | expand |
On Mon, Oct 11, 2021 at 4:55 PM Roger Sayle <roger@nextmovesoftware.com> wrote: > > > This patch contains two SUBREG-related optimization enabling tweaks to > the x86 backend. > > The first change, to ix86_expand_vector_extract, cures the strange > -march=cascadelake related non-determinism that affected my new test > cases last week. Extracting a QImode or HImode element from an SSE > vector performs a zero-extension to SImode, which is currently > represented as: > > (set (subreg:SI (reg:QI target)) (zero_extend:SI (...)) > > Unfortunately, the semantics of this RTL doesn't quite match what was > intended. A set of a paradoxical subreg allows the high-bits to take > an arbitrary value (hence the non-determinism). A more correct > representation should be: > > (set (reg:SI temp) (zero_extend:SI (...)) > (set (reg:QI target) (subreg:QI (reg:SI temp)) > > Optionally with the SUBREG rtx annotated as SUBREG_PROMOTED_VAR_P to > indicate that value is already zero-extended in the SUBREG_REG. > > The second change is very similar, which is why I've included it in > this patch, where currently the early RTL optimizers can produce: > > (set (reg:V?? hardreg) (subreg ...)) > > where this instruction may require a spill/reload from memory when > the modes aren't tieable. Alas the presence of the hard register > prevents combine/gcse etc. optimizing this away, or reusing the result > which would increase the lifetime of the hard register before reload. > > The solution is to treat vector hard registers the same way as the > x86 backend handles scalar hard registers, and only allow sets from > pseudos before register allocation, which is achieved by checking > ix86_hardreg_mov_ok. Hence the above instruction is expanded and > maintained as: > > (set (reg:V?? pseudo) (subreg ...)) > (set (reg:V?? hardreg) (reg:V?? pseudo)) > > which allows the RTL optimizers freedom to optimize the SUBREG. > > > This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap" > and "make -k check" with no new failures. In theory, my recent "obvious" > regexp fix to accommodate -march=cascadelake is no longer required, but > there's no harm leaving the testsuite as it is. > > Ok for mainline? > > > 2021-10-11 Roger Sayle <roger@nextmovesoftware.com> > > gcc/ChangeLog > * config/i386/i386-expand.c (ix86_expand_vector_move): Use a > pseudo intermediate when moving a SUBREG into a hard register, > by checking ix86_hardreg_mov_ok. /* Make operand1 a register if it isn't already. */ if (can_create_pseudo_p () - && !register_operand (op0, mode) - && !register_operand (op1, mode)) + && (!ix86_hardreg_mov_ok (op0, op1) + || (!register_operand (op0, mode) + && !register_operand (op1, mode)))) { rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0)); ix86_gen_scratch_sse_rtx probably returns a hard register, but here you want a pseudo register. > (ix86_expand_vector_extract): Store zero-extended SImode > intermediate in a pseudo, then set target using a SUBREG_PROMOTED > annotated subreg. > * config/i386/sse.md (mov<VMOVE>_internal): Prevent CSE creating > complex (SUBREG) sets of (vector) hard registers before reload, by > checking ix86_hardreg_mov_ok. > > > Thanks in advance, > Roger > -- > -- BR, Hongtao
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 4780b99..44404bd 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -617,8 +617,9 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[]) /* Make operand1 a register if it isn't already. */ if (can_create_pseudo_p () - && !register_operand (op0, mode) - && !register_operand (op1, mode)) + && (!ix86_hardreg_mov_ok (op0, op1) + || (!register_operand (op0, mode) + && !register_operand (op1, mode)))) { rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0)); emit_move_insn (tmp, op1); @@ -16005,11 +16006,15 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt) /* Let the rtl optimizers know about the zero extension performed. */ if (inner_mode == QImode || inner_mode == HImode) { + rtx reg = gen_reg_rtx (SImode); tmp = gen_rtx_ZERO_EXTEND (SImode, tmp); - target = gen_lowpart (SImode, target); + emit_move_insn (reg, tmp); + tmp = gen_lowpart (inner_mode, reg); + SUBREG_PROMOTED_VAR_P (tmp) = 1; + SUBREG_PROMOTED_SET (tmp, 1); } - emit_insn (gen_rtx_SET (target, tmp)); + emit_move_insn (target, tmp); } else { diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 4559b0c..e43f597 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -1270,7 +1270,8 @@ " C,<sseconstm1>,vm,v"))] "TARGET_SSE && (register_operand (operands[0], <MODE>mode) - || register_operand (operands[1], <MODE>mode))" + || register_operand (operands[1], <MODE>mode)) + && ix86_hardreg_mov_ok (operands[0], operands[1])" { switch (get_attr_type (insn)) {