Message ID | 20191114230909.GF7528@ibm-toto.the-meissners.org |
---|---|
State | New |
Headers | show |
Series | , V7, #6 of 7, Fix issues with vector extract and prefixed instructions | expand |
Hi! On Thu, Nov 14, 2019 at 06:09:09PM -0500, Michael Meissner wrote: > In this case, the current code re-uses the temporary for calculating the offset > of the element to load up the address of the vector, losing the offset. Reusing the same pseudo is *always* a bad idea. You get better optimisation if most code is "SSA-like": write to every pseudo only once. Of course you need to violate this where you woule have PHIs in actual SSA. > I needed to add a new constraint (em) in addition to new predicate functions. > I discovered that with the predicate function alone, the register allocator > would re-create the address. The constraint prevents this combination. It's a reasonable thing to have as a contraint, too. Once again, how should things work in inline assembler? Can we allow prefixed addressing there, or should "m" in inline asm mean "em"? > I also modified the vector extract code to generate a single PC-relative load > if the vector has a PC-relative address and the offset is constant. That should be handled by the RTL optimisers anyway, no? Or is this for post-reload splitters (a bad idea always). > * config/rs6000/rs6000.c (rs6000_adjust_vec_address): Add support > for optimizing extracting a constant vector element from a vector > that uses a prefixed address. If the element number is variable > and the address uses a prefixed address, abort. It doesn't abort. Erm, wait, it *does* ICE. Please make that more prominent (in the code). It's not clear why you mention it in the changelog while allt he other details are just "add support"? I find the control flow very hard to read here. > + /* Optimize PC-relative addresses with a constant offset. */ > + else if (pcrel_p && CONST_INT_P (element_offset)) > + { > + rtx addr2 = addr; addr isn't used after this anyway, so you can clobber it just fine. > + /* With only one temporary base register, we can't support a PC-relative > + address added to a variable offset. This is because the PADDI instruction > + requires RA to be 0 when doing a PC-relative add (i.e. no register to add > + to). */ > + else if (pcrel_p) > + gcc_unreachable (); That comment suggests we just ICE when we get unwanted input. Such a comment belongs where we prevent such code from being formed in the first place (or nowhere, if it is obvious). Segher
On Tue, Nov 26, 2019 at 01:20:20PM -0600, Segher Boessenkool wrote: > Hi! > > On Thu, Nov 14, 2019 at 06:09:09PM -0500, Michael Meissner wrote: > > In this case, the current code re-uses the temporary for calculating the offset > > of the element to load up the address of the vector, losing the offset. > > Reusing the same pseudo is *always* a bad idea. You get better > optimisation if most code is "SSA-like": write to every pseudo only > once. Of course you need to violate this where you woule have PHIs in > actual SSA. Yes. I was describing what the current code does (and why I'm fixing it). It is a bug. > > I needed to add a new constraint (em) in addition to new predicate functions. > > I discovered that with the predicate function alone, the register allocator > > would re-create the address. The constraint prevents this combination. > > It's a reasonable thing to have as a contraint, too. > > Once again, how should things work in inline assembler? Can we allow > prefixed addressing there, or should "m" in inline asm mean "em"? At the moment, I think we should just not allow prefixed addresses in asm constructs at all. > > > I also modified the vector extract code to generate a single PC-relative load > > if the vector has a PC-relative address and the offset is constant. > > That should be handled by the RTL optimisers anyway, no? Or is this > for post-reload splitters (a bad idea always). You have it backwards. It is the RTL optimizers that combines the vector extract with the load in the first place. My code allows the combination and generates a single instruction. Otherwise, it would do a PLA and then do the vector extract with the address loaded. > > * config/rs6000/rs6000.c (rs6000_adjust_vec_address): Add support > > for optimizing extracting a constant vector element from a vector > > that uses a prefixed address. If the element number is variable > > and the address uses a prefixed address, abort. > > It doesn't abort. Erm, wait, it *does* ICE. Please make that more > prominent (in the code). It's not clear why you mention it in the > changelog while allt he other details are just "add support"? > > I find the control flow very hard to read here. > > > + /* Optimize PC-relative addresses with a constant offset. */ > > + else if (pcrel_p && CONST_INT_P (element_offset)) > > + { > > + rtx addr2 = addr; > > addr isn't used after this anyway, so you can clobber it just fine. Generally I prefer not to reuse variables. > > + /* With only one temporary base register, we can't support a PC-relative > > + address added to a variable offset. This is because the PADDI instruction > > + requires RA to be 0 when doing a PC-relative add (i.e. no register to add > > + to). */ > > + else if (pcrel_p) > > + gcc_unreachable (); > > That comment suggests we just ICE when we get unwanted input. Such a > comment belongs where we prevent such code from being formed in the > first place (or nowhere, if it is obvious). The constraint and predicate is where we prevent it from occuring. The gcc_unreachable is just there as insurance that it didn't get recreated later. During testing before I added the constraint, I found the register allocator combining the two, even though the predicate didn't allow the combination. So the test is just to ICE out if the combination took place.
On Tue, Dec 03, 2019 at 01:20:04PM -0500, Michael Meissner wrote: > On Tue, Nov 26, 2019 at 01:20:20PM -0600, Segher Boessenkool wrote: > > > I needed to add a new constraint (em) in addition to new predicate functions. > > > I discovered that with the predicate function alone, the register allocator > > > would re-create the address. The constraint prevents this combination. > > > > It's a reasonable thing to have as a contraint, too. > > > > Once again, how should things work in inline assembler? Can we allow > > prefixed addressing there, or should "m" in inline asm mean "em"? > > At the moment, I think we should just not allow prefixed addresses in asm > constructs at all. We need to think about this *before GCC 10*. And have patches for it. So you are saying that yes, "m" in inline asm should mean "em". This probably makes sense, there is so much existing asm that does not work with prefixed addresses -- this is somewhet similar to how "m" in inline asm is not the same as "m<>" (as it is in the machine description). But then we need an extra constraint to _do_ allow prefixed addresses: there needs to be a way to write inline asm for such memory as well. > > > I also modified the vector extract code to generate a single PC-relative load > > > if the vector has a PC-relative address and the offset is constant. > > > > That should be handled by the RTL optimisers anyway, no? Or is this > > for post-reload splitters (a bad idea always). > > You have it backwards. It is the RTL optimizers that combines the vector > extract with the load in the first place. My code allows the combination and > generates a single instruction. Otherwise, it would do a PLA and then do the > vector extract with the address loaded. So, separate patch please, for a separate improvement. So that I can understand what is what. > > > + /* Optimize PC-relative addresses with a constant offset. */ > > > + else if (pcrel_p && CONST_INT_P (element_offset)) > > > + { > > > + rtx addr2 = addr; > > > > addr isn't used after this anyway, so you can clobber it just fine. > > Generally I prefer not to reuse variables. You shouldn't fear that. Instead, you should factor big routines, and/or simplify control flow. "Not reusing variables" (for the *same purpose* btw) is not the problem. The problem is that the code is too big and complex and irregular to simply understand. > > > + /* With only one temporary base register, we can't support a PC-relative > > > + address added to a variable offset. This is because the PADDI instruction > > > + requires RA to be 0 when doing a PC-relative add (i.e. no register to add > > > + to). */ > > > + else if (pcrel_p) > > > + gcc_unreachable (); > > > > That comment suggests we just ICE when we get unwanted input. Such a > > comment belongs where we prevent such code from being formed in the > > first place (or nowhere, if it is obvious). > > The constraint and predicate is where we prevent it from occuring. The > gcc_unreachable is just there as insurance that it didn't get recreated later. > During testing before I added the constraint, I found the register allocator > combining the two, even though the predicate didn't allow the combination. So > the test is just to ICE out if the combination took place. So there should be no comment here? A gcc_assert could be useful, but that is all? And the assert should be as early as possible, in general, like (almost) immediately after the function start. If you find you want to do it later, congratulations, you found a good place to factor this routine :-) Segher
Index: gcc/config/rs6000/constraints.md =================================================================== --- gcc/config/rs6000/constraints.md (revision 278173) +++ gcc/config/rs6000/constraints.md (working copy) @@ -202,6 +202,11 @@ (define_constraint "H" ;; Memory constraints +(define_memory_constraint "em" + "A memory operand that does not contain a prefixed address." + (and (match_code "mem") + (match_test "non_prefixed_memory (op, mode)"))) + (define_memory_constraint "es" "A ``stable'' memory operand; that is, one which does not include any automodification of the base register. Unlike @samp{m}, this constraint Index: gcc/config/rs6000/predicates.md =================================================================== --- gcc/config/rs6000/predicates.md (revision 278177) +++ gcc/config/rs6000/predicates.md (working copy) @@ -1836,3 +1836,24 @@ (define_predicate "prefixed_memory" { return address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT); }) + +;; Return true if the operand is a memory address that does not use a prefixed +;; address. +(define_predicate "non_prefixed_memory" + (match_code "mem") +{ + /* If the operand is not a valid memory operand even if it is not prefixed, + do not return true. */ + if (!memory_operand (op, mode)) + return false; + + return !address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT); +}) + +;; Return true if the operand is either a register or it is a non-prefixed +;; memory operand. +(define_predicate "reg_or_non_prefixed_memory" + (match_code "reg,subreg,mem") +{ + return gpc_reg_operand (op, mode) || non_prefixed_memory (op, mode); +}) Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (revision 278178) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -6734,6 +6734,7 @@ rs6000_adjust_vec_address (rtx scalar_re rtx element_offset; rtx new_addr; bool valid_addr_p; + bool pcrel_p = pcrel_local_address (addr, Pmode); /* Vector addresses should not have PRE_INC, PRE_DEC, or PRE_MODIFY. */ gcc_assert (GET_RTX_CLASS (GET_CODE (addr)) != RTX_AUTOINC); @@ -6771,6 +6772,38 @@ rs6000_adjust_vec_address (rtx scalar_re else if (REG_P (addr) || SUBREG_P (addr)) new_addr = gen_rtx_PLUS (Pmode, addr, element_offset); + /* Optimize PC-relative addresses with a constant offset. */ + else if (pcrel_p && CONST_INT_P (element_offset)) + { + rtx addr2 = addr; + HOST_WIDE_INT offset = INTVAL (element_offset); + + if (GET_CODE (addr2) == CONST) + addr2 = XEXP (addr2, 0); + + if (GET_CODE (addr2) == PLUS) + { + offset += INTVAL (XEXP (addr2, 1)); + addr2 = XEXP (addr2, 0); + } + + gcc_assert (SIGNED_34BIT_OFFSET_P (offset)); + if (offset) + { + addr2 = gen_rtx_PLUS (Pmode, addr2, GEN_INT (offset)); + new_addr = gen_rtx_CONST (Pmode, addr2); + } + else + new_addr = addr2; + } + + /* With only one temporary base register, we can't support a PC-relative + address added to a variable offset. This is because the PADDI instruction + requires RA to be 0 when doing a PC-relative add (i.e. no register to add + to). */ + else if (pcrel_p) + gcc_unreachable (); + /* Optimize D-FORM addresses with constant offset with a constant element, to include the element offset in the address directly. */ else if (GET_CODE (addr) == PLUS) @@ -6785,8 +6818,11 @@ rs6000_adjust_vec_address (rtx scalar_re HOST_WIDE_INT offset = INTVAL (op1) + INTVAL (element_offset); rtx offset_rtx = GEN_INT (offset); - if (IN_RANGE (offset, -32768, 32767) - && (scalar_size < 8 || (offset & 0x3) == 0)) + if (TARGET_PREFIXED_ADDR && SIGNED_34BIT_OFFSET_P (offset)) + new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx); + + else if (SIGNED_16BIT_OFFSET_P (offset) + && (scalar_size < 8 || (offset & 0x3) == 0)) new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx); else { @@ -6834,11 +6870,11 @@ rs6000_adjust_vec_address (rtx scalar_re new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset); } - /* If we have a PLUS, we need to see whether the particular register class - allows for D-FORM or X-FORM addressing. */ - if (GET_CODE (new_addr) == PLUS) + /* If we have a PLUS or a PC-relative address without the PLUS, we need to + see whether the particular register class allows for D-FORM or X-FORM + addressing. */ + if (GET_CODE (new_addr) == PLUS || pcrel_p) { - rtx op1 = XEXP (new_addr, 1); addr_mask_type addr_mask; unsigned int scalar_regno = reg_or_subregno (scalar_reg); @@ -6855,10 +6891,16 @@ rs6000_adjust_vec_address (rtx scalar_re else gcc_unreachable (); - if (REG_P (op1) || SUBREG_P (op1)) - valid_addr_p = (addr_mask & RELOAD_REG_INDEXED) != 0; - else + if (pcrel_p) valid_addr_p = (addr_mask & RELOAD_REG_OFFSET) != 0; + else + { + rtx op1 = XEXP (new_addr, 1); + if (REG_P (op1) || SUBREG_P (op1)) + valid_addr_p = (addr_mask & RELOAD_REG_INDEXED) != 0; + else + valid_addr_p = (addr_mask & RELOAD_REG_OFFSET) != 0; + } } else if (REG_P (new_addr) || SUBREG_P (new_addr)) Index: gcc/config/rs6000/vsx.md =================================================================== --- gcc/config/rs6000/vsx.md (revision 278173) +++ gcc/config/rs6000/vsx.md (working copy) @@ -3248,9 +3248,10 @@ (define_insn "vsx_vslo_<mode>" ;; Variable V2DI/V2DF extract (define_insn_and_split "vsx_extract_<mode>_var" [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=v,wa,r") - (unspec:<VS_scalar> [(match_operand:VSX_D 1 "input_operand" "v,m,m") - (match_operand:DI 2 "gpc_reg_operand" "r,r,r")] - UNSPEC_VSX_EXTRACT)) + (unspec:<VS_scalar> + [(match_operand:VSX_D 1 "reg_or_non_prefixed_memory" "v,em,em") + (match_operand:DI 2 "gpc_reg_operand" "r,r,r")] + UNSPEC_VSX_EXTRACT)) (clobber (match_scratch:DI 3 "=r,&b,&b")) (clobber (match_scratch:V2DI 4 "=&v,X,X"))] "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT" @@ -3318,9 +3319,10 @@ (define_insn_and_split "*vsx_extract_v4s ;; Variable V4SF extract (define_insn_and_split "vsx_extract_v4sf_var" [(set (match_operand:SF 0 "gpc_reg_operand" "=wa,wa,?r") - (unspec:SF [(match_operand:V4SF 1 "input_operand" "v,m,m") - (match_operand:DI 2 "gpc_reg_operand" "r,r,r")] - UNSPEC_VSX_EXTRACT)) + (unspec:SF + [(match_operand:V4SF 1 "reg_or_non_prefixed_memory" "v,em,em") + (match_operand:DI 2 "gpc_reg_operand" "r,r,r")] + UNSPEC_VSX_EXTRACT)) (clobber (match_scratch:DI 3 "=r,&b,&b")) (clobber (match_scratch:V2DI 4 "=&v,X,X"))] "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_DIRECT_MOVE_64BIT" @@ -3681,7 +3683,7 @@ (define_insn_and_split "*vsx_extract_<mo (define_insn_and_split "vsx_extract_<mode>_var" [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r,r") (unspec:<VS_scalar> - [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,m") + [(match_operand:VSX_EXTRACT_I 1 "reg_or_non_prefixed_memory" "v,v,em") (match_operand:DI 2 "gpc_reg_operand" "r,r,r")] UNSPEC_VSX_EXTRACT)) (clobber (match_scratch:DI 3 "=r,r,&b")) @@ -3701,7 +3703,7 @@ (define_insn_and_split "*vsx_extract_<mo [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r,r") (zero_extend:<VS_scalar> (unspec:<VSX_EXTRACT_I:VS_scalar> - [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,m") + [(match_operand:VSX_EXTRACT_I 1 "reg_or_non_prefixed_memory" "v,v,em") (match_operand:DI 2 "gpc_reg_operand" "r,r,r")] UNSPEC_VSX_EXTRACT))) (clobber (match_scratch:DI 3 "=r,r,&b")) Index: gcc/doc/md.texi =================================================================== --- gcc/doc/md.texi (revision 278173) +++ gcc/doc/md.texi (working copy) @@ -3373,6 +3373,9 @@ asm ("st %1,%0" : "=m<>" (mem) : "r" (va is not. +@item em +A memory operand that does not contain a prefixed address. + @item es A ``stable'' memory operand; that is, one which does not include any automodification of the base register. This used to be useful when