Message ID | 20140508014846.GA5162@bubble.grove.modra.org |
---|---|
State | New |
Headers | show |
On Wed, May 7, 2014 at 9:48 PM, Alan Modra <amodra@gmail.com> wrote: > On powerpc64, to set a large loop count we have code like the > following after split1: > > (insn 67 14 68 4 (set (reg:DI 160) > (const_int 99942400 [0x5f50000])) /home/amodra/unaligned_load.c:14 -1 > (nil)) > (insn 68 67 42 4 (set (reg:DI 160) > (ior:DI (reg:DI 160) > (const_int 57600 [0xe100]))) /home/amodra/unaligned_load.c:14 -1 > (expr_list:REG_EQUAL (const_int 100000000 [0x5f5e100]) > (nil))) > > and then test for loop exit with: > > (jump_insn 65 31 45 5 (parallel [ > (set (pc) > (if_then_else (ne (reg:DI 160) > (const_int 1 [0x1])) > (label_ref:DI 42) > (pc))) > (set (reg:DI 160) > (plus:DI (reg:DI 160) > (const_int -1 [0xffffffffffffffff]))) > (clobber (scratch:CC)) > (clobber (scratch:DI)) > ]) /home/amodra/unaligned_load.c:15 800 {*ctrdi_internal1} > (int_list:REG_BR_PROB 9899 (nil)) > -> 42) > > The jump_insn of course is meant for use with bdnz, which implies a > strong preference for reg 160 to live in the count register. Trouble > is, the count register doesn't do arithmetic. > > So, use a new psuedo for intermediate results. On looking at this, > I noticed the !TARGET_POWERPC64 code in rs6000_emit_set_long_const was > broken, apparently expecting c1 and c2 to be the high and low 32 bits > of the constant. That's no longer true, so I've fixed that as well. > Bootstrapped and regression tested powerpc64-linux. OK for mainline > and branches? > > PR target/61098 > * config/rs6000/rs6000.c (rs6000_emit_set_const): Remove unneeded > params and return value. Simplify. Update comment. > (rs6000_emit_set_long_const): Remove unneeded param and return > value. Correct !TARGET_POWERPC64 handling of constants > 2G. > If we can, use a new pseudo for intermediate calculations. Alan, The history is 32 bit HWI. The ChangeLog does not mention the changes to rs6000.md nor rs6000-protos.h. Please do not remove all of the comments from the two functions. The comments should provide some documentation about the different purposes of the two functions other than setting DEST to a CONST. Why did you remove the test for NULL dest? - if (dest == NULL) - dest = gen_reg_rtx (mode); That could occur, at least it used to occur. I think that the way you rearranged the invocations of copy_rtx() in rs6000_emit_set_long_const() is okay, but it would be good for someone else to double check. Thanks, David
On Thu, May 08, 2014 at 09:48:35AM -0400, David Edelsohn wrote: > The history is 32 bit HWI. Right. > The ChangeLog does not mention the changes to rs6000.md nor rs6000-protos.h. Oops, added. * config/rs6000/rs6000.md (movsi_internal1_single+1): Update call to rs6000_emit_set_const in splitter. (movdi_internal64+2, +3): Likewise. * config/rs6000/rs6000-protos.h (rs6000_emit_set_const): Update prototype. > Please do not remove all of the comments from the two functions. The > comments should provide some documentation about the different > purposes of the two functions other than setting DEST to a CONST. I believe my updated comment covers the complete purpose of the function nowadays. The comments I removed are out-dated, and should have been removed a long time ago.. rs6000_emit_set_const does not even look at N, it always returns a non-zero result, and the return is only tested for non-zero. I removed MODE too, because that is always the same as GET_MODE (dest). > Why did you remove the test for NULL dest? > > - if (dest == NULL) > - dest = gen_reg_rtx (mode); > > That could occur, at least it used to occur. I'm sure we can't get a NULL dest nowadays. All (three) uses of rs6000_emit_set_const occur in splitters. They all must have passed a gpc_reg_operand constraint on operands[0] before calling rs6000_emit_set_const, so if NULL were possible we'd segfault in gpc_reg_operand. > I think that the way you rearranged the invocations of copy_rtx() in > rs6000_emit_set_long_const() is okay, but it would be good for someone > else to double check. Yeah, that function is a bit messy. I took the approach of always use a bare "dest" once in the last instruction emitted, with every other use getting hit with copy_rtx. The previous approach was similar, but used the bare "dest" on the first instruction emitted. Obviously you don't need copy_rtx anywhere with the new code when can_create_pseudo_p is true, but I felt it wasn't worth optimising that for the added source complication.
On Thu, May 8, 2014 at 10:40 PM, Alan Modra <amodra@gmail.com> wrote: >> Please do not remove all of the comments from the two functions. The >> comments should provide some documentation about the different >> purposes of the two functions other than setting DEST to a CONST. > > I believe my updated comment covers the complete purpose of the > function nowadays. The comments I removed are out-dated, and should > have been removed a long time ago.. rs6000_emit_set_const does not > even look at N, it always returns a non-zero result, and the return is > only tested for non-zero. I removed MODE too, because that is always > the same as GET_MODE (dest). It is helpful if the comment expresses more than restating the information one can glean from the function name. It's useful to note that rs6000_emit_set_long_const is a standard decomposition with a bounded number of instructions. >> I think that the way you rearranged the invocations of copy_rtx() in >> rs6000_emit_set_long_const() is okay, but it would be good for someone >> else to double check. > > Yeah, that function is a bit messy. I took the approach of always use > a bare "dest" once in the last instruction emitted, with every other > use getting hit with copy_rtx. The previous approach was similar, > but used the bare "dest" on the first instruction emitted. Obviously > you don't need copy_rtx anywhere with the new code when > can_create_pseudo_p is true, but I felt it wasn't worth optimising > that for the added source complication. Can you help clarify the removal of the code that tests if the splitter failed? The splitters in the Alpha port follow mostly the same rhythm, with a little bit of further cleanup and consolidation relative to the rs6000 port. alpha_split_const_mov() falls back on alpha_emit_set_long_const(), but checks that the target is valid and allows the splitter to fail. Either the Alpha port is doing unnecessary work or this cleanup patch is too aggressive. Either way, a comment seems necessary. Thanks, David
On Sat, May 10, 2014 at 10:24:34PM -0400, David Edelsohn wrote: > On Thu, May 8, 2014 at 10:40 PM, Alan Modra <amodra@gmail.com> wrote: > > rs6000_emit_set_const ... always returns a non-zero result ... > > Can you help clarify the removal of the code that tests if the > splitter failed? See above.
On Mon, May 12, 2014 at 08:23:16AM +0930, Alan Modra wrote: > On Sat, May 10, 2014 at 10:24:34PM -0400, David Edelsohn wrote: > > On Thu, May 8, 2014 at 10:40 PM, Alan Modra <amodra@gmail.com> wrote: > > > rs6000_emit_set_const ... always returns a non-zero result ... > > > > Can you help clarify the removal of the code that tests if the > > splitter failed? > > See above. On thinking some more, let me retract the patch for the time being. While it's true that I was only removing dead code in the splitters, the question of why this code has become dead is worth looking into. I suspect a previous patch to rs6000_emit_set_const was wrong, and that we should in fact be failing before reload (but never after), when an expansion would take too many instructions. "Too many" being a sequence that is slower than loading a 64-bit constant from the TOC. We try to make that sort of tradeoff in rs6000_emit_move (see num_insn_constant call), but that is really too early. Some manipulations of code modify constants. I've seen cases where rs6000_emit_move decided to inline a constant, but later changes to the value meant the expansion was five instructions.
On Sat, May 10, 2014 at 10:24:34PM -0400, David Edelsohn wrote: > On Thu, May 8, 2014 at 10:40 PM, Alan Modra <amodra@gmail.com> wrote: > > >> Please do not remove all of the comments from the two functions. The > >> comments should provide some documentation about the different > >> purposes of the two functions other than setting DEST to a CONST. > > > > I believe my updated comment covers the complete purpose of the > > function nowadays. The comments I removed are out-dated, and should > > have been removed a long time ago.. rs6000_emit_set_const does not > > even look at N, it always returns a non-zero result, and the return is > > only tested for non-zero. I removed MODE too, because that is always > > the same as GET_MODE (dest). > > It is helpful if the comment expresses more than restating the > information one can glean from the function name. It's useful to note > that rs6000_emit_set_long_const is a standard decomposition with a > bounded number of instructions. > > >> I think that the way you rearranged the invocations of copy_rtx() in > >> rs6000_emit_set_long_const() is okay, but it would be good for someone > >> else to double check. > > > > Yeah, that function is a bit messy. I took the approach of always use > > a bare "dest" once in the last instruction emitted, with every other > > use getting hit with copy_rtx. The previous approach was similar, > > but used the bare "dest" on the first instruction emitted. Obviously > > you don't need copy_rtx anywhere with the new code when > > can_create_pseudo_p is true, but I felt it wasn't worth optimising > > that for the added source complication. > > Can you help clarify the removal of the code that tests if the > splitter failed? The splitters in the Alpha port follow mostly the > same rhythm, with a little bit of further cleanup and consolidation > relative to the rs6000 port. alpha_split_const_mov() falls back on > alpha_emit_set_long_const(), but checks that the target is valid and > allows the splitter to fail. Either the Alpha port is doing > unnecessary work or this cleanup patch is too aggressive. Either way, > a comment seems necessary. OK, I've had a good look at the history of this code. rs6000_emit_set_const and rs6000_emit_set_long_const were introduced with revision 44516, a largish patch by Dan Berlin. As you hint above, it seems the functions were copied from alpha. So the parameters were unnecessary and the comments just plain wrong for the rs6000 version of code right from the initial commit. Worse, only half of necessary infrastructure was copied from alpha.. So let me lay out what I believe should be happening with (set (reg) (constant)) At expand time, if the above set can't be implemented in a single instruction, then it should be decomposed to the equivalent set high part, ori low part, and possibly shift instructions so long as the resulting sequence is small. I think we basically do this correctly in rs6000_emit_move. See the num_insns_constant call there. Constants that can't be evaluated inline by two (or three) instructions will be replaced with a load from the TOC. The same thing ought to happen in the splitters that use rs6000_emit_set_const. rs6000_emit_set_const should refuse to expand to too many instructions (just like alpha). We don't do this, but if we did, this would leave some (set (reg) (constant)) instructions in the RTL. Alternatively the splitters could generate loads from the TOC, but see pr57836, which shows the loads from the TOC we crafted at expand time being reduced back to (set (reg) (constant)). Finally, at reload time, any remaining (set (reg) (constant)) (ie. those that result in a long inline sequence) should be forced to the TOC. This is the missing part of the infrastructure that wasn't copied from alpha. Our legitimate_constant_p needs to reject some constants.. As it is, reload simply expands to a four or five inline instruction sequence. David, I'd like some help with the legitimate_constant_p implementation. I have something that seems to work (not yet regression tested) but there are a number of things that I'm not clear on (eg. the revision 20229 change) so likely will get it wrong.
Alan, Danny may have re-organized the code, but I thought that it originally came from Tom Rixx, if not earlier. I seem to remember problems in the past with late creation of TOC entries for constants causing problems, so it was easier to fall back to materializing all integer constants inline. I don't remember the PRs, but I think there were issues with creating a TOC if the late constant were the only TOC reference, or maybe the issue was buying a stack frame to materialize the TOC/GOT for a late constant. And maximum 5 instruction sequence is not really bad relative to a load from the TOC (even with medium model / data in TOC). There are a lot of trade-offs with respect to I$ expansion versus the load hitting in the L1 D$. Alpha emit_set_const does limit the number of instructions, but that is a search for a more efficient sequence than the naive sequence. The Alpha splitters use alpha_split_const_mov(), which tries alpha_emit_set_const() for an efficient sequence and then falls back to alpha_emit_set_long_const() for a naive sequence. Alpha uses PLUS instead of IOR because of the way its ISA works. alpha_emit_set_long_const() always will materialize the constant and does not check for a maximum number of instructions. This is why it's comment says "fall back to straight forward decomposition". However, alpha_emit_set_long_const() and alpha_split_const_mov() can fail, presumably because emit_move_insn() fails, not because of reaching a maximum number of instructions. alpha_legitimate_constant_p() rejecs expensive constants early. Once the splitter is invoked, it always tries to materialize the constant, but the splitter apparently can fail for other reasons. I don't mind exploring the benefits of tighening up rs6000_legitimate_const(), but I'm not sure it's an obvious win, especially with the potential failure corner cases. However, I want to have a better understanding about the part of the patch that removes the FAIL path from the splitters. Thanks, David On Tue, May 13, 2014 at 11:04 PM, Alan Modra <amodra@gmail.com> wrote: > On Sat, May 10, 2014 at 10:24:34PM -0400, David Edelsohn wrote: >> On Thu, May 8, 2014 at 10:40 PM, Alan Modra <amodra@gmail.com> wrote: >> >> >> Please do not remove all of the comments from the two functions. The >> >> comments should provide some documentation about the different >> >> purposes of the two functions other than setting DEST to a CONST. >> > >> > I believe my updated comment covers the complete purpose of the >> > function nowadays. The comments I removed are out-dated, and should >> > have been removed a long time ago.. rs6000_emit_set_const does not >> > even look at N, it always returns a non-zero result, and the return is >> > only tested for non-zero. I removed MODE too, because that is always >> > the same as GET_MODE (dest). >> >> It is helpful if the comment expresses more than restating the >> information one can glean from the function name. It's useful to note >> that rs6000_emit_set_long_const is a standard decomposition with a >> bounded number of instructions. >> >> >> I think that the way you rearranged the invocations of copy_rtx() in >> >> rs6000_emit_set_long_const() is okay, but it would be good for someone >> >> else to double check. >> > >> > Yeah, that function is a bit messy. I took the approach of always use >> > a bare "dest" once in the last instruction emitted, with every other >> > use getting hit with copy_rtx. The previous approach was similar, >> > but used the bare "dest" on the first instruction emitted. Obviously >> > you don't need copy_rtx anywhere with the new code when >> > can_create_pseudo_p is true, but I felt it wasn't worth optimising >> > that for the added source complication. >> >> Can you help clarify the removal of the code that tests if the >> splitter failed? The splitters in the Alpha port follow mostly the >> same rhythm, with a little bit of further cleanup and consolidation >> relative to the rs6000 port. alpha_split_const_mov() falls back on >> alpha_emit_set_long_const(), but checks that the target is valid and >> allows the splitter to fail. Either the Alpha port is doing >> unnecessary work or this cleanup patch is too aggressive. Either way, >> a comment seems necessary. > > OK, I've had a good look at the history of this code. > > rs6000_emit_set_const and rs6000_emit_set_long_const were introduced > with revision 44516, a largish patch by Dan Berlin. As you hint > above, it seems the functions were copied from alpha. So the > parameters were unnecessary and the comments just plain wrong for the > rs6000 version of code right from the initial commit. Worse, only > half of necessary infrastructure was copied from alpha.. > > So let me lay out what I believe should be happening with > (set (reg) (constant)) > > At expand time, if the above set can't be implemented in a single > instruction, then it should be decomposed to the equivalent set high > part, ori low part, and possibly shift instructions so long as the > resulting sequence is small. I think we basically do this correctly > in rs6000_emit_move. See the num_insns_constant call there. > Constants that can't be evaluated inline by two (or three) > instructions will be replaced with a load from the TOC. > > The same thing ought to happen in the splitters that use > rs6000_emit_set_const. rs6000_emit_set_const should refuse to expand > to too many instructions (just like alpha). We don't do this, but if > we did, this would leave some (set (reg) (constant)) instructions in > the RTL. Alternatively the splitters could generate loads from the > TOC, but see pr57836, which shows the loads from the TOC we crafted > at expand time being reduced back to (set (reg) (constant)). > > Finally, at reload time, any remaining (set (reg) (constant)) > (ie. those that result in a long inline sequence) should be forced to > the TOC. This is the missing part of the infrastructure that wasn't > copied from alpha. Our legitimate_constant_p needs to reject some > constants.. As it is, reload simply expands to a four or five > inline instruction sequence. > > David, I'd like some help with the legitimate_constant_p > implementation. I have something that seems to work (not yet > regression tested) but there are a number of things that I'm not clear > on (eg. the revision 20229 change) so likely will get it wrong. > > -- > Alan Modra > Australia Development Lab, IBM
Hi David, On Tue, May 13, 2014 at 11:46:20PM -0400, David Edelsohn wrote: > Danny may have re-organized the code, but I thought that it originally > came from Tom Rixx, if not earlier. OK, I'm not trying to apportion blame. My name is on plenty of dodgy code in the rs6000 backend too. :) > I seem to remember problems in the past with late creation of TOC > entries for constants causing problems, so it was easier to fall back > to materializing all integer constants inline. I don't remember the > PRs, but I think there were issues with creating a TOC if the late > constant were the only TOC reference, or maybe the issue was buying a > stack frame to materialize the TOC/GOT for a late constant. And > maximum 5 instruction sequence is not really bad relative to a load > from the TOC (even with medium model / data in TOC). There are a lot > of trade-offs with respect to I$ expansion versus the load hitting in > the L1 D$. Sure, but Steve will tell you that the 5 instruction sequence is both slower due to all the dependent ops, and results in larger code+data size. We definitely want to avoid it if possible, and pr67836 shows a case taken from glibc math library code where there should be no problem in using the TOC. > Alpha emit_set_const does limit the number of instructions, but that > is a search for a more efficient sequence than the naive sequence. The > Alpha splitters use alpha_split_const_mov(), which tries > alpha_emit_set_const() for an efficient sequence and then falls back > to alpha_emit_set_long_const() for a naive sequence. Alpha uses PLUS > instead of IOR because of the way its ISA works. > alpha_emit_set_long_const() always will materialize the constant and > does not check for a maximum number of instructions. This is why it's > comment says "fall back to straight forward decomposition". > > However, alpha_emit_set_long_const() and alpha_split_const_mov() can > fail, presumably because emit_move_insn() fails, not because of > reaching a maximum number of instructions. > > alpha_legitimate_constant_p() rejecs expensive constants early. Once > the splitter is invoked, it always tries to materialize the constant, > but the splitter apparently can fail for other reasons. No, that is wrong. alpha_emit_set_const does *not* always try to materialize the constant inline. It does so for constants that need more than three instructions only when TARGET_BUILD_CONSTANTS. > I don't mind exploring the benefits of tighening up > rs6000_legitimate_const(), but I'm not sure it's an obvious win, > especially with the potential failure corner cases. Yes, those potential corner cases have me worried too.. > However, I want to have a better understanding about the part of the > patch that removes the FAIL path from the splitters. That part was really quite simple. I was removing dead code. rs6000_emit_set_const has never returned anything but DEST, right from the initial commit. It can't be called with DEST == NULL, so "dest = gen_reg_rtx (mode)" is also dead code. However, I retracted that patch because I now think rs6000_emit_set_const should in fact sometimes result in the splitter failing, exactly as is done in the alpha port.
On Wed, May 14, 2014 at 5:56 AM, Alan Modra <amodra@gmail.com> wrote: >> I seem to remember problems in the past with late creation of TOC >> entries for constants causing problems, so it was easier to fall back >> to materializing all integer constants inline. I don't remember the >> PRs, but I think there were issues with creating a TOC if the late >> constant were the only TOC reference, or maybe the issue was buying a >> stack frame to materialize the TOC/GOT for a late constant. And >> maximum 5 instruction sequence is not really bad relative to a load >> from the TOC (even with medium model / data in TOC). There are a lot >> of trade-offs with respect to I$ expansion versus the load hitting in >> the L1 D$. > > Sure, but Steve will tell you that the 5 instruction sequence is both > slower due to all the dependent ops, and results in larger code+data > size. We definitely want to avoid it if possible, and pr67836 shows a > case taken from glibc math library code where there should be no > problem in using the TOC. I don't necessarily believe this is a win overall. If the constant reliably is in the L1 D$ (or maybe L2 D$) and accessed with a direct load (data in TOC or medium model), then yes. If it's farther away in the memory hierarchy, then it's not a win. I agree about the code expansion concern, which has its own secondary effects. If this is a constant in a tight loop, okay, but if it's a unique constant, it may not occur elsewhere in the code to be shared and may not be placed in the same cache line as other, recently accessed constants. This would push the load to L3 or farther. Also, remember that this same heuristic is used by AIX, which still defaults to small TOC model. So either the constant is in the TOC anchor constant pool, which hopefully will pre-load the anchor, or will be a constant in the TOC, possibly putting more pressure on TOC size and causing overflow. I am certain that there are anecdotal examples where it is a win for PPC64 Linux, but I would want more evidence that it's a general win. >> alpha_emit_set_long_const() always will materialize the constant and >> does not check for a maximum number of instructions. This is why it's >> comment says "fall back to straight forward decomposition". > No, that is wrong. alpha_emit_set_const does *not* always try to > materialize the constant inline. It does so for constants that need > more than three instructions only when TARGET_BUILD_CONSTANTS. I said that alpha_emit_set_long_const() always materializes the constant, but, as you say, it is not always called. alpha_emit_set_const() may fail if it requires too many instructions or the search depth is too deep. You seem to be referring to some of the logic in alpha_split_const_mov() as well. Again, this definitely is worth exploring. And I am confident that there are cases where loading the constant from memory is a win. I just don't have a good instinct if it is a win most of the time for a broad range of real-world applications. One optimization opportunity in GLIBC is not a general heuristic. I don't think that we know a lot about the context of the use of the constant to apply a finer-grained policy. I think the original code tried to put the constant in memory if it appeared before reload, when everything could be calculated correctly for prologue and materializing the TOC, but tried to materialze any constants that appeared during reload using splitters. That can avoid some of the problem corner cases. The code needs to handle PPC32 PPC64 eABI AIX Thanks, David
Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (revision 209926) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -1068,7 +1069,7 @@ static tree rs6000_handle_longcall_attribute (tree static tree rs6000_handle_altivec_attribute (tree *, tree, tree, int, bool *); static tree rs6000_handle_struct_attribute (tree *, tree, tree, int, bool *); static tree rs6000_builtin_vectorized_libmass (tree, tree, tree); -static rtx rs6000_emit_set_long_const (rtx, HOST_WIDE_INT, HOST_WIDE_INT); +static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT); static int rs6000_memory_move_cost (enum machine_mode, reg_class_t, bool); static bool rs6000_debug_rtx_costs (rtx, int, int, int, int *, bool); static int rs6000_debug_address_cost (rtx, enum machine_mode, addr_space_t, @@ -7826,53 +7811,36 @@ rs6000_conditional_register_usage (void) } -/* Try to output insns to set TARGET equal to the constant C if it can - be done in less than N insns. Do all computations in MODE. - Returns the place where the output has been placed if it can be - done and the insns have been emitted. If it would take more than N - insns, zero is returned and no insns and emitted. */ +/* Output insns to set DEST equal to the constant SOURCE. */ -rtx -rs6000_emit_set_const (rtx dest, enum machine_mode mode, - rtx source, int n ATTRIBUTE_UNUSED) +void +rs6000_emit_set_const (rtx dest, rtx source) { - rtx result, insn, set; - HOST_WIDE_INT c0, c1; + enum machine_mode mode = GET_MODE (dest); + rtx temp, insn, set; + HOST_WIDE_INT c; + gcc_checking_assert (CONST_INT_P (source)); + c = INTVAL (source); switch (mode) { - case QImode: + case QImode: case HImode: - if (dest == NULL) - dest = gen_reg_rtx (mode); emit_insn (gen_rtx_SET (VOIDmode, dest, source)); - return dest; + return; case SImode: - result = !can_create_pseudo_p () ? dest : gen_reg_rtx (SImode); + temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (SImode); - emit_insn (gen_rtx_SET (VOIDmode, copy_rtx (result), - GEN_INT (INTVAL (source) - & (~ (HOST_WIDE_INT) 0xffff)))); + emit_insn (gen_rtx_SET (VOIDmode, copy_rtx (temp), + GEN_INT (c & (~ (HOST_WIDE_INT) 0xffff)))); emit_insn (gen_rtx_SET (VOIDmode, dest, - gen_rtx_IOR (SImode, copy_rtx (result), - GEN_INT (INTVAL (source) & 0xffff)))); - result = dest; + gen_rtx_IOR (SImode, copy_rtx (temp), + GEN_INT (c & 0xffff)))); break; case DImode: - switch (GET_CODE (source)) - { - case CONST_INT: - c0 = INTVAL (source); - c1 = -(c0 < 0); - break; - - default: - gcc_unreachable (); - } - - result = rs6000_emit_set_long_const (dest, c0, c1); + rs6000_emit_set_long_const (dest, c); break; default: @@ -7883,37 +7851,38 @@ rs6000_conditional_register_usage (void) set = single_set (insn); if (! CONSTANT_P (SET_SRC (set))) set_unique_reg_note (insn, REG_EQUAL, source); - - return result; } -/* Having failed to find a 3 insn sequence in rs6000_emit_set_const, - fall back to a straight forward decomposition. We do this to avoid - exponential run times encountered when looking for longer sequences - with rs6000_emit_set_const. */ -static rtx -rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c1, HOST_WIDE_INT c2) +/* Output insns to set a DImode DEST equal to the constant C. */ + +static void +rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c) { + rtx temp; + if (!TARGET_POWERPC64) { - rtx operand1, operand2; + rtx hi, lo; - operand1 = operand_subword_force (dest, WORDS_BIG_ENDIAN == 0, - DImode); - operand2 = operand_subword_force (copy_rtx (dest), WORDS_BIG_ENDIAN != 0, - DImode); - emit_move_insn (operand1, GEN_INT (c1)); - emit_move_insn (operand2, GEN_INT (c2)); + hi = operand_subword_force (copy_rtx (dest), WORDS_BIG_ENDIAN == 0, + DImode); + lo = operand_subword_force (dest, WORDS_BIG_ENDIAN != 0, + DImode); + emit_move_insn (hi, GEN_INT (c >> 32)); + c = ((c & 0xffffffff) ^ 0x80000000) - 0x80000000; + emit_move_insn (lo, GEN_INT (c)); } else { HOST_WIDE_INT ud1, ud2, ud3, ud4; - ud1 = c1 & 0xffff; - ud2 = (c1 & 0xffff0000) >> 16; - c2 = c1 >> 32; - ud3 = c2 & 0xffff; - ud4 = (c2 & 0xffff0000) >> 16; + ud1 = c & 0xffff; + c = c >> 16; + ud2 = c & 0xffff; + c = c >> 16; + ud3 = c & 0xffff; + c = c >> 16; + ud4 = c & 0xffff; if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000)) || (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000))) @@ -7922,67 +7891,74 @@ rs6000_conditional_register_usage (void) else if ((ud4 == 0xffff && ud3 == 0xffff && (ud2 & 0x8000)) || (ud4 == 0 && ud3 == 0 && ! (ud2 & 0x8000))) { - emit_move_insn (dest, GEN_INT (((ud2 << 16) ^ 0x80000000) - - 0x80000000)); + temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); + + emit_move_insn (ud1 != 0 ? copy_rtx (temp) : dest, + GEN_INT (((ud2 << 16) ^ 0x80000000) - 0x80000000)); if (ud1 != 0) - emit_move_insn (copy_rtx (dest), - gen_rtx_IOR (DImode, copy_rtx (dest), + emit_move_insn (dest, + gen_rtx_IOR (DImode, copy_rtx (temp), GEN_INT (ud1))); } else if (ud3 == 0 && ud4 == 0) { + temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); + gcc_assert (ud2 & 0x8000); - emit_move_insn (dest, GEN_INT (((ud2 << 16) ^ 0x80000000) - - 0x80000000)); + emit_move_insn (copy_rtx (temp), + GEN_INT (((ud2 << 16) ^ 0x80000000) - 0x80000000)); if (ud1 != 0) - emit_move_insn (copy_rtx (dest), - gen_rtx_IOR (DImode, copy_rtx (dest), + emit_move_insn (copy_rtx (temp), + gen_rtx_IOR (DImode, copy_rtx (temp), GEN_INT (ud1))); - emit_move_insn (copy_rtx (dest), + emit_move_insn (dest, gen_rtx_ZERO_EXTEND (DImode, gen_lowpart (SImode, - copy_rtx (dest)))); + copy_rtx (temp)))); } else if ((ud4 == 0xffff && (ud3 & 0x8000)) || (ud4 == 0 && ! (ud3 & 0x8000))) { - emit_move_insn (dest, GEN_INT (((ud3 << 16) ^ 0x80000000) - - 0x80000000)); + temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); + + emit_move_insn (copy_rtx (temp), + GEN_INT (((ud3 << 16) ^ 0x80000000) - 0x80000000)); if (ud2 != 0) - emit_move_insn (copy_rtx (dest), - gen_rtx_IOR (DImode, copy_rtx (dest), + emit_move_insn (copy_rtx (temp), + gen_rtx_IOR (DImode, copy_rtx (temp), GEN_INT (ud2))); - emit_move_insn (copy_rtx (dest), - gen_rtx_ASHIFT (DImode, copy_rtx (dest), + emit_move_insn (ud1 != 0 ? copy_rtx (temp) : dest, + gen_rtx_ASHIFT (DImode, copy_rtx (temp), GEN_INT (16))); if (ud1 != 0) - emit_move_insn (copy_rtx (dest), - gen_rtx_IOR (DImode, copy_rtx (dest), + emit_move_insn (dest, + gen_rtx_IOR (DImode, copy_rtx (temp), GEN_INT (ud1))); } else { - emit_move_insn (dest, GEN_INT (((ud4 << 16) ^ 0x80000000) - - 0x80000000)); + temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); + + emit_move_insn (copy_rtx (temp), + GEN_INT (((ud4 << 16) ^ 0x80000000) - 0x80000000)); if (ud3 != 0) - emit_move_insn (copy_rtx (dest), - gen_rtx_IOR (DImode, copy_rtx (dest), + emit_move_insn (copy_rtx (temp), + gen_rtx_IOR (DImode, copy_rtx (temp), GEN_INT (ud3))); - emit_move_insn (copy_rtx (dest), - gen_rtx_ASHIFT (DImode, copy_rtx (dest), + emit_move_insn (ud2 != 0 || ud1 != 0 ? copy_rtx (temp) : dest, + gen_rtx_ASHIFT (DImode, copy_rtx (temp), GEN_INT (32))); if (ud2 != 0) - emit_move_insn (copy_rtx (dest), - gen_rtx_IOR (DImode, copy_rtx (dest), + emit_move_insn (ud1 != 0 ? copy_rtx (temp) : dest, + gen_rtx_IOR (DImode, copy_rtx (temp), GEN_INT (ud2 << 16))); if (ud1 != 0) - emit_move_insn (copy_rtx (dest), - gen_rtx_IOR (DImode, copy_rtx (dest), + emit_move_insn (dest, + gen_rtx_IOR (DImode, copy_rtx (temp), GEN_INT (ud1))); } } - return dest; } /* Helper for the following. Get rid of [r+r] memory refs Index: gcc/config/rs6000/rs6000-protos.h =================================================================== --- gcc/config/rs6000/rs6000-protos.h (revision 209926) +++ gcc/config/rs6000/rs6000-protos.h (working copy) @@ -114,7 +114,7 @@ extern void rs6000_emit_cbranch (enum machine_mode extern char * output_cbranch (rtx, const char *, int, rtx); extern char * output_e500_flip_gt_bit (rtx, rtx); extern const char * output_probe_stack_range (rtx, rtx); -extern rtx rs6000_emit_set_const (rtx, enum machine_mode, rtx, int); +extern void rs6000_emit_set_const (rtx, rtx); extern int rs6000_emit_cmove (rtx, rtx, rtx, rtx); extern int rs6000_emit_vector_cond_expr (rtx, rtx, rtx, rtx, rtx, rtx); extern void rs6000_emit_minmax (rtx, enum rtx_code, rtx, rtx); Index: gcc/config/rs6000/rs6000.md =================================================================== --- gcc/config/rs6000/rs6000.md (revision 209926) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -8978,12 +8978,8 @@ (ior:SI (match_dup 0) (match_dup 3)))] " -{ rtx tem = rs6000_emit_set_const (operands[0], SImode, operands[1], 2); - - if (tem == operands[0]) - DONE; - else - FAIL; +{ rs6000_emit_set_const (operands[0], operands[1]); + DONE; }") (define_insn "*mov<mode>_internal2" @@ -10326,12 +10322,8 @@ [(set (match_dup 0) (match_dup 2)) (set (match_dup 0) (plus:DI (match_dup 0) (match_dup 3)))] " -{ rtx tem = rs6000_emit_set_const (operands[0], DImode, operands[1], 5); - - if (tem == operands[0]) - DONE; - else - FAIL; +{ rs6000_emit_set_const (operands[0], operands[1]); + DONE; }") (define_split @@ -10341,12 +10333,8 @@ [(set (match_dup 0) (match_dup 2)) (set (match_dup 0) (plus:DI (match_dup 0) (match_dup 3)))] " -{ rtx tem = rs6000_emit_set_const (operands[0], DImode, operands[1], 5); - - if (tem == operands[0]) - DONE; - else - FAIL; +{ rs6000_emit_set_const (operands[0], operands[1]); + DONE; }") ;; TImode/PTImode is similar, except that we usually want to compute the