Message ID | 4F5A4FFE.9000203@gjlay.de |
---|---|
State | New |
Headers | show |
http://gcc.gnu.org/ml/gcc-patches/2012-03/msg00641.html Georg-Johann Lay wrote: > The problem with the PR is that lower-subreg.c happily splits multi-byte moves > from address spaces without knowing anything about the additional costs this is > causing. > > The TARGET_MODE_DEPENDENT_ADDRESS_P hook cannot be used for 16-bit addresses > because that hook is not sensitive to address spaces, but is is used for the > 24-bit address space to avoid subreg lowering for PSImode. > > For the 16-bit address spaces the mov expander now assigns the address register > by hand as post-increment. > > Luckily, post-increment is the only addressing mode that makes sense with the > non-generic address spaces and there is no choice for the address register > resp. addressing mode, anyway... > > This patch does not fix the PR issue, of course, it just avoids subreg lowering > by using/pretending mode-dependent addresses. > > Ok for trunk? > > Johann > > PR rtl-optimization/52543 > * config/avr/avr.c (avr_mode_dependent_address_p): New function. > (TARGET_MODE_DEPENDENT_ADDRESS_P): New define. > > * config/avr/avr.md (unspec): Add UNSPEC_LPM. > (load_<mode>_libgcc): Use UNSPEC_LPM instead of MEM. > (mov<mode>): For multi-byte move from non-generic > 16-bit address spaces: Expand to use Z++ as address for > inline code and use UNSPEC_LPM (Z) for code from libgcc. > (load<mode>_libgcc): Remove expander. > (split-lpmx): Remove split.
2012/3/19 Georg-Johann Lay <avr@gjlay.de>: > http://gcc.gnu.org/ml/gcc-patches/2012-03/msg00641.html > > Georg-Johann Lay wrote: >> The problem with the PR is that lower-subreg.c happily splits multi-byte moves >> from address spaces without knowing anything about the additional costs this is >> causing. >> >> The TARGET_MODE_DEPENDENT_ADDRESS_P hook cannot be used for 16-bit addresses >> because that hook is not sensitive to address spaces, but is is used for the >> 24-bit address space to avoid subreg lowering for PSImode. >> >> For the 16-bit address spaces the mov expander now assigns the address register >> by hand as post-increment. >> >> Luckily, post-increment is the only addressing mode that makes sense with the >> non-generic address spaces and there is no choice for the address register >> resp. addressing mode, anyway... >> >> This patch does not fix the PR issue, of course, it just avoids subreg lowering >> by using/pretending mode-dependent addresses. >> >> Ok for trunk? >> >> Johann >> >> PR rtl-optimization/52543 >> * config/avr/avr.c (avr_mode_dependent_address_p): New function. >> (TARGET_MODE_DEPENDENT_ADDRESS_P): New define. >> >> * config/avr/avr.md (unspec): Add UNSPEC_LPM. >> (load_<mode>_libgcc): Use UNSPEC_LPM instead of MEM. >> (mov<mode>): For multi-byte move from non-generic >> 16-bit address spaces: Expand to use Z++ as address for >> inline code and use UNSPEC_LPM (Z) for code from libgcc. >> (load<mode>_libgcc): Remove expander. >> (split-lpmx): Remove split. > Approved. Denis.
On Mar 9, 2012, at 10:46 AM, Georg-Johann Lay wrote: > The problem with the PR is that lower-subreg.c happily splits multi-byte moves > from address spaces without knowing anything about the additional costs this is > causing. Nasty, arm hit this sort of problem recently as well, and I've hit this sort of problem; here's to wishing the underlying problem gets fixed soon.
Dropping the first patch which does not work because at expand-time there must not be pre-/post-modify addressing :-( This solutions turns completely away from MEM and addressing modes: It represents loads from the 16-bits address-spaces as UNSPEC. The code is as expected now with the additional improvement that loads to RAMPZ can be factored out if the value is known to be the same (at least the LDI part; the OUT part is still needed). Moreover, the code gets simpler because loading the value to OUT to RAMPZ can be open coded and need not to be hidden in the insn because reload cannot handle these complicated addresses. And the patch fixes some more issues: - avr_load_libgcc_p must only allow __flash because __load_3/4 use LPM. - Resetting RAMPZ after ELPM for EBI-devices in avr_out_lpm was void in some situations because of premature return. This is fixed now; the new code is located in avr_load_lpm. Test suite results look good. There is just a ICE for gcc.target/avr/torture/addr-space-2-x.c -O3 -g which appears to be PR middle-end/52472 Ok to commit? Johann PR rtl-optimization/52543 PR target/52461 * config/avr/avr-protos.h (avr_load_lpm): New prototype. * config/avr/avr.c (avr_mode_dependent_address_p): New function. (TARGET_MODE_DEPENDENT_ADDRESS_P): New define. (avr_load_libgcc_p): Restrict to __flash loads. (avr_out_lpm): Only handle 1-byte loads from __flash. (avr_load_lpm): New function. (avr_find_unused_d_reg): Remove. (avr_out_lpm_no_lpmx): Remove. (adjust_insn_length): Handle ADJUST_LEN_LOAD_LPM. * config/avr/avr.md (unspec): Add UNSPEC_LPM. (load_<mode>_libgcc): Use UNSPEC_LPM instead of MEM. (load_<mode>, load_<mode>_clobber): New insns. (mov<mode>): For multi-byte move from non-generic 16-bit address spaces: Expand to load_<mode> resp. load_<mode>_clobber. (load<mode>_libgcc): Remove expander. (split-lpmx): Remove split. Georg-Johann Lay wrote: > The problem with the PR is that lower-subreg.c happily splits multi-byte moves > from address spaces without knowing anything about the additional costs this is > causing. > > The TARGET_MODE_DEPENDENT_ADDRESS_P hook cannot be used for 16-bit addresses > because that hook is not sensitive to address spaces, but is is used for the > 24-bit address space to avoid subreg lowering for PSImode. > > For the 16-bit address spaces the mov expander now assigns the address register > by hand as post-increment. > > Luckily, post-increment is the only addressing mode that makes sense with the > non-generic address spaces and there is no choice for the address register > resp. addressing mode, anyway... > > This patch does not fix the PR issue, of course, it just avoids subreg lowering > by using/pretending mode-dependent addresses. > > Ok for trunk? > > Johann > > PR rtl-optimization/52543 > * config/avr/avr.c (avr_mode_dependent_address_p): New function. > (TARGET_MODE_DEPENDENT_ADDRESS_P): New define. > > * config/avr/avr.md (unspec): Add UNSPEC_LPM. > (load_<mode>_libgcc): Use UNSPEC_LPM instead of MEM. > (mov<mode>): For multi-byte move from non-generic > 16-bit address spaces: Expand to use Z++ as address for > inline code and use UNSPEC_LPM (Z) for code from libgcc. > (load<mode>_libgcc): Remove expander. > (split-lpmx): Remove split.
On Tue, Mar 20, 2012 at 8:54 PM, Georg-Johann Lay <avr@gjlay.de> wrote: > Dropping the first patch which does not work because at expand-time there > must not be pre-/post-modify addressing :-( Have you tried to fix that, instead? Or at least ask around a bit to see what people would think about that idea? The reasons why things are the way they are, may not be applicable anymore. For example, perhaps the only reason for not having pre-/post-modify addressing modes earlier is that the old "flow" dataflow frame work didn't handle them. And it doesn't seem to be so black-and-white: The very pass you ran into problems with first, cprop, does handle pre-/post-modify addresses in local cprop. Some other passes simply take the conservative path and drop pre-/post-modify (like CSE, which doesn't record values from them). It may be a relatively small job to make everything accept them, and you may be something that's also helpful for other targets. Ciao! Steven
Steven Bosscher wrote: > On Tue, Mar 20, 2012 at 8:54 PM, Georg-Johann Lay <avr@gjlay.de> wrote: >> Dropping the first patch which does not work because at expand-time there >> must not be pre-/post-modify addressing :-( > > Have you tried to fix that, instead? Or at least ask around a bit to > see what people would think about that idea? The reasons why things > are the way they are, may not be applicable anymore. No, I didn't try to fix it. I am not experienced enough in that field. Moreover, at least as far as avr is concerned, using post-inc would just be a hack, too. > For example, perhaps the only reason for not having pre-/post-modify > addressing modes earlier is that the old "flow" dataflow frame work > didn't handle them. And it doesn't seem to be so black-and-white: The > very pass you ran into problems with first, cprop, does handle > pre-/post-modify addresses in local cprop. Some other passes simply > take the conservative path and drop pre-/post-modify (like CSE, which The problems were not only in cprop but also in cselib. > doesn't record values from them). It may be a relatively small job to > make everything accept them, and you may be something that's also > helpful for other targets. > > Ciao! > Steven
Index: config/avr/avr.md =================================================================== --- config/avr/avr.md (revision 185105) +++ config/avr/avr.md (working copy) @@ -63,6 +63,7 @@ (define_c_enum "unspec" [UNSPEC_STRLEN UNSPEC_MOVMEM UNSPEC_INDEX_JMP + UNSPEC_LPM UNSPEC_FMUL UNSPEC_FMULS UNSPEC_FMULSU @@ -364,43 +365,24 @@ (define_split ;;======================================================================== ;; Move stuff around -;; "loadqi_libgcc" -;; "loadhi_libgcc" -;; "loadpsi_libgcc" -;; "loadsi_libgcc" -;; "loadsf_libgcc" -(define_expand "load<mode>_libgcc" - [(set (match_dup 3) - (match_dup 2)) - (set (reg:MOVMODE 22) - (match_operand:MOVMODE 1 "memory_operand" "")) - (set (match_operand:MOVMODE 0 "register_operand" "") - (reg:MOVMODE 22))] - "avr_load_libgcc_p (operands[1])" - { - operands[3] = gen_rtx_REG (HImode, REG_Z); - operands[2] = force_operand (XEXP (operands[1], 0), NULL_RTX); - operands[1] = replace_equiv_address (operands[1], operands[3]); - set_mem_addr_space (operands[1], ADDR_SPACE_FLASH); - }) +;; Represent a load from __flash that needs libgcc support as UNSPEC. +;; This is legal because we read from non-changing memory. +;; For rationale see the FIXME below. -;; "load_qi_libgcc" -;; "load_hi_libgcc" ;; "load_psi_libgcc" ;; "load_si_libgcc" ;; "load_sf_libgcc" (define_insn "load_<mode>_libgcc" [(set (reg:MOVMODE 22) - (match_operand:MOVMODE 0 "memory_operand" "m,m"))] - "avr_load_libgcc_p (operands[0]) - && REG_P (XEXP (operands[0], 0)) - && REG_Z == REGNO (XEXP (operands[0], 0))" + (unspec:MOVMODE [(reg:HI REG_Z)] + UNSPEC_LPM))] + "" { - operands[0] = GEN_INT (GET_MODE_SIZE (<MODE>mode)); - return "%~call __load_%0"; + rtx n_bytes = GEN_INT (GET_MODE_SIZE (<MODE>mode)); + output_asm_insn ("%~call __load_%0", &n_bytes); + return ""; } - [(set_attr "length" "1,2") - (set_attr "isa" "rjmp,jmp") + [(set_attr "type" "xcall") (set_attr "cc" "clobber")]) @@ -549,10 +531,53 @@ (define_expand "mov<mode>" DONE; } - if (avr_load_libgcc_p (src)) + /* ; FIXME: Load from non-generic 16-bit address spaces by means of + ; POST_INC or a call to a support routine from libgcc. Reason is the + ; extreme code bloat caused by subreg lowering which splits multi-byte + ; moves without knowing anything about the additional costs caused by + ; these splits, see PR rtl-optimization/52543. + ; We assign the address to Z already at expand-time because there is + ; no choice for the address register, anyway, and pre-post-increment + ; optimization is virtually non-existent. */ + + if (avr_load_libgcc_p (src) + || (GET_MODE_SIZE (<MODE>mode) > 1 + && avr_mem_flash_p (src) + && can_create_pseudo_p() + && (GET_CODE (XEXP (src, 0)) != POST_INC + || (GET_CODE (XEXP (src, 0)) == POST_INC + && REGNO (XEXP (XEXP (src, 0), 0)) != REG_Z)))) { - /* For the small devices, do loads per libgcc call. */ - emit_insn (gen_load<mode>_libgcc (dest, src)); + rtx addr = XEXP (src, 0); + rtx regz = gen_rtx_REG (Pmode, REG_Z); + + if (GET_CODE (addr) == POST_INC) + emit_move_insn (regz, XEXP (addr, 0)); + else + emit_move_insn (regz, force_reg (Pmode, addr)); + + if (avr_load_libgcc_p (src)) + { + /* For old devices without LPMX, prefer loads per libcall. */ + + emit_insn (gen_load_<mode>_libgcc ()); + emit_move_insn (operands[0], gen_rtx_REG (<MODE>mode, 22)); + if (GET_CODE (addr) == POST_INC) + emit_move_insn (XEXP (addr, 0), + plus_constant (XEXP (addr, 0), + GET_MODE_SIZE (<MODE>mode))); + } + else + { + /* For remaining multi-byte cases hard-code the address as Z++. */ + + emit_move_insn (operands[0], + replace_equiv_address (operands[1], + gen_rtx_POST_INC (Pmode, regz))); + if (GET_CODE (addr) == POST_INC) + emit_move_insn (XEXP (addr, 0), regz); + } + DONE; } }) @@ -694,40 +719,6 @@ (define_peephole2 ; movw_r operands[5] = gen_rtx_REG (HImode, REGNO (operands[3])); }) -;; For LPM loads from AS1 we split -;; R = *Z -;; to -;; R = *Z++ -;; Z = Z - sizeof (R) -;; -;; so that the second instruction can be optimized out. - -(define_split ; "split-lpmx" - [(set (match_operand:HISI 0 "register_operand" "") - (match_operand:HISI 1 "memory_operand" ""))] - "reload_completed - && AVR_HAVE_LPMX" - [(set (match_dup 0) - (match_dup 2)) - (set (match_dup 3) - (plus:HI (match_dup 3) - (match_dup 4)))] - { - rtx addr = XEXP (operands[1], 0); - - if (!avr_mem_flash_p (operands[1]) - || !REG_P (addr) - || reg_overlap_mentioned_p (addr, operands[0])) - { - FAIL; - } - - operands[2] = replace_equiv_address (operands[1], - gen_rtx_POST_INC (Pmode, addr)); - operands[3] = addr; - operands[4] = gen_int_mode (-GET_MODE_SIZE (<MODE>mode), HImode); - }) - ;;========================================================================== ;; xpointer move (24 bit) Index: config/avr/avr.c =================================================================== --- config/avr/avr.c (revision 185100) +++ config/avr/avr.c (working copy) @@ -1423,6 +1423,22 @@ avr_cannot_modify_jumps_p (void) } +/* Implement `TARGET_MODE_DEPENDENT_ADDRESS_P'. */ + +/* FIXME: PSImode addresses are not mode-dependent in themselves. + This hook just serves to hack around PR rtl-optimization/52543 by + claiming that PSImode addresses (which are used for the 24-bit + address space __memx) were mode-dependent so that lower-subreg.s + will skip these addresses. See also the similar FIXME comment along + with mov<mode> expanders in avr.md. */ + +static bool +avr_mode_dependent_address_p (const_rtx addr) +{ + return GET_MODE (addr) != Pmode; +} + + /* Helper function for `avr_legitimate_address_p'. */ static inline bool @@ -11027,6 +11043,9 @@ avr_fold_builtin (tree fndecl, int n_arg #undef TARGET_ADDR_SPACE_LEGITIMIZE_ADDRESS #define TARGET_ADDR_SPACE_LEGITIMIZE_ADDRESS avr_addr_space_legitimize_address +#undef TARGET_MODE_DEPENDENT_ADDRESS_P +#define TARGET_MODE_DEPENDENT_ADDRESS_P avr_mode_dependent_address_p + #undef TARGET_PRINT_OPERAND #define TARGET_PRINT_OPERAND avr_print_operand #undef TARGET_PRINT_OPERAND_ADDRESS