Message ID: ZW8CJOWPBqyA/uqm@arm.com
State: New
Alex Coplan <alex.coplan@arm.com> writes:
> Hi,
>
> This is a v2 version which addresses feedback from Richard's review
> here:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637648.html
>
> I'll reply inline to address specific comments.
>
> Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?
>
> Thanks,
> Alex
>
> -- >8 --
>
> This patch overhauls the load/store pair patterns with two main goals:
>
> 1. Fixing a correctness issue (the current patterns are not RA-friendly).
> 2. Allowing more flexibility in which operand modes are supported, and which
>    combinations of modes are allowed in the two arms of the load/store pair,
>    while reducing the number of patterns required both in the source and in
>    the generated code.
>
> The correctness issue (1) is due to the fact that the current patterns have
> two independent memory operands tied together only by a predicate on the
> insns.  Since LRA only looks at the constraints, one of the memory operands
> can get reloaded without the other one being changed, leading to the insn
> becoming unrecognizable after reload.
>
> We fix this issue by changing the patterns such that they only ever have one
> memory operand representing the entire pair.  For the store case, we use an
> unspec to logically concatenate the register operands before storing them.
> For the load case, we use unspecs to extract the "lanes" from the pair mem,
> with the second occurrence of the mem matched using a match_dup (such that
> there is still really only one memory operand as far as the RA is concerned).
>
> In terms of the modes used for the pair memory operands, we canonicalize
> these to V2x4QImode, V2x8QImode, and V2x16QImode.  These modes have not
> only the correct size but also the correct alignment requirement for a
> memory operand representing an entire load/store pair.  Unlike the other
> two, V2x4QImode didn't previously exist, so had to be added with the
> patch.
>
> As with the previous patch generalizing the writeback patterns, this
> patch aims to be flexible in the combinations of modes supported by the
> patterns without requiring a large number of generated patterns by using
> distinct mode iterators.
>
> The new scheme means we only need a single (generated) pattern for each
> load/store operation of a given operand size.  For the 4-byte and 8-byte
> operand cases, we use the GPI iterator to synthesize the two patterns.
> The 16-byte case is implemented as a separate pattern in the source (due
> to only having a single possible alternative).
>
> Since the UNSPEC patterns can't be interpreted by the dwarf2cfi code,
> we add REG_CFA_OFFSET notes to the store pair insns emitted by
> aarch64_save_callee_saves, so that correct CFI information can still be
> generated.  Furthermore, we now unconditionally generate these CFA
> notes on frame-related insns emitted by aarch64_save_callee_saves.
> This is done in case the load/store pair pass forms these into
> pairs, in which case the CFA notes would be needed.
>
> We also adjust the ldp/stp peepholes to generate the new form.  This is
> done by switching the generation to use the
> aarch64_gen_{load,store}_pair interface, making it easier to change the
> form in the future if needed.  (Likewise, the upcoming aarch64
> load/store pair pass also makes use of this interface.)
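To make the new form concrete, here is a simplified sketch of the rtxes
that the reworked aarch64_gen_store_pair/aarch64_gen_load_pair helpers
build for a DImode pair (the unspec names are the patch's; the mode
selection from the operand size, the operand asserts, and the
extending-load case are omitted here):

  /* Sketch: the store pair becomes a single set of a V2x8QImode mem
     (two 8-byte lanes) from an unspec that concatenates the registers.  */
  rtx pair_mem = adjust_bitfield_address_nv (base_mem, V2x8QImode, 0);
  rtx stp_pat = gen_rtx_SET (pair_mem,
			     gen_rtx_UNSPEC (V2x8QImode,
					     gen_rtvec (2, reg1, reg2),
					     UNSPEC_STP));

  /* Sketch: the load pair is a parallel of two sets, each extracting
     one lane of the same pair mem (the second occurrence is matched
     with match_dup in the pattern, so the RA sees one mem operand).  */
  rtx lane0 = gen_rtx_UNSPEC (DImode, gen_rtvec (1, pair_mem),
			      UNSPEC_LDP_FST);
  rtx lane1 = gen_rtx_UNSPEC (DImode, gen_rtvec (1, copy_rtx (pair_mem)),
			      UNSPEC_LDP_SND);
  rtx ldp_pat = gen_rtx_PARALLEL (VOIDmode,
				  gen_rtvec (2,
					     gen_rtx_SET (reg1, lane0),
					     gen_rtx_SET (reg2, lane1)));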
>
> This patch also adds an "ldpstp" attribute to the non-writeback
> load/store pair patterns, which is used by the post-RA load/store pair
> pass to identify existing patterns and see if they can be promoted to
> writeback variants.
>
> One potential concern with using unspecs for the patterns is that it can
> block optimization by the generic RTL passes.  This patch series tries to
> mitigate this in two ways:
> 1. The pre-RA load/store pair pass runs very late in the pre-RA pipeline.
> 2. A later patch in the series adjusts the aarch64 mem{cpy,set} expansion to
>    emit individual loads/stores instead of ldp/stp.  These should then be
>    formed back into load/store pairs much later in the RTL pipeline by the
>    new load/store pair pass.
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-ldpstp.md: Abstract ldp/stp
> 	representation from peepholes, allowing use of new form.
> 	* config/aarch64/aarch64-modes.def (V2x4QImode): Define.
> 	* config/aarch64/aarch64-protos.h
> 	(aarch64_finish_ldpstp_peephole): Declare.
> 	(aarch64_swap_ldrstr_operands): Delete declaration.
> 	(aarch64_gen_load_pair): Adjust parameters.
> 	(aarch64_gen_store_pair): Likewise.
> 	* config/aarch64/aarch64-simd.md (load_pair<DREG:mode><DREG2:mode>):
> 	Delete.
> 	(vec_store_pair<DREG:mode><DREG2:mode>): Delete.
> 	(load_pair<VQ:mode><VQ2:mode>): Delete.
> 	(vec_store_pair<VQ:mode><VQ2:mode>): Delete.
> 	* config/aarch64/aarch64.cc (aarch64_pair_mode_for_mode): New.
> 	(aarch64_gen_store_pair): Adjust to use new unspec form of stp.
> 	Drop second mem from parameters.
> 	(aarch64_gen_load_pair): Likewise.
> 	(aarch64_pair_mem_from_base): New.
> 	(aarch64_save_callee_saves): Emit REG_CFA_OFFSET notes for
> 	frame-related saves.  Adjust call to aarch64_gen_store_pair.
> 	(aarch64_restore_callee_saves): Adjust calls to
> 	aarch64_gen_load_pair to account for change in interface.
> 	(aarch64_process_components): Likewise.
> 	(aarch64_classify_address): Handle 32-byte pair mems in
> 	LDP_STP_N case.
> 	(aarch64_print_operand): Likewise.
> 	(aarch64_copy_one_block_and_progress_pointers): Adjust calls to
> 	account for change in aarch64_gen_{load,store}_pair interface.
> 	(aarch64_set_one_block_and_progress_pointer): Likewise.
> 	(aarch64_finish_ldpstp_peephole): New.
> 	(aarch64_gen_adjusted_ldpstp): Adjust to use generation helper.
> 	* config/aarch64/aarch64.md (ldpstp): New attribute.
> 	(load_pair_sw_<SX:mode><SX2:mode>): Delete.
> 	(load_pair_dw_<DX:mode><DX2:mode>): Delete.
> 	(load_pair_dw_<TX:mode><TX2:mode>): Delete.
> 	(*load_pair_<ldst_sz>): New.
> 	(*load_pair_16): New.
> 	(store_pair_sw_<SX:mode><SX2:mode>): Delete.
> 	(store_pair_dw_<DX:mode><DX2:mode>): Delete.
> 	(store_pair_dw_<TX:mode><TX2:mode>): Delete.
> 	(*store_pair_<ldst_sz>): New.
> 	(*store_pair_16): New.
> 	(*load_pair_extendsidi2_aarch64): Adjust to use new form.
> 	(*zero_extendsidi2_aarch64): Likewise.
> 	* config/aarch64/iterators.md (VPAIR): New.
> 	* config/aarch64/predicates.md (aarch64_mem_pair_operand): Change to
> 	a special predicate derived from aarch64_mem_pair_operator.

Thanks, looks great.  OK for trunk with a couple of trivial changes:

> [...]
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 8faaa748a05..c2e4f531254 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> [...]
> +/* Given a base mem MEM with a mode suitable for an ldp/stp register operand,
> +   return an rtx like MEM which instead represents the entire pair.  */

Might be worth saying "mode and address suitable...", given the assert
for aarch64_mem_pair_lanes_operand.

> +static rtx
> +aarch64_pair_mem_from_base (rtx mem)
> +{
> +  auto pair_mode = aarch64_pair_mode_for_mode (GET_MODE (mem));
> +  mem = adjust_bitfield_address_nv (mem, pair_mode, 0);
> +  gcc_assert (aarch64_mem_pair_lanes_operand (mem, pair_mode));
> +  return mem;
> +}
> [...]
> +  const bool any_extend_p = (code == ZERO_EXTEND || code == SIGN_EXTEND);
> +  if (any_extend_p)
> +    {
> +      gcc_checking_assert (GET_MODE (base_mem) == SImode
> +			   && GET_MODE (reg1) == DImode
> +			   && GET_MODE (reg2) == DImode);
> +    }

No need for the braces.

> +  else
> +    gcc_assert (code == UNKNOWN);
> [...]
> +;; Attribute use to identify load pair and store pair instructions.

used

> +;; Currently the attribute is only applied to the non-writeback ldp/stp
> +;; patterns.
> +(define_attr "ldpstp" "ldp,stp,none" (const_string "none"))
> +
> ;; -------------------------------------------------------------------
> ;; Pipeline descriptions and scheduling
> ;; -------------------------------------------------------------------
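For reference, the extending load pairs are requested through the same
interface by passing the extension code.  A hypothetical caller (not
from the patch; the registers and mem are made up for illustration)
would look like:

  /* Sketch: emit "ldpsw x0, x1, [x2]", a sign-extending load pair.
     The base mem must be SImode and both destinations DImode; the
     helper then wraps each lane unspec in a SIGN_EXTEND.  */
  rtx base = gen_rtx_REG (DImode, R2_REGNUM);
  rtx mem = gen_rtx_MEM (SImode, base);
  emit_insn (aarch64_gen_load_pair (gen_rtx_REG (DImode, R0_REGNUM),
				    gen_rtx_REG (DImode, R1_REGNUM),
				    mem, SIGN_EXTEND));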
On 12/12/2023 15:58, Richard Sandiford wrote:
> Alex Coplan <alex.coplan@arm.com> writes:
> [...]
>
> Thanks, looks great.  OK for trunk with a couple of trivial changes:

Thanks a lot for the review, but I posted a v3 which has further changes
needed when rebasing on top of SME (as e.g. the SME code directly forms
the PARALLEL pair representation and invokes the old named patterns):

https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639767.html

Is the v3 OK with those changes?

Thanks,
Alex
Alex Coplan <alex.coplan@arm.com> writes:
> On 12/12/2023 15:58, Richard Sandiford wrote:
>> [...]
>> Thanks, looks great.  OK for trunk with a couple of trivial changes:
>
> Thanks a lot for the review, but I posted a v3 which has further changes
> needed when rebasing on top of SME (as e.g. the SME code directly forms
> the PARALLEL pair representation and invokes the old named patterns):
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639767.html

Yeah, sorry, only saw that there was a v3 after sending the review.

> is the v3 OK with those changes?

Yes, thanks.  Makes for a nice simplification in
aarch64_init_tpidr2_block.

Richard
diff --git a/gcc/config/aarch64/aarch64-ldpstp.md b/gcc/config/aarch64/aarch64-ldpstp.md index 1ee7c73ff0c..dc39af85254 100644 --- a/gcc/config/aarch64/aarch64-ldpstp.md +++ b/gcc/config/aarch64/aarch64-ldpstp.md @@ -24,10 +24,10 @@ (define_peephole2 (set (match_operand:GPI 2 "register_operand" "") (match_operand:GPI 3 "memory_operand" ""))] "aarch64_operands_ok_for_ldpstp (operands, true, <MODE>mode)" - [(parallel [(set (match_dup 0) (match_dup 1)) - (set (match_dup 2) (match_dup 3))])] + [(const_int 0)] { - aarch64_swap_ldrstr_operands (operands, true); + aarch64_finish_ldpstp_peephole (operands, true); + DONE; }) (define_peephole2 @@ -36,10 +36,10 @@ (define_peephole2 (set (match_operand:GPI 2 "memory_operand" "") (match_operand:GPI 3 "aarch64_reg_or_zero" ""))] "aarch64_operands_ok_for_ldpstp (operands, false, <MODE>mode)" - [(parallel [(set (match_dup 0) (match_dup 1)) - (set (match_dup 2) (match_dup 3))])] + [(const_int 0)] { - aarch64_swap_ldrstr_operands (operands, false); + aarch64_finish_ldpstp_peephole (operands, false); + DONE; }) (define_peephole2 @@ -48,10 +48,10 @@ (define_peephole2 (set (match_operand:GPF 2 "register_operand" "") (match_operand:GPF 3 "memory_operand" ""))] "aarch64_operands_ok_for_ldpstp (operands, true, <MODE>mode)" - [(parallel [(set (match_dup 0) (match_dup 1)) - (set (match_dup 2) (match_dup 3))])] + [(const_int 0)] { - aarch64_swap_ldrstr_operands (operands, true); + aarch64_finish_ldpstp_peephole (operands, true); + DONE; }) (define_peephole2 @@ -60,10 +60,10 @@ (define_peephole2 (set (match_operand:GPF 2 "memory_operand" "") (match_operand:GPF 3 "aarch64_reg_or_fp_zero" ""))] "aarch64_operands_ok_for_ldpstp (operands, false, <MODE>mode)" - [(parallel [(set (match_dup 0) (match_dup 1)) - (set (match_dup 2) (match_dup 3))])] + [(const_int 0)] { - aarch64_swap_ldrstr_operands (operands, false); + aarch64_finish_ldpstp_peephole (operands, false); + DONE; }) (define_peephole2 @@ -72,10 +72,10 @@ (define_peephole2 (set (match_operand:DREG2 2 "register_operand" "") (match_operand:DREG2 3 "memory_operand" ""))] "aarch64_operands_ok_for_ldpstp (operands, true, <DREG:MODE>mode)" - [(parallel [(set (match_dup 0) (match_dup 1)) - (set (match_dup 2) (match_dup 3))])] + [(const_int 0)] { - aarch64_swap_ldrstr_operands (operands, true); + aarch64_finish_ldpstp_peephole (operands, true); + DONE; }) (define_peephole2 @@ -84,10 +84,10 @@ (define_peephole2 (set (match_operand:DREG2 2 "memory_operand" "") (match_operand:DREG2 3 "register_operand" ""))] "aarch64_operands_ok_for_ldpstp (operands, false, <DREG:MODE>mode)" - [(parallel [(set (match_dup 0) (match_dup 1)) - (set (match_dup 2) (match_dup 3))])] + [(const_int 0)] { - aarch64_swap_ldrstr_operands (operands, false); + aarch64_finish_ldpstp_peephole (operands, false); + DONE; }) (define_peephole2 @@ -99,10 +99,10 @@ (define_peephole2 && aarch64_operands_ok_for_ldpstp (operands, true, <VQ:MODE>mode) && (aarch64_tune_params.extra_tuning_flags & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS) == 0" - [(parallel [(set (match_dup 0) (match_dup 1)) - (set (match_dup 2) (match_dup 3))])] + [(const_int 0)] { - aarch64_swap_ldrstr_operands (operands, true); + aarch64_finish_ldpstp_peephole (operands, true); + DONE; }) (define_peephole2 @@ -114,10 +114,10 @@ (define_peephole2 && aarch64_operands_ok_for_ldpstp (operands, false, <VQ:MODE>mode) && (aarch64_tune_params.extra_tuning_flags & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS) == 0" - [(parallel [(set (match_dup 0) (match_dup 1)) - (set (match_dup 2) (match_dup 3))])] + [(const_int 0)] { - 
aarch64_swap_ldrstr_operands (operands, false); + aarch64_finish_ldpstp_peephole (operands, false); + DONE; }) @@ -129,10 +129,10 @@ (define_peephole2 (set (match_operand:DI 2 "register_operand" "") (sign_extend:DI (match_operand:SI 3 "memory_operand" "")))] "aarch64_operands_ok_for_ldpstp (operands, true, SImode)" - [(parallel [(set (match_dup 0) (sign_extend:DI (match_dup 1))) - (set (match_dup 2) (sign_extend:DI (match_dup 3)))])] + [(const_int 0)] { - aarch64_swap_ldrstr_operands (operands, true); + aarch64_finish_ldpstp_peephole (operands, true, SIGN_EXTEND); + DONE; }) (define_peephole2 @@ -141,10 +141,10 @@ (define_peephole2 (set (match_operand:DI 2 "register_operand" "") (zero_extend:DI (match_operand:SI 3 "memory_operand" "")))] "aarch64_operands_ok_for_ldpstp (operands, true, SImode)" - [(parallel [(set (match_dup 0) (zero_extend:DI (match_dup 1))) - (set (match_dup 2) (zero_extend:DI (match_dup 3)))])] + [(const_int 0)] { - aarch64_swap_ldrstr_operands (operands, true); + aarch64_finish_ldpstp_peephole (operands, true, ZERO_EXTEND); + DONE; }) ;; Handle storing of a floating point zero with integer data. @@ -163,10 +163,10 @@ (define_peephole2 (set (match_operand:<FCVT_TARGET> 2 "memory_operand" "") (match_operand:<FCVT_TARGET> 3 "aarch64_reg_zero_or_fp_zero" ""))] "aarch64_operands_ok_for_ldpstp (operands, false, <V_INT_EQUIV>mode)" - [(parallel [(set (match_dup 0) (match_dup 1)) - (set (match_dup 2) (match_dup 3))])] + [(const_int 0)] { - aarch64_swap_ldrstr_operands (operands, false); + aarch64_finish_ldpstp_peephole (operands, false); + DONE; }) ;; Handle consecutive load/store whose offset is out of the range diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def index 6b4f4e17dd5..1e0d770f72f 100644 --- a/gcc/config/aarch64/aarch64-modes.def +++ b/gcc/config/aarch64/aarch64-modes.def @@ -93,9 +93,13 @@ INT_MODE (XI, 64); /* V8DI mode. */ VECTOR_MODE_WITH_PREFIX (V, INT, DI, 8, 5); - ADJUST_ALIGNMENT (V8DI, 8); +/* V2x4QImode. Used in load/store pair patterns. */ +VECTOR_MODE_WITH_PREFIX (V2x, INT, QI, 4, 5); +ADJUST_NUNITS (V2x4QI, 8); +ADJUST_ALIGNMENT (V2x4QI, 4); + /* Define Advanced SIMD modes for structures of 2, 3 and 4 d-registers. 
*/ #define ADV_SIMD_D_REG_STRUCT_MODES(NVECS, VB, VH, VS, VD) \ VECTOR_MODES_WITH_PREFIX (V##NVECS##x, INT, 8, 3); \ diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 376b4984be6..72362bfb4e3 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -967,6 +967,8 @@ void aarch64_split_compare_and_swap (rtx op[]); void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx); bool aarch64_gen_adjusted_ldpstp (rtx *, bool, machine_mode, RTX_CODE); +void aarch64_finish_ldpstp_peephole (rtx *, bool, + enum rtx_code = (enum rtx_code)0); void aarch64_expand_sve_vec_cmp_int (rtx, rtx_code, rtx, rtx); bool aarch64_expand_sve_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool); @@ -1022,8 +1024,9 @@ bool aarch64_mergeable_load_pair_p (machine_mode, rtx, rtx); bool aarch64_operands_ok_for_ldpstp (rtx *, bool, machine_mode); bool aarch64_operands_adjust_ok_for_ldpstp (rtx *, bool, machine_mode); bool aarch64_mem_ok_with_ldpstp_policy_model (rtx, bool, machine_mode); -void aarch64_swap_ldrstr_operands (rtx *, bool); bool aarch64_ldpstp_operand_mode_p (machine_mode); +rtx aarch64_gen_load_pair (rtx, rtx, rtx, enum rtx_code = (enum rtx_code)0); +rtx aarch64_gen_store_pair (rtx, rtx, rtx); extern void aarch64_asm_output_pool_epilogue (FILE *, const char *, tree, HOST_WIDE_INT); diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index ad79a8110a5..9736421433f 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -231,38 +231,6 @@ (define_insn "aarch64_store_lane0<mode>" [(set_attr "type" "neon_store1_1reg<q>")] ) -(define_insn "load_pair<DREG:mode><DREG2:mode>" - [(set (match_operand:DREG 0 "register_operand") - (match_operand:DREG 1 "aarch64_mem_pair_operand")) - (set (match_operand:DREG2 2 "register_operand") - (match_operand:DREG2 3 "memory_operand"))] - "TARGET_FLOAT - && rtx_equal_p (XEXP (operands[3], 0), - plus_constant (Pmode, - XEXP (operands[1], 0), - GET_MODE_SIZE (<DREG:MODE>mode)))" - {@ [ cons: =0 , 1 , =2 , 3 ; attrs: type ] - [ w , Ump , w , m ; neon_ldp ] ldp\t%d0, %d2, %z1 - [ r , Ump , r , m ; load_16 ] ldp\t%x0, %x2, %z1 - } -) - -(define_insn "vec_store_pair<DREG:mode><DREG2:mode>" - [(set (match_operand:DREG 0 "aarch64_mem_pair_operand") - (match_operand:DREG 1 "register_operand")) - (set (match_operand:DREG2 2 "memory_operand") - (match_operand:DREG2 3 "register_operand"))] - "TARGET_FLOAT - && rtx_equal_p (XEXP (operands[2], 0), - plus_constant (Pmode, - XEXP (operands[0], 0), - GET_MODE_SIZE (<DREG:MODE>mode)))" - {@ [ cons: =0 , 1 , =2 , 3 ; attrs: type ] - [ Ump , w , m , w ; neon_stp ] stp\t%d1, %d3, %z0 - [ Ump , r , m , r ; store_16 ] stp\t%x1, %x3, %z0 - } -) - (define_insn "aarch64_simd_stp<mode>" [(set (match_operand:VP_2E 0 "aarch64_mem_pair_lanes_operand") (vec_duplicate:VP_2E (match_operand:<VEL> 1 "register_operand")))] @@ -273,34 +241,6 @@ (define_insn "aarch64_simd_stp<mode>" } ) -(define_insn "load_pair<VQ:mode><VQ2:mode>" - [(set (match_operand:VQ 0 "register_operand" "=w") - (match_operand:VQ 1 "aarch64_mem_pair_operand" "Ump")) - (set (match_operand:VQ2 2 "register_operand" "=w") - (match_operand:VQ2 3 "memory_operand" "m"))] - "TARGET_FLOAT - && rtx_equal_p (XEXP (operands[3], 0), - plus_constant (Pmode, - XEXP (operands[1], 0), - GET_MODE_SIZE (<VQ:MODE>mode)))" - "ldp\\t%q0, %q2, %z1" - [(set_attr "type" "neon_ldp_q")] -) - -(define_insn "vec_store_pair<VQ:mode><VQ2:mode>" - [(set (match_operand:VQ 0 
"aarch64_mem_pair_operand" "=Ump") - (match_operand:VQ 1 "register_operand" "w")) - (set (match_operand:VQ2 2 "memory_operand" "=m") - (match_operand:VQ2 3 "register_operand" "w"))] - "TARGET_FLOAT - && rtx_equal_p (XEXP (operands[2], 0), - plus_constant (Pmode, - XEXP (operands[0], 0), - GET_MODE_SIZE (<VQ:MODE>mode)))" - "stp\\t%q1, %q3, %z0" - [(set_attr "type" "neon_stp_q")] -) - (define_expand "@aarch64_split_simd_mov<mode>" [(set (match_operand:VQMOV 0) (match_operand:VQMOV 1))] diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 8faaa748a05..c2e4f531254 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -6680,59 +6680,87 @@ aarch64_pop_regs (unsigned regno1, unsigned regno2, HOST_WIDE_INT adjustment, } } -/* Generate and return a store pair instruction of mode MODE to store - register REG1 to MEM1 and register REG2 to MEM2. */ +/* Given an ldp/stp register operand mode MODE, return a suitable mode to use + for a mem rtx representing the entire pair. */ -static rtx -aarch64_gen_store_pair (machine_mode mode, rtx mem1, rtx reg1, rtx mem2, - rtx reg2) -{ - switch (mode) - { - case E_DImode: - return gen_store_pair_dw_didi (mem1, reg1, mem2, reg2); +static machine_mode +aarch64_pair_mode_for_mode (machine_mode mode) +{ + if (known_eq (GET_MODE_SIZE (mode), 4)) + return V2x4QImode; + else if (known_eq (GET_MODE_SIZE (mode), 8)) + return V2x8QImode; + else if (known_eq (GET_MODE_SIZE (mode), 16)) + return V2x16QImode; + else + gcc_unreachable (); +} - case E_DFmode: - return gen_store_pair_dw_dfdf (mem1, reg1, mem2, reg2); +/* Given a base mem MEM with a mode suitable for an ldp/stp register operand, + return an rtx like MEM which instead represents the entire pair. */ - case E_TFmode: - return gen_store_pair_dw_tftf (mem1, reg1, mem2, reg2); +static rtx +aarch64_pair_mem_from_base (rtx mem) +{ + auto pair_mode = aarch64_pair_mode_for_mode (GET_MODE (mem)); + mem = adjust_bitfield_address_nv (mem, pair_mode, 0); + gcc_assert (aarch64_mem_pair_lanes_operand (mem, pair_mode)); + return mem; +} - case E_V4SImode: - return gen_vec_store_pairv4siv4si (mem1, reg1, mem2, reg2); +/* Generate and return a store pair instruction to store REG1 and REG2 + into memory starting at BASE_MEM. All three rtxes should have modes of the + same size. */ - case E_V16QImode: - return gen_vec_store_pairv16qiv16qi (mem1, reg1, mem2, reg2); +rtx +aarch64_gen_store_pair (rtx base_mem, rtx reg1, rtx reg2) +{ + rtx pair_mem = aarch64_pair_mem_from_base (base_mem); - default: - gcc_unreachable (); - } + return gen_rtx_SET (pair_mem, + gen_rtx_UNSPEC (GET_MODE (pair_mem), + gen_rtvec (2, reg1, reg2), + UNSPEC_STP)); } -/* Generate and regurn a load pair isntruction of mode MODE to load register - REG1 from MEM1 and register REG2 from MEM2. */ +/* Generate and return a load pair instruction to load a pair of + registers starting at BASE_MEM into REG1 and REG2. If CODE is + UNKNOWN, all three rtxes should have modes of the same size. + Otherwise, CODE is {SIGN,ZERO}_EXTEND, base_mem should be in SImode, + and REG{1,2} should be in DImode. 
*/ -static rtx -aarch64_gen_load_pair (machine_mode mode, rtx reg1, rtx mem1, rtx reg2, - rtx mem2) +rtx +aarch64_gen_load_pair (rtx reg1, rtx reg2, rtx base_mem, enum rtx_code code) { - switch (mode) - { - case E_DImode: - return gen_load_pair_dw_didi (reg1, mem1, reg2, mem2); - - case E_DFmode: - return gen_load_pair_dw_dfdf (reg1, mem1, reg2, mem2); + rtx pair_mem = aarch64_pair_mem_from_base (base_mem); - case E_TFmode: - return gen_load_pair_dw_tftf (reg1, mem1, reg2, mem2); + const bool any_extend_p = (code == ZERO_EXTEND || code == SIGN_EXTEND); + if (any_extend_p) + { + gcc_checking_assert (GET_MODE (base_mem) == SImode + && GET_MODE (reg1) == DImode + && GET_MODE (reg2) == DImode); + } + else + gcc_assert (code == UNKNOWN); + + rtx unspecs[2] = { + gen_rtx_UNSPEC (any_extend_p ? SImode : GET_MODE (reg1), + gen_rtvec (1, pair_mem), + UNSPEC_LDP_FST), + gen_rtx_UNSPEC (any_extend_p ? SImode : GET_MODE (reg2), + gen_rtvec (1, copy_rtx (pair_mem)), + UNSPEC_LDP_SND) + }; - case E_V4SImode: - return gen_load_pairv4siv4si (reg1, mem1, reg2, mem2); + if (any_extend_p) + for (int i = 0; i < 2; i++) + unspecs[i] = gen_rtx_fmt_e (code, DImode, unspecs[i]); - default: - gcc_unreachable (); - } + return gen_rtx_PARALLEL (VOIDmode, + gen_rtvec (2, + gen_rtx_SET (reg1, unspecs[0]), + gen_rtx_SET (reg2, unspecs[1]))); } /* Return TRUE if return address signing should be enabled for the current @@ -6909,7 +6937,7 @@ aarch64_save_callee_saves (poly_int64 bytes_below_sp, rtx reg = gen_rtx_REG (mode, regno); offset = frame.reg_offset[regno] - bytes_below_sp; rtx base_rtx = stack_pointer_rtx; - poly_int64 sp_offset = offset; + poly_int64 cfa_offset = offset; HOST_WIDE_INT const_offset; if (mode == VNx2DImode && BYTES_BIG_ENDIAN) @@ -6934,8 +6962,17 @@ aarch64_save_callee_saves (poly_int64 bytes_below_sp, offset -= fp_offset; } rtx mem = gen_frame_mem (mode, plus_constant (Pmode, base_rtx, offset)); - bool need_cfa_note_p = (base_rtx != stack_pointer_rtx); + rtx cfa_base = stack_pointer_rtx; + if (hard_fp_valid_p && frame_pointer_needed) + { + cfa_base = hard_frame_pointer_rtx; + cfa_offset += (bytes_below_sp - frame.bytes_below_hard_fp); + } + + rtx cfa_mem = gen_frame_mem (mode, + plus_constant (Pmode, + cfa_base, cfa_offset)); unsigned int regno2; if (!aarch64_sve_mode_p (mode) && i + 1 < regs.size () @@ -6944,42 +6981,35 @@ aarch64_save_callee_saves (poly_int64 bytes_below_sp, frame.reg_offset[regno2] - frame.reg_offset[regno])) { rtx reg2 = gen_rtx_REG (mode, regno2); - rtx mem2; offset += GET_MODE_SIZE (mode); - mem2 = gen_frame_mem (mode, plus_constant (Pmode, base_rtx, offset)); - insn = emit_insn (aarch64_gen_store_pair (mode, mem, reg, mem2, - reg2)); - - /* The first part of a frame-related parallel insn is - always assumed to be relevant to the frame - calculations; subsequent parts, are only - frame-related if explicitly marked. 
*/ + insn = emit_insn (aarch64_gen_store_pair (mem, reg, reg2)); + if (aarch64_emit_cfi_for_reg_p (regno2)) { - if (need_cfa_note_p) - aarch64_add_cfa_expression (insn, reg2, stack_pointer_rtx, - sp_offset + GET_MODE_SIZE (mode)); - else - RTX_FRAME_RELATED_P (XVECEXP (PATTERN (insn), 0, 1)) = 1; + const auto off = cfa_offset + GET_MODE_SIZE (mode); + rtx cfa_mem2 = gen_frame_mem (mode, + plus_constant (Pmode, + cfa_base, + off)); + add_reg_note (insn, REG_CFA_OFFSET, + gen_rtx_SET (cfa_mem2, reg2)); } regno = regno2; ++i; } else if (mode == VNx2DImode && BYTES_BIG_ENDIAN) - { - insn = emit_insn (gen_aarch64_pred_mov (mode, mem, ptrue, reg)); - need_cfa_note_p = true; - } + insn = emit_insn (gen_aarch64_pred_mov (mode, mem, ptrue, reg)); else if (aarch64_sve_mode_p (mode)) insn = emit_insn (gen_rtx_SET (mem, reg)); else insn = emit_move_insn (mem, reg); RTX_FRAME_RELATED_P (insn) = frame_related_p; - if (frame_related_p && need_cfa_note_p) - aarch64_add_cfa_expression (insn, reg, stack_pointer_rtx, sp_offset); + + if (frame_related_p) + add_reg_note (insn, REG_CFA_OFFSET, gen_rtx_SET (cfa_mem, reg)); } } @@ -7038,12 +7068,7 @@ aarch64_restore_callee_saves (poly_int64 bytes_below_sp, frame.reg_offset[regno2] - frame.reg_offset[regno])) { rtx reg2 = gen_rtx_REG (mode, regno2); - rtx mem2; - - offset += GET_MODE_SIZE (mode); - mem2 = gen_frame_mem (mode, plus_constant (Pmode, base_rtx, offset)); - emit_insn (aarch64_gen_load_pair (mode, reg, mem, reg2, mem2)); - + emit_insn (aarch64_gen_load_pair (reg, reg2, mem)); *cfi_ops = alloc_reg_note (REG_CFA_RESTORE, reg2, *cfi_ops); regno = regno2; ++i; @@ -7375,9 +7400,9 @@ aarch64_process_components (sbitmap components, bool prologue_p) : gen_rtx_SET (reg2, mem2); if (prologue_p) - insn = emit_insn (aarch64_gen_store_pair (mode, mem, reg, mem2, reg2)); + insn = emit_insn (aarch64_gen_store_pair (mem, reg, reg2)); else - insn = emit_insn (aarch64_gen_load_pair (mode, reg, mem, reg2, mem2)); + insn = emit_insn (aarch64_gen_load_pair (reg, reg2, mem)); if (frame_related_p || frame_related2_p) { @@ -8573,12 +8598,18 @@ aarch64_classify_address (struct aarch64_address_info *info, mode of the corresponding addressing mode is half of that. */ if (type == ADDR_QUERY_LDP_STP_N) { - if (known_eq (GET_MODE_SIZE (mode), 16)) + if (known_eq (GET_MODE_SIZE (mode), 32)) + mode = V16QImode; + else if (known_eq (GET_MODE_SIZE (mode), 16)) mode = DFmode; else if (known_eq (GET_MODE_SIZE (mode), 8)) mode = SFmode; else return false; + + /* This isn't really an Advanced SIMD struct mode, but a mode + used to represent the complete mem in a load/store pair. */ + advsimd_struct_p = false; } bool allow_reg_index_p = (!load_store_pair_p @@ -10207,7 +10238,8 @@ aarch64_print_operand (FILE *f, rtx x, int code) if (!MEM_P (x) || (code == 'y' && maybe_ne (GET_MODE_SIZE (mode), 8) - && maybe_ne (GET_MODE_SIZE (mode), 16))) + && maybe_ne (GET_MODE_SIZE (mode), 16) + && maybe_ne (GET_MODE_SIZE (mode), 32))) { output_operand_lossage ("invalid operand for '%%%c'", code); return; @@ -23082,10 +23114,8 @@ aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst, *src = adjust_address (*src, mode, 0); *dst = adjust_address (*dst, mode, 0); /* Emit the memcpy. 
*/ - emit_insn (aarch64_gen_load_pair (mode, reg1, *src, reg2, - aarch64_progress_pointer (*src))); - emit_insn (aarch64_gen_store_pair (mode, *dst, reg1, - aarch64_progress_pointer (*dst), reg2)); + emit_insn (aarch64_gen_load_pair (reg1, reg2, *src)); + emit_insn (aarch64_gen_store_pair (*dst, reg1, reg2)); /* Move the pointers forward. */ *src = aarch64_move_pointer (*src, 32); *dst = aarch64_move_pointer (*dst, 32); @@ -23260,8 +23290,7 @@ aarch64_set_one_block_and_progress_pointer (rtx src, rtx *dst, /* "Cast" the *dst to the correct mode. */ *dst = adjust_address (*dst, mode, 0); /* Emit the memset. */ - emit_insn (aarch64_gen_store_pair (mode, *dst, src, - aarch64_progress_pointer (*dst), src)); + emit_insn (aarch64_gen_store_pair (*dst, src, src)); /* Move the pointers forward. */ *dst = aarch64_move_pointer (*dst, 32); @@ -24458,6 +24487,29 @@ aarch64_swap_ldrstr_operands (rtx* operands, bool load) } } +/* Helper function used for generation of load/store pair instructions, called + from peepholes in aarch64-ldpstp.md. OPERANDS is an array of + operands as matched by the peepholes in that file. LOAD_P is true if we're + generating a load pair, otherwise we're generating a store pair. CODE is + either {ZERO,SIGN}_EXTEND for extending loads or UNKNOWN if we're generating a + standard load/store pair. */ + +void +aarch64_finish_ldpstp_peephole (rtx *operands, bool load_p, enum rtx_code code) +{ + aarch64_swap_ldrstr_operands (operands, load_p); + + if (load_p) + emit_insn (aarch64_gen_load_pair (operands[0], operands[2], + operands[1], code)); + else + { + gcc_assert (code == UNKNOWN); + emit_insn (aarch64_gen_store_pair (operands[0], operands[1], + operands[3])); + } +} + /* Taking X and Y to be HOST_WIDE_INT pointers, return the result of a comparison between the two. */ int @@ -24639,10 +24691,10 @@ bool aarch64_gen_adjusted_ldpstp (rtx *operands, bool load, machine_mode mode, RTX_CODE code) { - rtx base, offset_1, offset_3, t1, t2; - rtx mem_1, mem_2, mem_3, mem_4; + rtx base, offset_1, offset_2; + rtx mem_1, mem_2; rtx temp_operands[8]; - HOST_WIDE_INT off_val_1, off_val_3, base_off, new_off_1, new_off_3, + HOST_WIDE_INT off_val_1, off_val_2, base_off, new_off_1, new_off_2, stp_off_upper_limit, stp_off_lower_limit, msize; /* We make changes on a copy as we may still bail out. */ @@ -24665,23 +24717,19 @@ aarch64_gen_adjusted_ldpstp (rtx *operands, bool load, if (load) { mem_1 = copy_rtx (temp_operands[1]); - mem_2 = copy_rtx (temp_operands[3]); - mem_3 = copy_rtx (temp_operands[5]); - mem_4 = copy_rtx (temp_operands[7]); + mem_2 = copy_rtx (temp_operands[5]); } else { mem_1 = copy_rtx (temp_operands[0]); - mem_2 = copy_rtx (temp_operands[2]); - mem_3 = copy_rtx (temp_operands[4]); - mem_4 = copy_rtx (temp_operands[6]); + mem_2 = copy_rtx (temp_operands[4]); gcc_assert (code == UNKNOWN); } extract_base_offset_in_addr (mem_1, &base, &offset_1); - extract_base_offset_in_addr (mem_3, &base, &offset_3); + extract_base_offset_in_addr (mem_2, &base, &offset_2); gcc_assert (base != NULL_RTX && offset_1 != NULL_RTX - && offset_3 != NULL_RTX); + && offset_2 != NULL_RTX); /* Adjust offset so it can fit in LDP/STP instruction. */ msize = GET_MODE_SIZE (mode).to_constant(); @@ -24689,11 +24737,11 @@ aarch64_gen_adjusted_ldpstp (rtx *operands, bool load, stp_off_lower_limit = - msize * 0x40; off_val_1 = INTVAL (offset_1); - off_val_3 = INTVAL (offset_3); + off_val_2 = INTVAL (offset_2); /* The base offset is optimally half way between the two STP/LDP offsets. 
> 
>  /* Taking X and Y to be HOST_WIDE_INT pointers, return the result of a
>     comparison between the two.  */
>  int
> @@ -24639,10 +24691,10 @@
>  bool
>  aarch64_gen_adjusted_ldpstp (rtx *operands, bool load,
> 			     machine_mode mode, RTX_CODE code)
>  {
> -  rtx base, offset_1, offset_3, t1, t2;
> -  rtx mem_1, mem_2, mem_3, mem_4;
> +  rtx base, offset_1, offset_2;
> +  rtx mem_1, mem_2;
>    rtx temp_operands[8];
> -  HOST_WIDE_INT off_val_1, off_val_3, base_off, new_off_1, new_off_3,
> +  HOST_WIDE_INT off_val_1, off_val_2, base_off, new_off_1, new_off_2,
>      stp_off_upper_limit, stp_off_lower_limit, msize;
> 
>    /* We make changes on a copy as we may still bail out.  */
> @@ -24665,23 +24717,19 @@ aarch64_gen_adjusted_ldpstp (rtx *operands, bool load,
>    if (load)
>      {
>        mem_1 = copy_rtx (temp_operands[1]);
> -      mem_2 = copy_rtx (temp_operands[3]);
> -      mem_3 = copy_rtx (temp_operands[5]);
> -      mem_4 = copy_rtx (temp_operands[7]);
> +      mem_2 = copy_rtx (temp_operands[5]);
>      }
>    else
>      {
>        mem_1 = copy_rtx (temp_operands[0]);
> -      mem_2 = copy_rtx (temp_operands[2]);
> -      mem_3 = copy_rtx (temp_operands[4]);
> -      mem_4 = copy_rtx (temp_operands[6]);
> +      mem_2 = copy_rtx (temp_operands[4]);
>        gcc_assert (code == UNKNOWN);
>      }
> 
>    extract_base_offset_in_addr (mem_1, &base, &offset_1);
> -  extract_base_offset_in_addr (mem_3, &base, &offset_3);
> +  extract_base_offset_in_addr (mem_2, &base, &offset_2);
>    gcc_assert (base != NULL_RTX && offset_1 != NULL_RTX
> -	      && offset_3 != NULL_RTX);
> +	      && offset_2 != NULL_RTX);
> 
>    /* Adjust offset so it can fit in LDP/STP instruction.  */
>    msize = GET_MODE_SIZE (mode).to_constant();
> @@ -24689,11 +24737,11 @@ aarch64_gen_adjusted_ldpstp (rtx *operands, bool load,
>    stp_off_upper_limit = msize * (0x40 - 1);
>    stp_off_lower_limit = - msize * 0x40;
> 
>    off_val_1 = INTVAL (offset_1);
> -  off_val_3 = INTVAL (offset_3);
> +  off_val_2 = INTVAL (offset_2);
> 
>    /* The base offset is optimally half way between the two STP/LDP
>       offsets.  */
>    if (msize <= 4)
> -    base_off = (off_val_1 + off_val_3) / 2;
> +    base_off = (off_val_1 + off_val_2) / 2;
>    else
>      /* However, due to issues with negative LDP/STP offset generation for
>         larger modes, for DF, DD, DI and vector modes. we must not use negative
> @@ -24733,73 +24781,58 @@ aarch64_gen_adjusted_ldpstp (rtx *operands, bool load,
>    new_off_1 = off_val_1 - base_off;
> 
>    /* Offset of the second STP/LDP.  */
> -  new_off_3 = off_val_3 - base_off;
> +  new_off_2 = off_val_2 - base_off;
> 
>    /* The offsets must be within the range of the LDP/STP instructions.  */
>    if (new_off_1 > stp_off_upper_limit || new_off_1 < stp_off_lower_limit
> -      || new_off_3 > stp_off_upper_limit || new_off_3 < stp_off_lower_limit)
> +      || new_off_2 > stp_off_upper_limit || new_off_2 < stp_off_lower_limit)
>      return false;
> 
>    replace_equiv_address_nv (mem_1, plus_constant (Pmode, operands[8],
> 						  new_off_1), true);
>    replace_equiv_address_nv (mem_2, plus_constant (Pmode, operands[8],
> -						  new_off_1 + msize), true);
> -  replace_equiv_address_nv (mem_3, plus_constant (Pmode, operands[8],
> -						  new_off_3), true);
> -  replace_equiv_address_nv (mem_4, plus_constant (Pmode, operands[8],
> -						  new_off_3 + msize), true);
> +						  new_off_2), true);
> 
>    if (!aarch64_mem_pair_operand (mem_1, mode)
> -      || !aarch64_mem_pair_operand (mem_3, mode))
> +      || !aarch64_mem_pair_operand (mem_2, mode))
>      return false;
> 
> -  if (code == ZERO_EXTEND)
> -    {
> -      mem_1 = gen_rtx_ZERO_EXTEND (DImode, mem_1);
> -      mem_2 = gen_rtx_ZERO_EXTEND (DImode, mem_2);
> -      mem_3 = gen_rtx_ZERO_EXTEND (DImode, mem_3);
> -      mem_4 = gen_rtx_ZERO_EXTEND (DImode, mem_4);
> -    }
> -  else if (code == SIGN_EXTEND)
> -    {
> -      mem_1 = gen_rtx_SIGN_EXTEND (DImode, mem_1);
> -      mem_2 = gen_rtx_SIGN_EXTEND (DImode, mem_2);
> -      mem_3 = gen_rtx_SIGN_EXTEND (DImode, mem_3);
> -      mem_4 = gen_rtx_SIGN_EXTEND (DImode, mem_4);
> -    }
> -
>    if (load)
>      {
>        operands[0] = temp_operands[0];
>        operands[1] = mem_1;
>        operands[2] = temp_operands[2];
> -      operands[3] = mem_2;
>        operands[4] = temp_operands[4];
> -      operands[5] = mem_3;
> +      operands[5] = mem_2;
>        operands[6] = temp_operands[6];
> -      operands[7] = mem_4;
>      }
>    else
>      {
>        operands[0] = mem_1;
>        operands[1] = temp_operands[1];
> -      operands[2] = mem_2;
>        operands[3] = temp_operands[3];
> -      operands[4] = mem_3;
> +      operands[4] = mem_2;
>        operands[5] = temp_operands[5];
> -      operands[6] = mem_4;
>        operands[7] = temp_operands[7];
>      }
> 
>    /* Emit adjusting instruction.  */
>    emit_insn (gen_rtx_SET (operands[8], plus_constant (DImode, base, base_off)));
>    /* Emit ldp/stp instructions.  */
> -  t1 = gen_rtx_SET (operands[0], operands[1]);
> -  t2 = gen_rtx_SET (operands[2], operands[3]);
> -  emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, t1, t2)));
> -  t1 = gen_rtx_SET (operands[4], operands[5]);
> -  t2 = gen_rtx_SET (operands[6], operands[7]);
> -  emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, t1, t2)));
> +  if (load)
> +    {
> +      emit_insn (aarch64_gen_load_pair (operands[0], operands[2],
> +					operands[1], code));
> +      emit_insn (aarch64_gen_load_pair (operands[4], operands[6],
> +					operands[5], code));
> +    }
> +  else
> +    {
> +      emit_insn (aarch64_gen_store_pair (operands[0], operands[1],
> +					 operands[3]));
> +      emit_insn (aarch64_gen_store_pair (operands[4], operands[5],
> +					 operands[7]));
> +    }
> 
>    return true;
>  }
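(To make the re-anchoring arithmetic concrete with invented numbers: for
msize == 4 and original pair offsets 0 and 8, base_off = (0 + 8) / 2 = 4,
so the two accesses become base - 4 and base + 4.  Both are comfortably
inside the signed, scaled LDP/STP range for 4-byte operands, which is
[stp_off_lower_limit, stp_off_upper_limit] = [-256, 252] by the formulas
above.)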
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index bfa2ec80ee7..bed74cfd311 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -175,6 +175,9 @@ (define_c_enum "unspec" [
>      UNSPEC_GOTSMALLTLS
>      UNSPEC_GOTTINYPIC
>      UNSPEC_GOTTINYTLS
> +    UNSPEC_STP
> +    UNSPEC_LDP_FST
> +    UNSPEC_LDP_SND
>      UNSPEC_LD1
>      UNSPEC_LD2
>      UNSPEC_LD2_DREG
> @@ -453,6 +456,11 @@ (define_attr "predicated" "yes,no" (const_string "no"))
>  ;; may chose to hold the tracking state encoded in SP.
>  (define_attr "speculation_barrier" "true,false" (const_string "false"))
> 
> +;; Attribute used to identify load pair and store pair instructions.
> +;; Currently the attribute is only applied to the non-writeback ldp/stp
> +;; patterns.
> +(define_attr "ldpstp" "ldp,stp,none" (const_string "none"))
> +
>  ;; -------------------------------------------------------------------
>  ;; Pipeline descriptions and scheduling
>  ;; -------------------------------------------------------------------
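(Since "ldpstp" is an ordinary define_attr, the post-RA pass can classify
insns through the generated accessor.  A sketch; the accessor and enum names
follow the usual genattrtab naming scheme and are my assumption here, not
code from the patch:

  /* Sketch: recognise an existing non-writeback load pair.  */
  if (get_attr_ldpstp (insn) == LDPSTP_LDP)
    {
      /* Candidate for promotion to a writeback ldp.  */
    }

)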
-(define_insn "load_pair_sw_<SX:mode><SX2:mode>" - [(set (match_operand:SX 0 "register_operand") - (match_operand:SX 1 "aarch64_mem_pair_operand")) - (set (match_operand:SX2 2 "register_operand") - (match_operand:SX2 3 "memory_operand"))] - "rtx_equal_p (XEXP (operands[3], 0), - plus_constant (Pmode, - XEXP (operands[1], 0), - GET_MODE_SIZE (<SX:MODE>mode)))" - {@ [ cons: =0 , 1 , =2 , 3 ; attrs: type , arch ] - [ r , Ump , r , m ; load_8 , * ] ldp\t%w0, %w2, %z1 - [ w , Ump , w , m ; neon_load1_2reg , fp ] ldp\t%s0, %s2, %z1 - } -) - -;; Storing different modes that can still be merged -(define_insn "load_pair_dw_<DX:mode><DX2:mode>" - [(set (match_operand:DX 0 "register_operand") - (match_operand:DX 1 "aarch64_mem_pair_operand")) - (set (match_operand:DX2 2 "register_operand") - (match_operand:DX2 3 "memory_operand"))] - "rtx_equal_p (XEXP (operands[3], 0), - plus_constant (Pmode, - XEXP (operands[1], 0), - GET_MODE_SIZE (<DX:MODE>mode)))" - {@ [ cons: =0 , 1 , =2 , 3 ; attrs: type , arch ] - [ r , Ump , r , m ; load_16 , * ] ldp\t%x0, %x2, %z1 - [ w , Ump , w , m ; neon_load1_2reg , fp ] ldp\t%d0, %d2, %z1 - } -) - -(define_insn "load_pair_dw_<TX:mode><TX2:mode>" - [(set (match_operand:TX 0 "register_operand" "=w") - (match_operand:TX 1 "aarch64_mem_pair_operand" "Ump")) - (set (match_operand:TX2 2 "register_operand" "=w") - (match_operand:TX2 3 "memory_operand" "m"))] - "TARGET_SIMD - && rtx_equal_p (XEXP (operands[3], 0), - plus_constant (Pmode, - XEXP (operands[1], 0), - GET_MODE_SIZE (<TX:MODE>mode)))" - "ldp\\t%q0, %q2, %z1" +(define_insn "*load_pair_<ldst_sz>" + [(set (match_operand:GPI 0 "aarch64_ldp_reg_operand") + (unspec [ + (match_operand:<VPAIR> 1 "aarch64_mem_pair_lanes_operand") + ] UNSPEC_LDP_FST)) + (set (match_operand:GPI 2 "aarch64_ldp_reg_operand") + (unspec [ + (match_dup 1) + ] UNSPEC_LDP_SND))] + "" + {@ [cons: =0, 1, =2; attrs: type, arch] + [ r, Umn, r; load_<ldpstp_sz>, * ] ldp\t%<w>0, %<w>2, %y1 + [ w, Umn, w; neon_load1_2reg, fp ] ldp\t%<v>0, %<v>2, %y1 + } + [(set_attr "ldpstp" "ldp")] +) + +(define_insn "*load_pair_16" + [(set (match_operand:TI 0 "aarch64_ldp_reg_operand" "=w") + (unspec [ + (match_operand:V2x16QI 1 "aarch64_mem_pair_lanes_operand" "Umn") + ] UNSPEC_LDP_FST)) + (set (match_operand:TI 2 "aarch64_ldp_reg_operand" "=w") + (unspec [ + (match_dup 1) + ] UNSPEC_LDP_SND))] + "TARGET_FLOAT" + "ldp\\t%q0, %q2, %y1" [(set_attr "type" "neon_ldp_q") - (set_attr "fp" "yes")] -) - -;; Operands 0 and 2 are tied together by the final condition; so we allow -;; fairly lax checking on the second memory operation. 
-(define_insn "store_pair_sw_<SX:mode><SX2:mode>" - [(set (match_operand:SX 0 "aarch64_mem_pair_operand") - (match_operand:SX 1 "aarch64_reg_zero_or_fp_zero")) - (set (match_operand:SX2 2 "memory_operand") - (match_operand:SX2 3 "aarch64_reg_zero_or_fp_zero"))] - "rtx_equal_p (XEXP (operands[2], 0), - plus_constant (Pmode, - XEXP (operands[0], 0), - GET_MODE_SIZE (<SX:MODE>mode)))" - {@ [ cons: =0 , 1 , =2 , 3 ; attrs: type , arch ] - [ Ump , rYZ , m , rYZ ; store_8 , * ] stp\t%w1, %w3, %z0 - [ Ump , w , m , w ; neon_store1_2reg , fp ] stp\t%s1, %s3, %z0 - } -) - -;; Storing different modes that can still be merged -(define_insn "store_pair_dw_<DX:mode><DX2:mode>" - [(set (match_operand:DX 0 "aarch64_mem_pair_operand") - (match_operand:DX 1 "aarch64_reg_zero_or_fp_zero")) - (set (match_operand:DX2 2 "memory_operand") - (match_operand:DX2 3 "aarch64_reg_zero_or_fp_zero"))] - "rtx_equal_p (XEXP (operands[2], 0), - plus_constant (Pmode, - XEXP (operands[0], 0), - GET_MODE_SIZE (<DX:MODE>mode)))" - {@ [ cons: =0 , 1 , =2 , 3 ; attrs: type , arch ] - [ Ump , rYZ , m , rYZ ; store_16 , * ] stp\t%x1, %x3, %z0 - [ Ump , w , m , w ; neon_store1_2reg , fp ] stp\t%d1, %d3, %z0 - } -) - -(define_insn "store_pair_dw_<TX:mode><TX2:mode>" - [(set (match_operand:TX 0 "aarch64_mem_pair_operand" "=Ump") - (match_operand:TX 1 "register_operand" "w")) - (set (match_operand:TX2 2 "memory_operand" "=m") - (match_operand:TX2 3 "register_operand" "w"))] - "TARGET_SIMD && - rtx_equal_p (XEXP (operands[2], 0), - plus_constant (Pmode, - XEXP (operands[0], 0), - GET_MODE_SIZE (TFmode)))" - "stp\\t%q1, %q3, %z0" + (set_attr "fp" "yes") + (set_attr "ldpstp" "ldp")] +) + +(define_insn "*store_pair_<ldst_sz>" + [(set (match_operand:<VPAIR> 0 "aarch64_mem_pair_lanes_operand") + (unspec:<VPAIR> + [(match_operand:GPI 1 "aarch64_stp_reg_operand") + (match_operand:GPI 2 "aarch64_stp_reg_operand")] UNSPEC_STP))] + "" + {@ [cons: =0, 1, 2; attrs: type , arch] + [ Umn, rYZ, rYZ; store_<ldpstp_sz>, * ] stp\t%<w>1, %<w>2, %y0 + [ Umn, w, w; neon_store1_2reg , fp ] stp\t%<v>1, %<v>2, %y0 + } + [(set_attr "ldpstp" "stp")] +) + +(define_insn "*store_pair_16" + [(set (match_operand:V2x16QI 0 "aarch64_mem_pair_lanes_operand" "=Umn") + (unspec:V2x16QI + [(match_operand:TI 1 "aarch64_ldp_reg_operand" "w") + (match_operand:TI 2 "aarch64_ldp_reg_operand" "w")] UNSPEC_STP))] + "TARGET_FLOAT" + "stp\t%q1, %q2, %y0" [(set_attr "type" "neon_stp_q") - (set_attr "fp" "yes")] + (set_attr "fp" "yes") + (set_attr "ldpstp" "stp")] ) ;; Writeback load/store pair patterns. 
> @@ -2058,14 +2028,15 @@ (define_insn "*extendsidi2_aarch64"
> 
>  (define_insn "*load_pair_extendsidi2_aarch64"
>    [(set (match_operand:DI 0 "register_operand" "=r")
> -	(sign_extend:DI (match_operand:SI 1 "aarch64_mem_pair_operand" "Ump")))
> +	(sign_extend:DI (unspec:SI [
> +	  (match_operand:V2x4QI 1 "aarch64_mem_pair_lanes_operand" "Umn")
> +	] UNSPEC_LDP_FST)))
>     (set (match_operand:DI 2 "register_operand" "=r")
> -	(sign_extend:DI (match_operand:SI 3 "memory_operand" "m")))]
> -  "rtx_equal_p (XEXP (operands[3], 0),
> -		plus_constant (Pmode,
> -			       XEXP (operands[1], 0),
> -			       GET_MODE_SIZE (SImode)))"
> -  "ldpsw\\t%0, %2, %z1"
> +	(sign_extend:DI (unspec:SI [
> +	  (match_dup 1)
> +	] UNSPEC_LDP_SND)))]
> +  ""
> +  "ldpsw\\t%0, %2, %y1"
>    [(set_attr "type" "load_8")]
>  )
> 
> @@ -2085,16 +2056,17 @@ (define_insn "*zero_extendsidi2_aarch64"
> 
>  (define_insn "*load_pair_zero_extendsidi2_aarch64"
>    [(set (match_operand:DI 0 "register_operand")
> -	(zero_extend:DI (match_operand:SI 1 "aarch64_mem_pair_operand")))
> +	(zero_extend:DI (unspec:SI [
> +	  (match_operand:V2x4QI 1 "aarch64_mem_pair_lanes_operand")
> +	] UNSPEC_LDP_FST)))
>     (set (match_operand:DI 2 "register_operand")
> -	(zero_extend:DI (match_operand:SI 3 "memory_operand")))]
> -  "rtx_equal_p (XEXP (operands[3], 0),
> -		plus_constant (Pmode,
> -			       XEXP (operands[1], 0),
> -			       GET_MODE_SIZE (SImode)))"
> -  {@ [ cons: =0 , 1   , =2 , 3 ; attrs: type     , arch ]
> -     [ r        , Ump , r  , m ; load_8          , *    ] ldp\t%w0, %w2, %z1
> -     [ w        , Ump , w  , m ; neon_load1_2reg , fp   ] ldp\t%s0, %s2, %z1
> +	(zero_extend:DI (unspec:SI [
> +	  (match_dup 1)
> +	] UNSPEC_LDP_SND)))]
> +  ""
> +  {@ [ cons: =0 , 1   , =2 ; attrs: type     , arch ]
> +     [ r        , Umn , r  ; load_8          , *    ] ldp\t%w0, %w2, %y1
> +     [ w        , Umn , w  ; neon_load1_2reg , fp   ] ldp\t%s0, %s2, %y1
>    }
>  )
> 
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index a920de99ffc..fd8dd6db349 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -1435,6 +1435,9 @@ (define_mode_attr VDBL [(V8QI "V16QI") (V4HI "V8HI")
> 			 (SI   "V2SI")  (SF   "V2SF")
> 			 (DI   "V2DI")  (DF   "V2DF")])
> 
> +;; Load/store pair mode.
> +(define_mode_attr VPAIR [(SI "V2x4QI") (DI "V2x8QI")])
> +
>  ;; Register suffix for double-length mode.
>  (define_mode_attr Vdtype [(V4HF "8h") (V2SF "4s")])
> 
> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
> index 64bcdc65977..72afcabd5f0 100644
> --- a/gcc/config/aarch64/predicates.md
> +++ b/gcc/config/aarch64/predicates.md
> @@ -266,10 +266,12 @@ (define_special_predicate "aarch64_mem_pair_operator"
> 	 (match_test "known_eq (GET_MODE_SIZE (mode),
> 				GET_MODE_SIZE (GET_MODE (op)))"))))
> 
> -(define_predicate "aarch64_mem_pair_operand"
> -  (and (match_code "mem")
> -       (match_test "aarch64_legitimate_address_p (mode, XEXP (op, 0), false,
> -						  ADDR_QUERY_LDP_STP)")))
> +;; Like aarch64_mem_pair_operator, but additionally check the
> +;; address is suitable.
> +(define_special_predicate "aarch64_mem_pair_operand"
> +  (and (match_operand 0 "aarch64_mem_pair_operator")
> +       (match_test "aarch64_legitimate_address_p (GET_MODE (op), XEXP (op, 0),
> +						  false, ADDR_QUERY_LDP_STP)")))
> 
>  (define_predicate "pmode_plus_operator"
>    (and (match_code "plus")