Patchwork [AArch64] Peepholes to generate ldp and stp instructions

Submitter Hurugalawadi, Naveen
Date March 26, 2013, 10:27 a.m.
Message ID <F3068DEED1A463459E0887A091B1549312354681@BY2PRD0710MB364.namprd07.prod.outlook.com>
Permalink /patch/231160/
State New

Comments

Hurugalawadi, Naveen - March 26, 2013, 10:27 a.m.
Hi,

Please find attached a patch that implements load pair (ldp) and store
pair (stp) peepholes for the aarch64 target.

Please review it and let me know if it's okay.
 
Built and tested on aarch64-thunder-elf (using Cavium's internal
simulator). No new regressions.

Thanks,
Naveen

gcc/

2013-03-26   Naveen H.S  <Naveen.Hurugalawadi@caviumnetworks.com>

	* config/aarch64/aarch64.md (peephole2s to generate ldp
	instruction for 2 consecutive loads from memory): New.
	(peephole2s to generate stp instruction for 2 consecutive
	stores to memory in integer mode): New.
	(peephole2s to generate ldp instruction for 2 consecutive
	loads from memory in floating point mode): New.
	(peephole2s to generate stp instruction for 2 consecutive
	stores to memory in floating point mode): New.
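
For illustration, here is a minimal C sketch of source code that could
produce the two-load and two-store sequences the peepholes look for. The
struct and function names are hypothetical and not part of the patch.
Whether ldp/stp is actually emitted depends on the compiler version and
flags; the patch only enables the peepholes under optimize_size (-Os).

```c
#include <stdint.h>

/* Hypothetical record: the leading tag puts lo and hi at nonzero offsets
   (8 and 16), so their addresses have the base-plus-constant form that
   the peephole conditions check for. */
struct record {
    int64_t tag;
    int64_t lo;
    int64_t hi;
};

/* Two adjacent 64-bit loads from the same base register: an ldp candidate. */
int64_t sum_fields(const struct record *r)
{
    return r->lo + r->hi;
}

/* Two adjacent 64-bit stores to the same base register: an stp candidate. */
void set_fields(struct record *r, int64_t lo, int64_t hi)
{
    r->lo = lo;
    r->hi = hi;
}
```

Compiling a file like this with -Os and inspecting the assembly is one way
to check whether the pairs get combined.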
Mike Stump - March 26, 2013, 2:40 p.m.
On Mar 26, 2013, at 3:27 AM, "Hurugalawadi, Naveen" <Naveen.Hurugalawadi@caviumnetworks.com> wrote:
> Please find attached the patch that implements load pair(ldp) and store
> pair(stp) peephole for aarch64 target.

Ah, I wish gcc had a better machine-independent optimizer for combining loads and stores.
Andrew Pinski - Jan. 30, 2014, 10:49 p.m.
Ping?


Patch

--- gcc/config/aarch64/aarch64.md	2013-03-14 16:04:19.705897493 +0530
+++ gcc/config/aarch64/aarch64.md	2013-03-19 15:45:49.808730935 +0530
@@ -1013,6 +1013,26 @@ 
    (set_attr "mode" "<MODE>")]
 )
 
+(define_peephole2
+  [(set (match_operand:GPI 0 "register_operand")
+	(match_operand:GPI 1 "aarch64_mem_pair_operand"))
+   (set (match_operand:GPI 2 "register_operand")
+	(match_operand:GPI 3 "memory_operand"))]
+  "GET_CODE (operands[1]) == MEM
+   && GET_CODE (XEXP (operands[1], 0)) == PLUS
+   && GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == REG
+   && GET_CODE (XEXP (XEXP (operands[1], 0), 1)) == CONST_INT
+   && REGNO (operands[0]) != REGNO (operands[2])
+   && REGNO_REG_CLASS (REGNO (operands[0]))
+      == REGNO_REG_CLASS (REGNO (operands[2]))
+   && rtx_equal_p (XEXP (operands[3], 0),
+		   plus_constant (Pmode, XEXP (operands[1], 0),
+				  GET_MODE_SIZE (<MODE>mode)))
+   && optimize_size"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	      (set (match_dup 2) (match_dup 3))])]
+)
+
 ;; Operands 0 and 2 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
 (define_insn "store_pair<mode>"
@@ -1029,6 +1049,26 @@ 
    (set_attr "mode" "<MODE>")]
 )
 
+(define_peephole2
+  [(set (match_operand:GPI 0 "aarch64_mem_pair_operand")
+	(match_operand:GPI 1 "register_operand"))
+   (set (match_operand:GPI 2 "memory_operand")
+	(match_operand:GPI 3 "register_operand"))]
+  "GET_CODE (operands[0]) == MEM
+   && GET_CODE (XEXP (operands[0], 0)) == PLUS
+   && GET_CODE (XEXP (XEXP (operands[0], 0), 0)) == REG
+   && GET_CODE (XEXP (XEXP (operands[0], 0), 1)) == CONST_INT
+   && REGNO (operands[1]) != REGNO (operands[3])
+   && REGNO_REG_CLASS (REGNO (operands[1]))
+      == REGNO_REG_CLASS (REGNO (operands[3]))
+   && rtx_equal_p (XEXP (operands[2], 0),
+		   plus_constant (Pmode, XEXP (operands[0], 0),
+				  GET_MODE_SIZE (<MODE>mode)))
+   && optimize_size"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	      (set (match_dup 2) (match_dup 3))])]
+)
+
 ;; Operands 1 and 3 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
 (define_insn "load_pair<mode>"
@@ -1045,6 +1085,27 @@ 
    (set_attr "mode" "<MODE>")]
 )
 
+(define_peephole2
+  [(set (match_operand:GPF 0 "register_operand")
+	(match_operand:GPF 1 "aarch64_mem_pair_operand"))
+   (set (match_operand:GPF 2 "register_operand")
+	(match_operand:GPF 3 "memory_operand"))]
+  "GET_CODE (operands[1]) == MEM
+   && GET_CODE (XEXP (operands[1], 0)) == PLUS
+   && GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == REG
+   && GET_CODE (XEXP (XEXP (operands[1], 0), 1)) == CONST_INT
+   && REGNO (operands[0]) != REGNO (operands[2])
+   && REGNO (operands[0]) >= 32 && REGNO (operands[2]) >= 32
+   && REGNO_REG_CLASS (REGNO (operands[0]))
+      == REGNO_REG_CLASS (REGNO (operands[2]))
+   && rtx_equal_p (XEXP (operands[3], 0),
+		   plus_constant (Pmode, XEXP (operands[1], 0),
+				  GET_MODE_SIZE (<MODE>mode)))
+   && optimize_size"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	      (set (match_dup 2) (match_dup 3))])]
+)
+
 ;; Operands 0 and 2 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
 (define_insn "store_pair<mode>"
@@ -1061,6 +1122,27 @@ 
    (set_attr "mode" "<MODE>")]
 )
 
+(define_peephole2
+  [(set (match_operand:GPF 0 "aarch64_mem_pair_operand")
+	(match_operand:GPF 1 "register_operand"))
+   (set (match_operand:GPF 2 "memory_operand")
+	(match_operand:GPF 3 "register_operand"))]
+  "GET_CODE (operands[0]) == MEM
+   && GET_CODE (XEXP (operands[0], 0)) == PLUS
+   && GET_CODE (XEXP (XEXP (operands[0], 0), 0)) == REG
+   && GET_CODE (XEXP (XEXP (operands[0], 0), 1)) == CONST_INT
+   && REGNO (operands[1]) != REGNO (operands[3])
+   && REGNO (operands[1]) >= 32 && REGNO (operands[3]) >= 32
+   && REGNO_REG_CLASS (REGNO (operands[1]))
+      == REGNO_REG_CLASS (REGNO (operands[3]))
+   && rtx_equal_p (XEXP (operands[2], 0),
+		   plus_constant (Pmode, XEXP (operands[0], 0),
+				  GET_MODE_SIZE (<MODE>mode)))
+   && optimize_size"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	      (set (match_dup 2) (match_dup 3))])]
+)
+
 ;; Load pair with writeback.  This is primarily used in function epilogues
 ;; when restoring [fp,lr]
 (define_insn "loadwb_pair<GPI:mode>_<PTR:mode>"
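
The core of the condition shared by the four peepholes above can be modeled
in plain C. This is only a sketch under stated assumptions: the struct and
the function name are hypothetical, it is not GCC code, and it leaves out
the register-class checks and the optimize_size gate.

```c
#include <stdbool.h>

/* Hypothetical model of one memory access: base register plus constant offset. */
struct mem_access {
    int data_reg;   /* register being loaded or stored */
    int base_reg;   /* register holding the base address */
    long offset;    /* constant byte offset from the base */
    int size;       /* access size in bytes, e.g. 4 or 8 */
};

/* True when the second access starts exactly where the first one ends and
   the two use different data registers. This mirrors the rtx_equal_p /
   plus_constant adjacency test and the REGNO inequality test. */
bool can_pair(struct mem_access first, struct mem_access second)
{
    return first.base_reg == second.base_reg
        && first.size == second.size
        && first.data_reg != second.data_reg
        && second.offset == first.offset + first.size;
}
```

For example, accesses at [x2, 8] and [x2, 16] with 8-byte size and distinct
data registers pass this check, while [x2, 8] and [x2, 32] do not.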