diff mbox

[AArch64] Implement ALU_BRANCH fusion

Message ID CO2PR07MB2694CB39F815C51139D35101832C0@CO2PR07MB2694.namprd07.prod.outlook.com
State New
Headers show

Commit Message

Hurugalawadi, Naveen March 6, 2017, 5:10 a.m. UTC
Hi,

Please find attached the patch that implements alu_branch fusion
for AArch64.
The patch doesn't change spec but improve other benchmarks.

Bootstrapped and Regression tested on aarch64-thunder-linux.
Please review the patch and let us know if its okay for Stage-1?

Thanks,
Naveen

2017-03-06  Julian Brown  <julian@codesourcery.com>
	    Naveen H.S  <Naveen.Hurugalawadi@cavium.com>

	* config/aarch64/aarch64-fusion-pairs.def: Add ALU_BRANCH entry.
	* config/aarch64/aarch64.c (AARCH64_FUSE_ALU_BRANCH): New fusion type.
	(thunderx2t99_tunings): Set AARCH64_FUSE_ALU_BRANCH flag.
	(aarch_macro_fusion_pair_p): Add support for AARCH64_FUSE_ALU_BRANCH.

Comments

James Greenhalgh March 8, 2017, 6:04 p.m. UTC | #1
On Mon, Mar 06, 2017 at 05:10:10AM +0000, Hurugalawadi, Naveen wrote:
> Hi,
> 
> Please find attached the patch that implements alu_branch fusion
> for AArch64.
> The patch doesn't change spec but improve other benchmarks.
> 
> Bootstrapped and Regression tested on aarch64-thunder-linux.
> Please review the patch and let us know if its okay for Stage-1?

This description is insufficient for me to review this patch - in
particular I'd need more detail on what types of instruction pairs you
are trying to fuse. From inspection you will be trying to fuse any
ALU operation with an unconditional direct branch. Is that what you
intend?

i.e. you are looking to fuse instruction sequences like:

  add	x0, x1, #5
  b	.L3

  csel	x0, x1, x1, gt
  b	.L4

Have I understood that right?

> +  if (aarch64_fusion_enabled_p (AARCH64_FUSE_ALU_BRANCH)
> +      && any_uncondjump_p (curr))
> +    {
> +      /* These types correspond to the reservation "vulcan_alu_basic" for
> +	 Broadcom Vulcan: these are ALU operations that produce a single uop
> +	 during instruction decoding.  */

This comment looks incorrect - there is no vulcan_alu_basic reservation
in trunk GCC.

Thanks,
James
Hurugalawadi, Naveen March 9, 2017, 6:22 a.m. UTC | #2
Hi James,

Thanks for the review and your comments.

>> I'd need more detail on what types of instruction pairs you
>> are trying to fuse. 

The documentation mentions it as follows:-
Single uop ALU instruction may fuse with adjacent branch instruction in the same bundle

>> This comment looks incorrect - there is no vulcan_alu_basic reservation

Modified as per comment.

Please let us know if the description is sufficient?

Thanks,
Naveen
James Greenhalgh March 9, 2017, 10:22 a.m. UTC | #3
On Thu, Mar 09, 2017 at 06:22:33AM +0000, Hurugalawadi, Naveen wrote:
> Hi James,
> 
> Thanks for the review and your comments.
> 
> >> I'd need more detail on what types of instruction pairs you
> >> are trying to fuse. 
> 
> The documentation mentions it as follows:-
> Single uop ALU instruction may fuse with adjacent branch instruction in the
> same bundle
> 
> >> This comment looks incorrect - there is no vulcan_alu_basic reservation
> 
> Modified as per comment.
> 
> Please let us know if the description is sufficient?

My reason for asking is that the instruction fusion implemented in LLVM
( lib/Target/AArch64/AArch64MacroFusion.cpp::shouldScheduleAdjacent ) is
between ALU instructions and conditional branches, while this patch fuses
ALU instructions and unconditional branches. I'm trying to understand why
there is a discrepancy, and consequently whether this patch is correct.

Your clarification helps, but it would be useful to know which sort of
branches you are actually targeting and to fix the disagreement between
this patch and LLVM.

Thanks,
James
diff mbox

Patch

diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def b/gcc/config/aarch64/aarch64-fusion-pairs.def
index f0e6dbc..300cd00 100644
--- a/gcc/config/aarch64/aarch64-fusion-pairs.def
+++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
@@ -34,5 +34,6 @@  AARCH64_FUSION_PAIR ("movk+movk", MOVK_MOVK)
 AARCH64_FUSION_PAIR ("adrp+ldr", ADRP_LDR)
 AARCH64_FUSION_PAIR ("cmp+branch", CMP_BRANCH)
 AARCH64_FUSION_PAIR ("aes+aesmc", AES_AESMC)
+AARCH64_FUSION_PAIR ("alu+branch", ALU_BRANCH)
 
 #undef AARCH64_FUSION_PAIR
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fa25d43..62f5461 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -792,7 +792,8 @@  static const struct tune_params thunderx2t99_tunings =
   &generic_approx_modes,
   4, /* memmov_cost.  */
   4, /* issue_rate.  */
-  (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
+  (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC
+   | AARCH64_FUSE_ALU_BRANCH), /* fusible_ops  */
   16,	/* function_align.  */
   8,	/* jump_align.  */
   16,	/* loop_align.  */
@@ -14063,6 +14064,37 @@  aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
         return true;
     }
 
+  if (aarch64_fusion_enabled_p (AARCH64_FUSE_ALU_BRANCH)
+      && any_uncondjump_p (curr))
+    {
+      /* These types correspond to the reservation "vulcan_alu_basic" for
+	 Broadcom Vulcan: these are ALU operations that produce a single uop
+	 during instruction decoding.  */
+      switch (get_attr_type (prev))
+	{
+	case TYPE_ALU_IMM:
+	case TYPE_ALU_SREG:
+	case TYPE_ADC_REG:
+	case TYPE_ADC_IMM:
+	case TYPE_ADCS_REG:
+	case TYPE_ADCS_IMM:
+	case TYPE_LOGIC_REG:
+	case TYPE_LOGIC_IMM:
+	case TYPE_CSEL:
+	case TYPE_ADR:
+	case TYPE_MOV_IMM:
+	case TYPE_SHIFT_REG:
+	case TYPE_SHIFT_IMM:
+	case TYPE_BFM:
+	case TYPE_RBIT:
+	case TYPE_REV:
+	case TYPE_EXTEND:
+	  return true;
+
+	default:;
+	}
+    }
+
   return false;
 }