
AArch64: Improve address rematerialization costs

Message ID: VE1PR08MB5599FB325815E5957CD41AAE833D9@VE1PR08MB5599.eurprd08.prod.outlook.com
State: New
Series: AArch64: Improve address rematerialization costs

Commit Message

Wilco Dijkstra June 2, 2021, 10:21 a.m. UTC
Hi,

Given the large improvements from better register allocation of GOT accesses,
I decided to generalize that approach to get large gains for normal addressing too:

Improve rematerialization costs of addresses.  The current costs are set too high,
which results in extra register pressure and spilling.  Using lower costs means
addresses will be rematerialized more often rather than being spilled or causing
spills.  This results in significant codesize reductions and performance gains.
SPECINT2017 improves by 0.27% with LTO and 0.16% without LTO.  Codesize is 0.12%
smaller.

Passes bootstrap and regress. OK for commit?

ChangeLog:
2021-06-01  Wilco Dijkstra  <wdijkstr@arm.com>

        * config/aarch64/aarch64.c (aarch64_rtx_costs): Use better rematerialization
        costs for HIGH, LO_SUM and SYMBOL_REF.

---
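For context, "rematerialization" here means recomputing a value at each use
instead of keeping it live in a register or reloading it from a stack slot;
for addresses that means re-emitting the ADRP/ADD pair.  A minimal sketch of
the trade-off, with made-up names and assuming the small code model:

    extern long a, b, c, d, e, f, g, h;

    void
    use_globals (void)
    {
      /* With enough simultaneously live values, a register holding a
         global's address becomes a spill candidate.  Cheap rtx costs
         for SYMBOL_REF/HIGH/LO_SUM instead let the register allocator
         re-emit the address computation at each use:
             adrp  x0, h
             add   x0, x0, :lo12:h
         which avoids a stack store plus reload.  */
      a++; b++; c++; d++; e++; f++; g++; h++;
    }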

Comments

Richard Earnshaw June 2, 2021, 3:55 p.m. UTC | #1
On 02/06/2021 11:21, Wilco Dijkstra via Gcc-patches wrote:
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 641c83b479e76cbcc75b299eb7ae5f634d9db7cd..08245827daa3f8199b29031e754244c078f0f500 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -13444,45 +13444,22 @@ cost_plus:
>   	  return false;  /* All arguments need to be in registers.  */
>   	}
>   
> -    case SYMBOL_REF:
> +    /* The following costs are used for rematerialization of addresses.
> +       Set a low cost for all global accesses - this ensures they are
> +       preferred for rematerialization, blocks them from being spilled
> +       and reduces register pressure.  The result is significant codesize
> +       reductions and performance gains. */
>   
> -      if (aarch64_cmodel == AARCH64_CMODEL_LARGE
> -	  || aarch64_cmodel == AARCH64_CMODEL_SMALL_SPIC)
> -	{
> -	  /* LDR.  */
> -	  if (speed)
> -	    *cost += extra_cost->ldst.load;
> -	}
> -      else if (aarch64_cmodel == AARCH64_CMODEL_SMALL
> -	       || aarch64_cmodel == AARCH64_CMODEL_SMALL_PIC)
> -	{
> -	  /* ADRP, followed by ADD.  */
> -	  *cost += COSTS_N_INSNS (1);
> -	  if (speed)
> -	    *cost += 2 * extra_cost->alu.arith;
> -	}
> -      else if (aarch64_cmodel == AARCH64_CMODEL_TINY
> -	       || aarch64_cmodel == AARCH64_CMODEL_TINY_PIC)
> -	{
> -	  /* ADR.  */
> -	  if (speed)
> -	    *cost += extra_cost->alu.arith;
> -	}
> -
> -      if (flag_pic)
> -	{
> -	  /* One extra load instruction, after accessing the GOT.  */
> -	  *cost += COSTS_N_INSNS (1);
> -	  if (speed)
> -	    *cost += extra_cost->ldst.load;
> -	}
> +    case SYMBOL_REF:
> +      *cost = 0;
>         return true;

No.  It's never correct to completely wipe out the existing cost - you 
don't know the context where this is being used.

The most you can do is not add any additional cost.

Similarly for all the other cases.
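A non-destructive variant would leave the accumulated value alone and only
claim that the expression has been fully costed, along the lines of:

       case SYMBOL_REF:
         /* Keep *cost at whatever has been accumulated so far;
            returning true merely stops recursion into the operands.  */
         return true;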

>   
>       case HIGH:
> +      *cost = 0;
> +      return true;
> +
>       case LO_SUM:
> -      /* ADRP/ADD (immediate).  */
> -      if (speed)
> -	*cost += extra_cost->alu.arith;
> +      *cost = COSTS_N_INSNS (3) / 4;
>         return true;
>   
>       case ZERO_EXTRACT:
>
Wilco Dijkstra June 2, 2021, 4:48 p.m. UTC | #2
Hi Richard,

> No.  It's never correct to completely wipe out the existing cost - you 
> don't know the context where this is being used.
> 
> The most you can do is not add any additional cost.

Remember that aarch64_rtx_costs starts like this:

  /* By default, assume that everything has equivalent cost to the
     cheapest instruction.  Any additional costs are applied as a delta
     above this default.  */
  *cost = COSTS_N_INSNS (1);

This is literally the last statement executed before the big switch...
Given that the cost is always initialized, there is no existing cost besides this
default value, and thus overwriting it is not an issue.
We could of course do something like:

*cost -= COSTS_N_INSNS (1);

But that is less clear, and problematic if the default value ever changes.
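For reference, COSTS_N_INSNS works in quarter-instruction units (gcc/rtl.h
defines it as (N) * 4), which is what keeps the fractional costs used by the
patch exact integers:

    #define COSTS_N_INSNS(N) ((N) * 4)    /* as defined in gcc/rtl.h */

    COSTS_N_INSNS (1)        /* == 4: the default, one full insn.  */
    COSTS_N_INSNS (3) / 4    /* == 3: three quarters of an insn.   */
    COSTS_N_INSNS (1) / 2    /* == 2: half an insn.                */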

Cheers,
Wilco
Wilco Dijkstra Oct. 20, 2021, 2:52 p.m. UTC | #3
ping


From: Wilco Dijkstra
Sent: 02 June 2021 11:21
To: GCC Patches <gcc-patches@gcc.gnu.org>
Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Sandiford <Richard.Sandiford@arm.com>
Subject: [PATCH] AArch64: Improve address rematerialization costs 
 
Wilco Dijkstra Nov. 4, 2021, 2:18 p.m. UTC | #4
ping


From: Wilco Dijkstra
Sent: 02 June 2021 11:21
To: GCC Patches <gcc-patches@gcc.gnu.org>
Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Sandiford <Richard.Sandiford@arm.com>
Subject: [PATCH] AArch64: Improve address rematerialization costs 
 
Richard Sandiford Nov. 4, 2021, 6:22 p.m. UTC | #5
Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
> ping

Can you fold in the rtx costs part of the original GOT relaxation patch?

I don't think there's enough information here for me to be able to review
the patch though.  I'll need to find testcases, look in detail at what
the rtl passes are doing, and try to work out whether (and why) this is
a good way of fixing things.

I don't mind doing that, but I don't think I'll have time before stage 3.

Thanks,
Richard

Wilco Dijkstra Nov. 24, 2021, 4:51 p.m. UTC | #6
Hi Richard,

> Can you fold in the rtx costs part of the original GOT relaxation patch?

Sure, see below for the updated version.

> I don't think there's enough information here for me to be able to review
> the patch though.  I'll need to find testcases, look in detail at what
> the rtl passes are doing, and try to work out whether (and why) this is
> a good way of fixing things.

Well, today GCC does everything with costs rather than backend callbacks.
I'd be interested in hearing about alternatives that achieve the same effect,
given that there is no callback that allows a backend to decide between
spilling and rematerialization.

Cheers,
Wilco


v2: fold in GOT remat cost

Improve rematerialization costs of addresses.  The current costs are set too high,
which results in extra register pressure and spilling.  Using lower costs means
addresses will be rematerialized more often rather than being spilled or causing
spills.  This results in significant codesize reductions and performance gains.
SPECINT2017 improves by 0.27% with LTO and 0.16% without LTO.  Codesize is 0.12%
smaller.

Passes bootstrap and regress. OK for commit?

ChangeLog:
2021-06-01  Wilco Dijkstra  <wdijkstr@arm.com>

        * config/aarch64/aarch64.c (aarch64_rtx_costs): Use better rematerialization
        costs for HIGH, LO_SUM and SYMBOL_REF.
---
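For reference, a small-PIC GOT access forms an address with an ADRP plus a
load from the GOT slot rather than a plain ADRP/ADD, which is why v2 gives
it a separate rematerialization cost (a sketch, in a comment, of the emitted
sequence):

    /* Small code model, PIC (sketch of what GCC emits):
         adrp  x0, :got:sym               // page of sym's GOT entry
         ldr   x0, [x0, #:got_lo12:sym]   // load sym's address from the GOT
       Two instructions including a load, so remat is rated differently
       from the two-ALU-op ADRP/ADD pair used for direct accesses.  */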

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 39de231d8ac6d10362cdd2b48eb9bd9de60c6703..a7f99ece55383168fb0f77e5c11c501d0bb2f013 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13610,45 +13610,28 @@ cost_plus:
 	  return false;  /* All arguments need to be in registers.  */
 	}
 
+    /* The following costs are used for rematerialization of addresses.
+       Set a low cost for all global accesses - this ensures they are
+       preferred for rematerialization, blocks them from being spilled
+       and reduces register pressure.  The result is significant codesize
+       reductions and performance gains. */
+
     case SYMBOL_REF:
 
-      if (aarch64_cmodel == AARCH64_CMODEL_LARGE
-	  || aarch64_cmodel == AARCH64_CMODEL_SMALL_SPIC)
-	{
-	  /* LDR.  */
-	  if (speed)
-	    *cost += extra_cost->ldst.load;
-	}
-      else if (aarch64_cmodel == AARCH64_CMODEL_SMALL
-	       || aarch64_cmodel == AARCH64_CMODEL_SMALL_PIC)
-	{
-	  /* ADRP, followed by ADD.  */
-	  *cost += COSTS_N_INSNS (1);
-	  if (speed)
-	    *cost += 2 * extra_cost->alu.arith;
-	}
-      else if (aarch64_cmodel == AARCH64_CMODEL_TINY
-	       || aarch64_cmodel == AARCH64_CMODEL_TINY_PIC)
-	{
-	  /* ADR.  */
-	  if (speed)
-	    *cost += extra_cost->alu.arith;
-	}
+      /* Use a separate rematerialization cost for GOT accesses.  */
+      if (aarch64_cmodel == AARCH64_CMODEL_SMALL_PIC
+	  && aarch64_classify_symbol (x, 0) == SYMBOL_SMALL_GOT_4G)
+	*cost = COSTS_N_INSNS (1) / 2;
 
-      if (flag_pic)
-	{
-	  /* One extra load instruction, after accessing the GOT.  */
-	  *cost += COSTS_N_INSNS (1);
-	  if (speed)
-	    *cost += extra_cost->ldst.load;
-	}
+      *cost = 0;
       return true;
 
     case HIGH:
+      *cost = 0;
+      return true;
+
     case LO_SUM:
-      /* ADRP/ADD (immediate).  */
-      if (speed)
-	*cost += extra_cost->alu.arith;
+      *cost = COSTS_N_INSNS (3) / 4;
       return true;
 
     case ZERO_EXTRACT:

Patch

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 641c83b479e76cbcc75b299eb7ae5f634d9db7cd..08245827daa3f8199b29031e754244c078f0f500 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13444,45 +13444,22 @@  cost_plus:
 	  return false;  /* All arguments need to be in registers.  */
 	}
 
-    case SYMBOL_REF:
+    /* The following costs are used for rematerialization of addresses.
+       Set a low cost for all global accesses - this ensures they are
+       preferred for rematerialization, blocks them from being spilled
+       and reduces register pressure.  The result is significant codesize
+       reductions and performance gains. */
 
-      if (aarch64_cmodel == AARCH64_CMODEL_LARGE
-	  || aarch64_cmodel == AARCH64_CMODEL_SMALL_SPIC)
-	{
-	  /* LDR.  */
-	  if (speed)
-	    *cost += extra_cost->ldst.load;
-	}
-      else if (aarch64_cmodel == AARCH64_CMODEL_SMALL
-	       || aarch64_cmodel == AARCH64_CMODEL_SMALL_PIC)
-	{
-	  /* ADRP, followed by ADD.  */
-	  *cost += COSTS_N_INSNS (1);
-	  if (speed)
-	    *cost += 2 * extra_cost->alu.arith;
-	}
-      else if (aarch64_cmodel == AARCH64_CMODEL_TINY
-	       || aarch64_cmodel == AARCH64_CMODEL_TINY_PIC)
-	{
-	  /* ADR.  */
-	  if (speed)
-	    *cost += extra_cost->alu.arith;
-	}
-
-      if (flag_pic)
-	{
-	  /* One extra load instruction, after accessing the GOT.  */
-	  *cost += COSTS_N_INSNS (1);
-	  if (speed)
-	    *cost += extra_cost->ldst.load;
-	}
+    case SYMBOL_REF:
+      *cost = 0;
       return true;
 
     case HIGH:
+      *cost = 0;
+      return true;
+
     case LO_SUM:
-      /* ADRP/ADD (immediate).  */
-      if (speed)
-	*cost += extra_cost->alu.arith;
+      *cost = COSTS_N_INSNS (3) / 4;
       return true;
 
     case ZERO_EXTRACT: