vect: Account for unused IFN_LOAD_LANES results

Message ID

mptzh1b93rt.fsf@arm.com

State

New

Headers

DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 12279385481F
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, rguenther@suse.de,
 richard.sandiford@arm.com
Subject: [PATCH] vect: Account for unused IFN_LOAD_LANES results
Date: Thu, 14 Jan 2021 11:06:14 +0000
Message-ID: <mptzh1b93rt.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Precedence: list
From: Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org>
Reply-To: Richard Sandiford <richard.sandiford@arm.com>
Cc: rguenther@suse.de
Errors-To: gcc-patches-bounces@gcc.gnu.org
Sender: "Gcc-patches" <gcc-patches-bounces@gcc.gnu.org>

Series

vect: Account for unused IFN_LOAD_LANES results

Commit Message

Richard Sandiford Jan. 14, 2021, 11:06 a.m. UTC

At the moment, if we use only one vector of an LD4 result,
we'll treat the LD4 as having the cost of a single load.
But all 4 loads and any associated permutes take place
regardless of which results are actually used.

This patch therefore counts the cost of unused LOAD_LANES
results against the first statement in a group.  An alternative
would be to multiply the ncopies of the first stmt by the group
size and treat other stmts in the group as having zero cost,
but I thought that might be more surprising when reading dumps.

Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
OK to install?

Richard


gcc/
	* tree-vect-stmts.c (vect_model_load_cost): Account for unused
	IFN_LOAD_LANES results.

gcc/testsuite/
	* gcc.target/aarch64/sve/cost_model_11.c: New test.
	* gcc.target/aarch64/sve/mask_struct_load_5.c: Use
	-fno-vect-cost-model.
---
 .../gcc.target/aarch64/sve/cost_model_11.c    | 12 ++++++++++
 .../aarch64/sve/mask_struct_load_5.c          |  2 +-
 gcc/tree-vect-stmts.c                         | 24 +++++++++++++++++++
 3 files changed, 37 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cost_model_11.c

Comments

Richard Biener Jan. 14, 2021, 11:12 a.m. UTC

On Thu, 14 Jan 2021, Richard Sandiford wrote:

> At the moment, if we use only one vector of an LD4 result,
> we'll treat the LD4 as having the cost of a single load.
> But all 4 loads and any associated permutes take place
> regardless of which results are actually used.
> 
> This patch therefore counts the cost of unused LOAD_LANES
> results against the first statement in a group.  An alternative
> would be to multiply the ncopies of the first stmt by the group
> size and treat other stmts in the group as having zero cost,
> but I thought that might be more surprising when reading dumps.
> 
> Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
> OK to install?

OK.

Richard.


Patch

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_11.c b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_11.c
new file mode 100644
index 00000000000..d9f4ccc76de
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_11.c
@@ -0,0 +1,12 @@ 
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=128" } */
+
+long
+f (long *x, long *y, long *z, long n)
+{
+  long res = 0;
+  for (long i = 0; i < n; ++i)
+    z[i] = x[i * 4] + y[i * 4];
+  return res;
+}
+
+/* { dg-final { scan-assembler-not {\tld4d\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_load_5.c b/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_load_5.c
index da367e4fd79..2a33ee81d1a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_load_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_load_5.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -ffast-math --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math -fno-vect-cost-model" } */
 
 #include <stdint.h>
 
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 068e4982303..4d72c4db2f7 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1120,6 +1120,30 @@  vect_model_load_cost (vec_info *vinfo,
      once per group anyhow.  */
   bool first_stmt_p = (first_stmt_info == stmt_info);
 
+  /* An IFN_LOAD_LANES will load all its vector results, regardless of which
+     ones we actually need.  Account for the cost of unused results.  */
+  if (first_stmt_p && !slp_node && memory_access_type == VMAT_LOAD_STORE_LANES)
+    {
+      unsigned int gaps = DR_GROUP_SIZE (first_stmt_info);
+      stmt_vec_info next_stmt_info = first_stmt_info;
+      do
+	{
+	  gaps -= 1;
+	  next_stmt_info = DR_GROUP_NEXT_ELEMENT (next_stmt_info);
+	}
+      while (next_stmt_info);
+      if (gaps)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "vect_model_load_cost: %d unused vectors.\n",
+			     gaps);
+	  vect_get_load_cost (vinfo, stmt_info, ncopies * gaps, false,
+			      &inside_cost, &prologue_cost,
+			      cost_vec, cost_vec, true);
+	}
+    }
+
   /* We assume that the cost of a single load-lanes instruction is
      equivalent to the cost of DR_GROUP_SIZE separate loads.  If a grouped
      access is instead being provided by a load-and-permute operation,
