From patchwork Mon Jul 3 02:58:30 2023
X-Patchwork-Submitter: "Kewen.Lin"
X-Patchwork-Id: 1802491
Message-ID: <06e499be-2151-5c64-52be-ac8f69c46ad9@linux.ibm.com>
Date: Mon, 3 Jul 2023 10:58:30 +0800
From: "Kewen.Lin"
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com,
    segher@kernel.crashing.org, bergner@linux.ibm.com
Subject: [PATCH 3/9 v2] vect: Adjust vectorizable_load costing on VMAT_INVARIANT

Hi Richi,

on 2023/6/30 19:18, Richard Biener wrote:
> On Tue, Jun 13, 2023 at 4:03 AM Kewen Lin wrote:
>>
>> This patch adjusts the cost handling on VMAT_INVARIANT in
>> function vectorizable_load.  We no longer call function
>> vect_model_load_cost for it.
>>
>> To make the costing on VMAT_INVARIANT better, this patch
>> queries hoist_defs_of_uses for the hoisting decision and adds
>> costs for the different "where" locations based on it.  Currently
>> function hoist_defs_of_uses always hoists the defs of all SSA
>> uses; the new argument HOIST_P lets us skip the actual hoisting
>> during the costing phase.
>>
>> gcc/ChangeLog:
>>
>>         * tree-vect-stmts.cc (hoist_defs_of_uses): Add one argument HOIST_P.
>>         (vectorizable_load): Adjust the handling on VMAT_INVARIANT to respect
>>         hoisting decision and without calling vect_model_load_cost.
>>         (vect_model_load_cost): Assert it won't get VMAT_INVARIANT any more
>>         and remove VMAT_INVARIANT related handlings.
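[Illustration, not part of the quoted patch: a minimal, hypothetical C++
loop -- names made up for this sketch -- in which the load through 'scale'
is loop invariant, i.e. what vectorizable_load classifies as
VMAT_INVARIANT.  Ideally that scalar load is hoisted to the loop preheader
and splat into a vector once, so it should be charged as prologue cost
rather than once per iteration in the loop body.]

  // Hypothetical example only.  '*scale' does not change inside the loop,
  // so the vectorizer can load it once before the loop and splat it into a
  // vector (prologue cost) instead of reloading it every iteration (body
  // cost).
  void
  mul_by_scale (int *__restrict__ out, const int *__restrict__ in,
                const int *__restrict__ scale, int n)
  {
    for (int i = 0; i < n; i++)
      out[i] = in[i] * *scale;  // loop-invariant load of *scale
  }
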
>> ---
>>  gcc/tree-vect-stmts.cc | 61 +++++++++++++++++++++++++++---------------
>>  1 file changed, 39 insertions(+), 22 deletions(-)
>>
>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>> index 744cdf40e26..19c61d703c8 100644
>> --- a/gcc/tree-vect-stmts.cc
>> +++ b/gcc/tree-vect-stmts.cc
>> @@ -1135,7 +1135,8 @@ vect_model_load_cost (vec_info *vinfo,
>>                        slp_tree slp_node,
>>                        stmt_vector_for_cost *cost_vec)
>>  {
>> -  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl);
>> +  gcc_assert ((memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl)
>> +              && memory_access_type != VMAT_INVARIANT);
>>
>>    unsigned int inside_cost = 0, prologue_cost = 0;
>>    bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
>> @@ -1238,16 +1239,6 @@ vect_model_load_cost (vec_info *vinfo,
>>                          ncopies * assumed_nunits,
>>                          scalar_load, stmt_info, 0, vect_body);
>>      }
>> -  else if (memory_access_type == VMAT_INVARIANT)
>> -    {
>> -      /* Invariant loads will ideally be hoisted and splat to a vector.  */
>> -      prologue_cost += record_stmt_cost (cost_vec, 1,
>> -                                         scalar_load, stmt_info, 0,
>> -                                         vect_prologue);
>> -      prologue_cost += record_stmt_cost (cost_vec, 1,
>> -                                         scalar_to_vec, stmt_info, 0,
>> -                                         vect_prologue);
>> -    }
>>    else
>>      vect_get_load_cost (vinfo, stmt_info, ncopies,
>>                          alignment_support_scheme, misalignment, first_stmt_p,
>> @@ -9121,10 +9112,11 @@ permute_vec_elements (vec_info *vinfo,
>>  /* Hoist the definitions of all SSA uses on STMT_INFO out of the loop LOOP,
>>     inserting them on the loops preheader edge.  Returns true if we
>>     were successful in doing so (and thus STMT_INFO can be moved then),
>> -   otherwise returns false.  */
>> +   otherwise returns false.  HOIST_P indicates if we want to hoist the
>> +   definitions of all SSA uses, it would be false when we are costing.  */
>>
>>  static bool
>> -hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop)
>> +hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop, bool hoist_p)
>>  {
>>    ssa_op_iter i;
>>    tree op;
>> @@ -9158,6 +9150,9 @@ hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop)
>>    if (!any)
>>      return true;
>>
>> +  if (!hoist_p)
>> +    return true;
>> +
>>    FOR_EACH_SSA_TREE_OPERAND (op, stmt_info->stmt, i, SSA_OP_USE)
>>      {
>>        gimple *def_stmt = SSA_NAME_DEF_STMT (op);
>> @@ -9510,14 +9505,6 @@ vectorizable_load (vec_info *vinfo,
>>
>>    if (memory_access_type == VMAT_INVARIANT)
>>      {
>> -      if (costing_p)
>> -        {
>> -          vect_model_load_cost (vinfo, stmt_info, ncopies, vf,
>> -                                memory_access_type, alignment_support_scheme,
>> -                                misalignment, &gs_info, slp_node, cost_vec);
>> -          return true;
>> -        }
>> -
>>        gcc_assert (!grouped_load && !mask && !bb_vinfo);
>>        /* If we have versioned for aliasing or the loop doesn't
>>           have any data dependencies that would preclude this,
>> @@ -9525,7 +9512,37 @@ vectorizable_load (vec_info *vinfo,
>>           thus we can insert it on the preheader edge.  */
>>        bool hoist_p = (LOOP_VINFO_NO_DATA_DEPENDENCIES (loop_vinfo)
>>                        && !nested_in_vect_loop
>> -                      && hoist_defs_of_uses (stmt_info, loop));
>> +                      && hoist_defs_of_uses (stmt_info, loop, !costing_p));
>
> 'hoist_defs_of_uses' should ideally be computed once at analysis time and
> the result remembered.  It's not so easy in this case so maybe just
> add a comment for this here.

Ok, updated with:

       /* If we have versioned for aliasing or the loop doesn't
          have any data dependencies that would preclude this,
          then we are sure this is a loop invariant load and
-         thus we can insert it on the preheader edge.  */
+         thus we can insert it on the preheader edge.
+         TODO: hoist_defs_of_uses should ideally be computed
+         once at analysis time, remembered and used in the
+         transform time.  */

>
>> +      if (costing_p)
>> +        {
>> +          if (hoist_p)
>> +            {
>> +              unsigned int prologue_cost;
>> +              prologue_cost = record_stmt_cost (cost_vec, 1, scalar_load,
>> +                                                stmt_info, 0, vect_prologue);
>> +              prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
>> +                                                 stmt_info, 0, vect_prologue);
>> +              if (dump_enabled_p ())
>> +                dump_printf_loc (MSG_NOTE, vect_location,
>> +                                 "vect_model_load_cost: inside_cost = 0, "
>> +                                 "prologue_cost = %d .\n",
>> +                                 prologue_cost);
>> +            }
>> +          else
>> +            {
>> +              unsigned int inside_cost;
>> +              inside_cost = record_stmt_cost (cost_vec, 1, scalar_load,
>> +                                              stmt_info, 0, vect_body);
>> +              inside_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
>> +                                               stmt_info, 0, vect_body);
>> +              if (dump_enabled_p ())
>> +                dump_printf_loc (MSG_NOTE, vect_location,
>> +                                 "vect_model_load_cost: inside_cost = %d, "
>> +                                 "prologue_cost = 0 .\n",
>> +                                 inside_cost);
>> +            }
>
> Please instead do
>
>   enum vect_cost_model_location loc = hoist_p ? vect_prologue : vect_body;
>
> and merge the two branches which otherwise look identical to me.

Good idea, the dump_printf_loc strings also differ slightly; updated with:

+      if (costing_p)
+        {
+          enum vect_cost_model_location cost_loc
+            = hoist_p ? vect_prologue : vect_body;
+          unsigned int cost = record_stmt_cost (cost_vec, 1, scalar_load,
+                                                stmt_info, 0, cost_loc);
+          cost += record_stmt_cost (cost_vec, 1, scalar_to_vec, stmt_info, 0,
+                                    cost_loc);
+          unsigned int prologue_cost = hoist_p ? cost : 0;
+          unsigned int inside_cost = hoist_p ? 0 : cost;
+          if (dump_enabled_p ())
+            dump_printf_loc (MSG_NOTE, vect_location,
+                             "vect_model_load_cost: inside_cost = %d, "
+                             "prologue_cost = %d .\n",
+                             inside_cost, prologue_cost);
+          return true;
+        }

---------------------

The whole patch v2 is as below:

---
2.31.1

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index dd8f5421d4e..ce53cb30c79 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1136,7 +1136,8 @@ vect_model_load_cost (vec_info *vinfo,
                       slp_tree slp_node,
                       stmt_vector_for_cost *cost_vec)
 {
-  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl);
+  gcc_assert ((memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl)
+              && memory_access_type != VMAT_INVARIANT);
 
   unsigned int inside_cost = 0, prologue_cost = 0;
   bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
@@ -1241,16 +1242,6 @@ vect_model_load_cost (vec_info *vinfo,
                          ncopies * assumed_nunits,
                          scalar_load, stmt_info, 0, vect_body);
     }
-  else if (memory_access_type == VMAT_INVARIANT)
-    {
-      /* Invariant loads will ideally be hoisted and splat to a vector.  */
-      prologue_cost += record_stmt_cost (cost_vec, 1,
-                                         scalar_load, stmt_info, 0,
-                                         vect_prologue);
-      prologue_cost += record_stmt_cost (cost_vec, 1,
-                                         scalar_to_vec, stmt_info, 0,
-                                         vect_prologue);
-    }
   else
     vect_get_load_cost (vinfo, stmt_info, ncopies,
                         alignment_support_scheme, misalignment, first_stmt_p,
@@ -9269,10 +9260,11 @@ permute_vec_elements (vec_info *vinfo,
 /* Hoist the definitions of all SSA uses on STMT_INFO out of the loop LOOP,
    inserting them on the loops preheader edge.  Returns true if we
    were successful in doing so (and thus STMT_INFO can be moved then),
-   otherwise returns false.  */
+   otherwise returns false.  HOIST_P indicates if we want to hoist the
+   definitions of all SSA uses, it would be false when we are costing.  */
 
 static bool
-hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop)
+hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop, bool hoist_p)
 {
   ssa_op_iter i;
   tree op;
@@ -9306,6 +9298,9 @@ hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop)
   if (!any)
     return true;
 
+  if (!hoist_p)
+    return true;
+
   FOR_EACH_SSA_TREE_OPERAND (op, stmt_info->stmt, i, SSA_OP_USE)
     {
       gimple *def_stmt = SSA_NAME_DEF_STMT (op);
@@ -9658,22 +9653,34 @@ vectorizable_load (vec_info *vinfo,
 
   if (memory_access_type == VMAT_INVARIANT)
     {
-      if (costing_p)
-        {
-          vect_model_load_cost (vinfo, stmt_info, ncopies, vf,
-                                memory_access_type, alignment_support_scheme,
-                                misalignment, &gs_info, slp_node, cost_vec);
-          return true;
-        }
-
       gcc_assert (!grouped_load && !mask && !bb_vinfo);
       /* If we have versioned for aliasing or the loop doesn't
          have any data dependencies that would preclude this,
         then we are sure this is a loop invariant load and
-         thus we can insert it on the preheader edge.  */
+         thus we can insert it on the preheader edge.
+         TODO: hoist_defs_of_uses should ideally be computed
+         once at analysis time, remembered and used in the
+         transform time.  */
       bool hoist_p = (LOOP_VINFO_NO_DATA_DEPENDENCIES (loop_vinfo)
                       && !nested_in_vect_loop
-                      && hoist_defs_of_uses (stmt_info, loop));
+                      && hoist_defs_of_uses (stmt_info, loop, !costing_p));
+      if (costing_p)
+        {
+          enum vect_cost_model_location cost_loc
+            = hoist_p ? vect_prologue : vect_body;
+          unsigned int cost = record_stmt_cost (cost_vec, 1, scalar_load,
+                                                stmt_info, 0, cost_loc);
+          cost += record_stmt_cost (cost_vec, 1, scalar_to_vec, stmt_info, 0,
+                                    cost_loc);
+          unsigned int prologue_cost = hoist_p ? cost : 0;
+          unsigned int inside_cost = hoist_p ? 0 : cost;
+          if (dump_enabled_p ())
+            dump_printf_loc (MSG_NOTE, vect_location,
+                             "vect_model_load_cost: inside_cost = %d, "
+                             "prologue_cost = %d .\n",
+                             inside_cost, prologue_cost);
+          return true;
+        }
       if (hoist_p)
         {
           gassign *stmt = as_a <gassign *> (stmt_info->stmt);
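
[Illustration, not part of the patch: a self-contained C++ sketch of the
bookkeeping the merged costing branch above performs.  The names Cost and
model_invariant_load are invented for this sketch and are not GCC APIs;
the point is that the same two cost components -- one scalar load plus one
scalar-to-vector splat -- are charged to the prologue when the invariant
load can be hoisted, and to the loop body otherwise.]

  #include <cstdio>

  // Hypothetical stand-in for the prologue/body cost buckets.
  struct Cost { unsigned prologue = 0; unsigned body = 0; };

  // Mirrors the merged costing branch: one scalar load plus one splat,
  // booked to the prologue if the invariant load can be hoisted out of
  // the loop, or to the body if it cannot.
  static void
  model_invariant_load (Cost &c, bool hoist_p,
                        unsigned scalar_load_cost, unsigned splat_cost)
  {
    unsigned cost = scalar_load_cost + splat_cost;
    if (hoist_p)
      c.prologue += cost;
    else
      c.body += cost;
  }

  int
  main ()
  {
    Cost hoisted, not_hoisted;
    model_invariant_load (hoisted, true, 1, 1);
    model_invariant_load (not_hoisted, false, 1, 1);
    std::printf ("hoisted:     inside_cost = %u, prologue_cost = %u\n",
                 hoisted.body, hoisted.prologue);
    std::printf ("not hoisted: inside_cost = %u, prologue_cost = %u\n",
                 not_hoisted.body, not_hoisted.prologue);
    return 0;
  }

Built standalone, the first call reports the whole cost as prologue cost
and the second as inside (body) cost, matching the dump strings emitted by
the costing branch in the patch.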