From patchwork Sat Nov 16 08:26:07 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford <richard.sandiford@arm.com>
X-Patchwork-Id: 1196071
Return-Path: 
 <gcc-patches-return-513770-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized)
	smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131;
	helo=sourceware.org;
	envelope-from=gcc-patches-return-513770-incoming=patchwork.ozlabs.org@gcc.gnu.org;
	receiver=<UNKNOWN>)
Authentication-Results: ozlabs.org;
	dmarc=none (p=none dis=none) header.from=arm.com
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key;
	unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org
	header.b="jRCR8c0/"; dkim-atps=neutral
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 47FSww4SjPz9sPF
	for <incoming@patchwork.ozlabs.org>;
	Sat, 16 Nov 2019 19:26:22 +1100 (AEDT)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:to:cc:subject:date:message-id:mime-version:content-type; q=dns;
	s=default; b=UyO6zqoebAJcZUfwbBXhHeOdOGSK0wCsRcEatgOxnnVFUGlsQw
	Xlpr1yNtNJa8eyw9knWxt68q/IdUcp/xRA3KVdZehtz04HesmZ6cvOsXt/0aZtq9
	ZnziKnkGb0Nh3tFJYEWDcIyNIVncuBPij1cq9jRckDx0OSE6uY6uC+dXs=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:to:cc:subject:date:message-id:mime-version:content-type; s=
	default; bh=l1W08LbyKSMYYzv+nhvaSSUzVwQ=; b=jRCR8c0/tdr8291wwlQP
	MkNfLmu3IZEW5YyleTy/Xl+AOXbHBFb7o5smbEYRZ6PwWX9fGjSgXXa+tn1f3pRa
	Wo5+YTbJG1xkkb8Rn7fqOL0nHbFMfjqhxV39wCkkrekDLJxI5qiMhh/HT4Jy7Df6
	UOToNGggJSwTqy951hkc9CM=
Received: (qmail 108057 invoked by alias); 16 Nov 2019 08:26:14 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 108046 invoked by uid 89); 16 Nov 2019 08:26:14 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-9.3 required=5.0 tests=AWL, BAYES_00,
	GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS,
	SPF_PASS autolearn=ham version=3.3.1 spammy=leaders, reuses
X-HELO: foss.arm.com
Received: from foss.arm.com (HELO foss.arm.com) (217.140.110.172) by
	sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
	Sat, 16 Nov 2019 08:26:11 +0000
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])	by
	usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id
	F401930E; Sat, 16 Nov 2019 00:26:08 -0800 (PST)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126])	by
	usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id
	D3C553F52E; Sat, 16 Nov 2019 00:29:08 -0800 (PST)
From: Richard Sandiford <richard.sandiford@arm.com>
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, rguenther@suse.de,
	richard.sandiford@arm.com
Cc: rguenther@suse.de
Subject: Record leader nodes in the SLP graph (PR92516)
Date: Sat, 16 Nov 2019 08:26:07 +0000
Message-ID: <mptimnk2pnk.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
X-IsSubscribed: yes

If stmt analysis failed for an SLP node, r278246 tried building the
node from scalars instead.  By that point we've already processed child
nodes (usually successfully) and might have made some of them the lead
nodes for their stmt list.  This is why we couldn't free the child nodes
when deciding to build from scalars:

  /* Don't remove and free the child nodes here, since they could be
     referenced by other structures.  The analysis and scheduling phases
     (need to) ignore child nodes of anything that isn't vect_internal_def.  */

The problem in this PR is that we (correctly) don't process the unused
child nodes again during the scheduling phase, which means that those
nodes won't become the leader again.  So some other node with same stmts
becomes the leader instead.  However, any internal-def child nodes of
this new leader won't have been processed during the analysis phase,
because we also (correctly) skip child nodes of non-leader nodes.

We talked on IRC about fixing this by sharing nodes between SLP
instances, so that it becomes a "proper" graph.  But that seems
like it could throw up some problems too, so I wanted to fix the
PR in a less invasive way first.

This patch therefore records the leader nodes during the analysis
phase and reuses that choice during scheduling.  When scheduling
a non-leader node, we first try to schedule the leader node and
then reuse its vector stmts.  At the moment, doing this means we
need to know the leader node's SLP instance, so the patch records
that in the SLP node too.

While there, I noticed that the two-operation handling returned
early without restoring the original def types.  That's really
an independent bug fix.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2019-11-16  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	PR tree-optimization/92516
	* tree-vectorizer.h (_slp_tree::leader): New member variable.
	(_slp_tree::leader_instance): Likewise.
	* tree-vect-slp.c (vect_create_new_slp_node): Initialize them.
	(vect_slp_analyze_node_operations): When deferring to an existing
	leader, record that leader in the SLP node.
	(vect_slp_analyze_operations): When committing to keeping an
	SLP instance, record that leaders in it are themselves leaderless.
	Also record the SLP instance we should use to vectorize them.
	(vect_schedule_slp_instance): Remove the bst_map parameter.
	If the node has a leader, vectorize that first and reuse
	its vector stmts.  Restore stmt def types for two-operation SLP.
	(vect_schedule_slp): Update call accordingly, removing the bst_map.

gcc/testsuite/
	* g++.dg/vect/slp-pr92516.cc: New test.

Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2019-11-14 15:14:29.239572543 +0000
+++ gcc/tree-vectorizer.h	2019-11-16 08:22:50.620257693 +0000
@@ -128,6 +128,11 @@ struct _slp_tree {
   vec<unsigned> load_permutation;
   /* Vectorized stmt/s.  */
   vec<stmt_vec_info> vec_stmts;
+  /* If non-null, the leader node that provides the vectorized stmts.  */
+  _slp_tree *leader;
+  /* If the node is a leader node, this is the slp_instance to use when
+     vectorizing it.  */
+  class _slp_instance *leader_instance;
   /* Number of vector stmts that are created to replace the group of scalar
      stmts. It is calculated during the transformation phase as the number of
      scalar elements in one scalar iteration (GROUP_SIZE) multiplied by VF
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2019-11-14 15:33:43.503740636 +0000
+++ gcc/tree-vect-slp.c	2019-11-16 08:22:50.612257749 +0000
@@ -129,6 +129,8 @@ vect_create_new_slp_node (vec<stmt_vec_i
   SLP_TREE_LOAD_PERMUTATION (node) = vNULL;
   SLP_TREE_TWO_OPERATORS (node) = false;
   SLP_TREE_DEF_TYPE (node) = vect_internal_def;
+  node->leader = NULL;
+  node->leader_instance = NULL;
   node->refcnt = 1;
   node->max_nunits = 1;
 
@@ -155,6 +157,8 @@ vect_create_new_slp_node (vec<tree> ops)
   SLP_TREE_LOAD_PERMUTATION (node) = vNULL;
   SLP_TREE_TWO_OPERATORS (node) = false;
   SLP_TREE_DEF_TYPE (node) = vect_external_def;
+  node->leader = NULL;
+  node->leader_instance = NULL;
   node->refcnt = 1;
   node->max_nunits = 1;
 
@@ -2764,6 +2768,7 @@ vect_slp_analyze_node_operations (vec_in
   if ((leader = visited->get (SLP_TREE_SCALAR_STMTS (node)))
       || (leader = lvisited->get (SLP_TREE_SCALAR_STMTS (node))))
     {
+      node->leader = *leader;
       SLP_TREE_NUMBER_OF_VEC_STMTS (node)
 	= SLP_TREE_NUMBER_OF_VEC_STMTS (*leader);
       /* Cope with cases in which we made a late decision to build the
@@ -2878,7 +2883,11 @@ vect_slp_analyze_operations (vec_info *v
 	{
 	  for (scalar_stmts_to_slp_tree_map_t::iterator x = lvisited.begin();
 	       x != lvisited.end(); ++x)
-	    visited->put ((*x).first.copy (), (*x).second);
+	    {
+	      visited->put ((*x).first.copy (), (*x).second);
+	      (*x).second->leader = NULL;
+	      (*x).second->leader_instance = instance;
+	    }
 	  i++;
 
 	  add_stmt_costs (vinfo->target_cost_data, &cost_vec);
@@ -4046,8 +4055,7 @@ vect_transform_slp_perm_load (slp_tree n
 /* Vectorize SLP instance tree in postorder.  */
 
 static void
-vect_schedule_slp_instance (slp_tree node, slp_instance instance,
-			    scalar_stmts_to_slp_tree_map_t *bst_map)
+vect_schedule_slp_instance (slp_tree node, slp_instance instance)
 {
   gimple_stmt_iterator si;
   stmt_vec_info stmt_info;
@@ -4066,15 +4074,18 @@ vect_schedule_slp_instance (slp_tree nod
 
   /* See if we have already vectorized the same set of stmts and reuse their
      vectorized stmts across instances.  */
-  if (slp_tree *leader = bst_map->get (SLP_TREE_SCALAR_STMTS (node)))
+  if (node->leader)
     {
-      SLP_TREE_VEC_STMTS (node).safe_splice (SLP_TREE_VEC_STMTS (*leader));
+      gcc_assert (!node->leader->leader);
+      vect_schedule_slp_instance (node->leader, node->leader->leader_instance);
+      SLP_TREE_VEC_STMTS (node).safe_splice
+	(SLP_TREE_VEC_STMTS (node->leader));
       return;
     }
+  gcc_assert (instance == node->leader_instance);
 
-  bst_map->put (SLP_TREE_SCALAR_STMTS (node).copy (), node);
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
-    vect_schedule_slp_instance (child, instance, bst_map);
+    vect_schedule_slp_instance (child, instance);
 
   /* Push SLP node def-type to stmts.  */
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
@@ -4107,6 +4118,7 @@ vect_schedule_slp_instance (slp_tree nod
 
   /* Handle two-operation SLP nodes by vectorizing the group with
      both operations and then performing a merge.  */
+  bool done_p = false;
   if (SLP_TREE_TWO_OPERATORS (node))
     {
       gassign *stmt = as_a <gassign *> (stmt_info->stmt);
@@ -4177,10 +4189,11 @@ vect_schedule_slp_instance (slp_tree nod
 	    }
 	  v0.release ();
 	  v1.release ();
-	  return;
+	  done_p = true;
 	}
     }
-  vect_transform_stmt (stmt_info, &si, node, instance);
+  if (!done_p)
+    vect_transform_stmt (stmt_info, &si, node, instance);
 
   /* Restore stmt def-types.  */
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
@@ -4294,14 +4307,12 @@ vect_schedule_slp (vec_info *vinfo)
   slp_instance instance;
   unsigned int i;
 
-  scalar_stmts_to_slp_tree_map_t *bst_map
-    = new scalar_stmts_to_slp_tree_map_t ();
   slp_instances = vinfo->slp_instances;
   FOR_EACH_VEC_ELT (slp_instances, i, instance)
     {
       slp_tree node = SLP_INSTANCE_TREE (instance);
       /* Schedule the tree of INSTANCE.  */
-      vect_schedule_slp_instance (node, instance, bst_map);
+      vect_schedule_slp_instance (node, instance);
 
       if (SLP_INSTANCE_ROOT_STMT (instance))
 	vectorize_slp_instance_root_stmt (node, instance);
@@ -4310,7 +4321,6 @@ vect_schedule_slp (vec_info *vinfo)
 	dump_printf_loc (MSG_NOTE, vect_location,
                          "vectorizing stmts using SLP.\n");
     }
-  delete bst_map;
 
   FOR_EACH_VEC_ELT (slp_instances, i, instance)
     {
Index: gcc/testsuite/g++.dg/vect/slp-pr92516.cc
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/g++.dg/vect/slp-pr92516.cc	2019-11-16 08:22:50.612257749 +0000
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target c++14 } */
+
+class a {
+public:
+  typedef int b;
+  operator b();
+};
+class c {
+public:
+  constexpr int m_fn1() const;
+  constexpr int d() const;
+  int e;
+  int f;
+};
+constexpr int c::m_fn1() const { return e; }
+constexpr int c::d() const { return f; }
+class g {
+public:
+  g();
+  constexpr void i(const c &) noexcept;
+  int j;
+  int k;
+  int l;
+  int m;
+};
+constexpr void g::i(const c &n) noexcept {
+  int v = l - j, h = m - k;
+  j = n.m_fn1() - v / 2;
+  k = n.d() - h / 2;
+  l = j + v;
+  m = k + h;
+}
+class o {
+  void m_fn4() const;
+  a p;
+} r;
+void o::m_fn4() const {
+  g q;
+  c t;
+  q.i(t);
+  r.p || 0;
+}