From patchwork Wed Jul 17 10:24:42 2019
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 1133232
Date: Wed, 17 Jul 2019 12:24:42 +0200 (CEST)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Fix PR91178

This is the vectorizer part of the fix - currently, when we need to
permute a load in contiguous accesses, we load the "gap" between two
instances of a group as well.  When the gap is large that can cause
quite excessive code generation (fixed up by DCE / forwprop later,
but confusing intermediate passes compile-time-wise).

The following addresses this in the SLP case by simply skipping code
generation for such loads.  This avoids the huge IV increment chain
which causes all of the followup issues.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-07-17  Richard Biener

	PR tree-optimization/91178
	* tree-vect-stmts.c (get_group_load_store_type): For SLP
	loads with a gap larger than the vector size always use
	VMAT_STRIDED_SLP.
	(vectorizable_load): For VMAT_STRIDED_SLP with a permutation
	avoid loading vectors that are only contained in the gap
	and thus are not needed.

	* gcc.dg/torture/pr91178.c: New testcase.
Index: gcc/testsuite/gcc.dg/torture/pr91178.c
===================================================================
--- gcc/testsuite/gcc.dg/torture/pr91178.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr91178.c	(working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+
+int a;
+extern int f[10][91125];
+int b[50];
+void c()
+{
+  for (int d = 6; d <= a; d++)
+    for (int e = 16; e <= 24; e++)
+      b[e] -= f[d][d];
+}
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	(revision 273520)
+++ gcc/tree-vect-stmts.c	(working copy)
@@ -2267,6 +2267,14 @@ get_group_load_store_type (stmt_vec_info
 		   / vect_get_scalar_dr_size (first_dr_info)))
 	    overrun_p = false;
 
+	  /* If the gap at the end of the group exceeds a whole vector
+	     in size use the strided SLP code which can skip
+	     code-generation for the gap.  */
+	  if (vls_type == VLS_LOAD && known_gt (gap, nunits))
+	    *memory_access_type = VMAT_STRIDED_SLP;
+	  else
+	    *memory_access_type = VMAT_CONTIGUOUS;
+
 	  /* If the gap splits the vector in half and the target
 	     can do half-vector operations avoid the epilogue
 	     peeling by simply loading half of the vector only.  Usually
@@ -2274,7 +2282,8 @@ get_group_load_store_type (stmt_vec_info
 	  dr_alignment_support alignment_support_scheme;
 	  scalar_mode elmode = SCALAR_TYPE_MODE (TREE_TYPE (vectype));
 	  machine_mode vmode;
-	  if (overrun_p
+	  if (*memory_access_type == VMAT_CONTIGUOUS
+	      && overrun_p
 	      && !masked_p
 	      && (((alignment_support_scheme
 		      = vect_supportable_dr_alignment (first_dr_info, false)))
@@ -2297,7 +2306,6 @@ get_group_load_store_type (stmt_vec_info
 				 "Peeling for outer loop is not supported\n");
 	      return false;
 	    }
-	  *memory_access_type = VMAT_CONTIGUOUS;
 	}
     }
   else
@@ -8732,6 +8740,7 @@ vectorizable_load (stmt_vec_info stmt_in
       /* Checked by get_load_store_type.  */
       unsigned int const_nunits = nunits.to_constant ();
       unsigned HOST_WIDE_INT cst_offset = 0;
+      unsigned int group_gap = 0;
 
       gcc_assert (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo));
       gcc_assert (!nested_in_vect_loop);
@@ -8749,6 +8758,7 @@ vectorizable_load (stmt_vec_info stmt_in
       if (slp && grouped_load)
 	{
 	  group_size = DR_GROUP_SIZE (first_stmt_info);
+	  group_gap = DR_GROUP_GAP (first_stmt_info);
 	  ref_type = get_group_alias_ptr_type (first_stmt_info);
 	}
       else
@@ -8892,6 +8902,14 @@ vectorizable_load (stmt_vec_info stmt_in
 	  if (nloads > 1)
 	    vec_alloc (v, nloads);
 	  stmt_vec_info new_stmt_info = NULL;
+	  if (slp && slp_perm
+	      && (group_el % group_size) > group_size - group_gap
+	      && (group_el % group_size) + nloads * lnel < group_size)
+	    {
+	      dr_chain.quick_push (NULL_TREE);
+	      group_el += nloads * lnel;
+	      continue;
+	    }
 	  for (i = 0; i < nloads; i++)
 	    {
 	      tree this_off = build_int_cst (TREE_TYPE (alias_off),