From patchwork Wed May 3 08:19:59 2017
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 757886
Date: Wed, 3 May 2017 10:19:59 +0200 (CEST)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Improve vectorizer peeling for alignment costmodel

The following extends the very simplistic cost modeling I added sometime
late in the release process so that, for all unknown misaligned refs, the
model is also applied to loops containing stores.

The model basically says it is useless to peel for alignment if only a
single DR is affected, or if, in case we end up using hw-supported
misaligned loads, the cost of misaligned loads is the same as that of
aligned ones.  Previously we would usually align one of the stores, on the
theory that this improves (precious) store bandwidth.

Note this is only ever so slightly more conservative (aka less peeling).
We will still apply peeling for alignment if you make the testcase use +=,
because then we align both the load and the store from v1.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Richard.

2017-05-03  Richard Biener

	* tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
	When all DRs have unknown misalignment, do not always peel
	when there is a store but apply the same costing model as if
	there were only loads.

	* gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c: New testcase.
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	(revision 247498)
+++ gcc/tree-vect-data-refs.c	(working copy)
@@ -1715,18 +1741,18 @@ vect_enhance_data_refs_alignment (loop_v
 	  dr0 = first_store;
 	}
 
-      /* In case there are only loads with different unknown misalignments, use
-	 peeling only if it may help to align other accesses in the loop or
+      /* Use peeling only if it may help to align other accesses in the loop or
 	 if it may help improving load bandwith when we'd end up using
 	 unaligned loads.  */
       tree dr0_vt = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr0)));
-      if (!first_store
-	  && !STMT_VINFO_SAME_ALIGN_REFS (
-		  vinfo_for_stmt (DR_STMT (dr0))).length ()
+      if (STMT_VINFO_SAME_ALIGN_REFS
+	    (vinfo_for_stmt (DR_STMT (dr0))).length () == 0
 	  && (vect_supportable_dr_alignment (dr0, false)
 	      != dr_unaligned_supported
-	      || (builtin_vectorization_cost (vector_load, dr0_vt, 0)
-		  == builtin_vectorization_cost (unaligned_load, dr0_vt, -1))))
+	      || (DR_IS_READ (dr0)
+		  && (builtin_vectorization_cost (vector_load, dr0_vt, 0)
+		      == builtin_vectorization_cost (unaligned_load,
+						     dr0_vt, -1)))))
 	do_peeling = false;
     }
Index: gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c	(working copy)
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+
+void func(double * __restrict__ v1, double * v2, unsigned n)
+{
+  for (unsigned i = 0; i < n; ++i)
+    v1[i] = v2[i];
+}
+
+/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */