From patchwork Tue Nov  5 14:29:31 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford <richard.sandiford@arm.com>
X-Patchwork-Id: 1189718
From: Richard Sandiford <Richard.Sandiford@arm.com>
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: [4/6] Optionally pick the cheapest loop_vec_info
Date: Tue, 5 Nov 2019 14:29:31 +0000
Message-ID: <mpttv7is8fp.fsf@arm.com>
References: <mptbltqtn9b.fsf@arm.com>
In-Reply-To: <mptbltqtn9b.fsf@arm.com> (Richard Sandiford's message of "Tue,
	05 Nov 2019 14:24:00 +0000")
user-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
mail-followup-to: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com

This patch adds a mode in which the vectoriser tries each available
base vector mode and picks the one with the lowest cost.  For now
the behaviour is controlled by a new --param, vect-compare-loop-costs,
which is off by default; a later patch enables it by default for SVE.
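
As a rough illustration of what "lowest cost" means here, the sketch
below compares two candidates by their loop-body cost per scalar
iteration, falling back to the prologue/epilogue cost on a tie.  It is
a simplified model, not the code in the patch: the struct `candidate`
and the function `better_candidate_p` are hypothetical names, VFs are
assumed to be compile-time constants rather than poly_ints, and the
clamp to the likely maximum iteration count is omitted.

```cpp
#include <cstdint>

// Hypothetical stand-in for the costs that the patch records in each
// loop_vec_info.  All names here are illustrative only.
struct candidate {
  int inside_cost;   // cost of one iteration of the vector loop body
  int outside_cost;  // cost of the vector prologue and epilogue
  int64_t vf;        // vectorization factor, assumed constant and > 0
};

// Return true if NEW_C appears cheaper than OLD_C.  Ties keep OLD_C.
bool better_candidate_p(const candidate &new_c, const candidate &old_c) {
  // Compare inside_cost / vf without division:
  //   new_inside / new_vf < old_inside / old_vf
  //   <=> new_inside * old_vf < old_inside * new_vf   (VFs are positive)
  int64_t rel_new = int64_t(new_c.inside_cost) * old_c.vf;
  int64_t rel_old = int64_t(old_c.inside_cost) * new_c.vf;
  if (rel_new != rel_old)
    return rel_new < rel_old;

  // Loop bodies cost the same per scalar iteration, so let the
  // prologue/epilogue cost decide.
  return new_c.outside_cost < old_c.outside_cost;
}
```

For example, a body cost of 20 at VF 8 (2.5 per scalar iteration) beats
a body cost of 12 at VF 4 (3 per scalar iteration).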

The patch keeps the current behaviour of preferring a VF of
loop->simdlen over any larger or smaller VF, regardless of costs
or target preferences.
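
The simdlen rule can be pictured as a check that runs before any cost
comparison.  The following is only a sketch under simplifying
assumptions: `simdlen_preference` is a hypothetical helper, VFs are
plain integers, and a simdlen of 0 stands for "no simdlen requested".

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only.
// Returns +1 if the new candidate must win because only it matches
// SIMDLEN, -1 if the old candidate must win for the same reason, and
// 0 if simdlen does not decide and the costs should be compared.
int simdlen_preference(int64_t simdlen, int64_t new_vf, int64_t old_vf) {
  if (simdlen == 0)
    return 0;  // no simdlen requested

  bool new_match = (new_vf == simdlen);
  bool old_match = (old_vf == simdlen);
  if (new_match != old_match)
    return new_match ? 1 : -1;

  // Both match or neither matches: simdlen gives no preference.
  return 0;
}
```

With simdlen 8, a candidate with VF 8 is kept in preference to one with
VF 16 even if the VF 16 candidate has a lower cost.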


2019-11-05  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* params.def (vect-compare-loop-costs): New param.
	* doc/invoke.texi: Document it.
	* tree-vectorizer.h (_loop_vec_info::vec_outside_cost)
	(_loop_vec_info::vec_inside_cost): New member variables.
	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize them.
	(vect_better_loop_vinfo_p, vect_joust_loop_vinfos): New functions.
	(vect_analyze_loop): When the new parameter allows, try vectorizing
	the loop with each available vector mode and picking the one with
	the lowest cost.
	(vect_estimate_min_profitable_iters): Record the computed costs
	in the loop_vec_info.

Index: gcc/params.def
===================================================================
--- gcc/params.def	2019-10-31 17:15:25.470517368 +0000
+++ gcc/params.def	2019-11-05 14:19:58.781197820 +0000
@@ -661,6 +661,13 @@ DEFPARAM(PARAM_VECT_MAX_PEELING_FOR_ALIG
          "Maximum number of loop peels to enhance alignment of data references in a loop.",
          -1, -1, 64)
 
+DEFPARAM(PARAM_VECT_COMPARE_LOOP_COSTS,
+	 "vect-compare-loop-costs",
+	 "Whether to try vectorizing a loop using each supported"
+	 " combination of vector types and picking the version with the"
+	 " lowest cost.",
+	 0, 0, 1)
+
 DEFPARAM(PARAM_MAX_CSELIB_MEMORY_LOCATIONS,
 	 "max-cselib-memory-locations",
 	 "The maximum memory locations recorded by cselib.",
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	2019-11-04 21:13:57.611756365 +0000
+++ gcc/doc/invoke.texi	2019-11-05 14:19:58.777197850 +0000
@@ -11563,6 +11563,12 @@ doing loop versioning for alias in the v
 The maximum number of loop peels to enhance access alignment
 for vectorizer. Value -1 means no limit.
 
+@item vect-compare-loop-costs
+Whether to try vectorizing a loop using each supported combination of
+vector types and picking the version with the lowest cost.  This parameter
+has no effect when @option{-fno-vect-cost-model} or
+@option{-fvect-cost-model=unlimited} is used.
+
 @item max-iterations-to-track
 The maximum number of iterations of a loop the brute-force algorithm
 for analysis of the number of iterations of the loop tries to evaluate.
Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2019-11-05 14:19:33.829371745 +0000
+++ gcc/tree-vectorizer.h	2019-11-05 14:19:58.781197820 +0000
@@ -601,6 +601,13 @@ typedef class _loop_vec_info : public ve
   /* Cost of a single scalar iteration.  */
   int single_scalar_iteration_cost;
 
+  /* The cost of the vector prologue and epilogue, including peeled
+     iterations and set-up code.  */
+  int vec_outside_cost;
+
+  /* The cost of the vector loop body.  */
+  int vec_inside_cost;
+
   /* Is the loop vectorizable? */
   bool vectorizable;
 
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2019-11-05 14:19:33.829371745 +0000
+++ gcc/tree-vect-loop.c	2019-11-05 14:19:58.781197820 +0000
@@ -830,6 +830,8 @@ _loop_vec_info::_loop_vec_info (class lo
     scan_map (NULL),
     slp_unrolling_factor (1),
     single_scalar_iteration_cost (0),
+    vec_outside_cost (0),
+    vec_inside_cost (0),
     vectorizable (false),
     can_fully_mask_p (true),
     fully_masked_p (false),
@@ -2373,6 +2375,80 @@ vect_analyze_loop_2 (loop_vec_info loop_
   goto start_over;
 }
 
+/* Return true if vectorizing a loop using NEW_LOOP_VINFO appears
+   to be better than vectorizing it using OLD_LOOP_VINFO.  Assume that
+   OLD_LOOP_VINFO is better unless something specifically indicates
+   otherwise.
+
+   Note that this deliberately isn't a partial order.  */
+
+static bool
+vect_better_loop_vinfo_p (loop_vec_info new_loop_vinfo,
+			  loop_vec_info old_loop_vinfo)
+{
+  struct loop *loop = LOOP_VINFO_LOOP (new_loop_vinfo);
+  gcc_assert (LOOP_VINFO_LOOP (old_loop_vinfo) == loop);
+
+  poly_int64 new_vf = LOOP_VINFO_VECT_FACTOR (new_loop_vinfo);
+  poly_int64 old_vf = LOOP_VINFO_VECT_FACTOR (old_loop_vinfo);
+
+  /* Always prefer a VF of loop->simdlen over any other VF.  */
+  if (loop->simdlen)
+    {
+      bool new_simdlen_p = known_eq (new_vf, loop->simdlen);
+      bool old_simdlen_p = known_eq (old_vf, loop->simdlen);
+      if (new_simdlen_p != old_simdlen_p)
+	return new_simdlen_p;
+    }
+
+  /* Limit the VFs to what is likely to be the maximum number of iterations,
+     to handle cases in which at least one loop_vinfo is fully-masked.  */
+  HOST_WIDE_INT estimated_max_niter = likely_max_stmt_executions_int (loop);
+  if (estimated_max_niter != -1)
+    {
+      if (known_le (estimated_max_niter, new_vf))
+	new_vf = estimated_max_niter;
+      if (known_le (estimated_max_niter, old_vf))
+	old_vf = estimated_max_niter;
+    }
+
+  /* Check whether the (fractional) cost per scalar iteration is lower
+     or higher: new_inside_cost / new_vf vs. old_inside_cost / old_vf.  */
+  poly_widest_int rel_new = (new_loop_vinfo->vec_inside_cost
+			     * poly_widest_int (old_vf));
+  poly_widest_int rel_old = (old_loop_vinfo->vec_inside_cost
+			     * poly_widest_int (new_vf));
+  if (maybe_lt (rel_old, rel_new))
+    return false;
+  if (known_lt (rel_new, rel_old))
+    return true;
+
+  /* If there's nothing to choose between the loop bodies, see whether
+     there's a difference in the prologue and epilogue costs.  */
+  if (new_loop_vinfo->vec_outside_cost != old_loop_vinfo->vec_outside_cost)
+    return new_loop_vinfo->vec_outside_cost < old_loop_vinfo->vec_outside_cost;
+
+  return false;
+}
+
+/* Decide whether to replace OLD_LOOP_VINFO with NEW_LOOP_VINFO.  Return
+   true if we should.  */
+
+static bool
+vect_joust_loop_vinfos (loop_vec_info new_loop_vinfo,
+			loop_vec_info old_loop_vinfo)
+{
+  if (!vect_better_loop_vinfo_p (new_loop_vinfo, old_loop_vinfo))
+    return false;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "***** Preferring vector mode %s to vector mode %s\n",
+		     GET_MODE_NAME (new_loop_vinfo->vector_mode),
+		     GET_MODE_NAME (old_loop_vinfo->vector_mode));
+  return true;
+}
+
 /* Function vect_analyze_loop.
 
    Apply a set of analyses on LOOP, and create a loop_vec_info struct
@@ -2408,6 +2484,8 @@ vect_analyze_loop (class loop *loop, vec
   machine_mode next_vector_mode = VOIDmode;
   poly_uint64 lowest_th = 0;
   unsigned vectorized_loops = 0;
+  bool pick_lowest_cost_p = (PARAM_VALUE (PARAM_VECT_COMPARE_LOOP_COSTS)
+			     && !unlimited_cost_model (loop));
 
   bool vect_epilogues = false;
   opt_result res = opt_result::success ();
@@ -2428,6 +2506,34 @@ vect_analyze_loop (class loop *loop, vec
 
       bool fatal = false;
 
+      /* When pick_lowest_cost_p is true, we should in principle iterate
+	 over all the loop_vec_infos that LOOP_VINFO could replace and
+	 try to vectorize LOOP_VINFO under the same conditions.
+	 E.g. when trying to replace an epilogue loop, we should vectorize
+	 LOOP_VINFO as an epilogue loop with the same VF limit.  When trying
+	 to replace the main loop, we should vectorize LOOP_VINFO as a main
+	 loop too.
+
+	 However, autovectorize_vector_modes is usually sorted as follows:
+
+	 - Modes that naturally produce lower VFs usually follow modes that
+	   naturally produce higher VFs.
+
+	 - When modes naturally produce the same VF, maskable modes
+	   usually follow unmaskable ones, so that the maskable mode
+	   can be used to vectorize the epilogue of the unmaskable mode.
+
+	 This order is preferred because it leads to the maximum
+	 epilogue vectorization opportunities.  Targets should only use
+	 a different order if they want to make wide modes available while
+	 disparaging them relative to earlier, smaller modes.  The assumption
+	 in that case is that the wider modes are more expensive in some
+	 way that isn't reflected directly in the costs.
+
+	 There should therefore be few interesting cases in which
+	 LOOP_VINFO fails when treated as an epilogue loop, succeeds when
+	 treated as a standalone loop, and ends up being genuinely cheaper
+	 than FIRST_LOOP_VINFO.  */
       if (vect_epilogues)
 	LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = first_loop_vinfo;
 
@@ -2475,13 +2581,34 @@ vect_analyze_loop (class loop *loop, vec
 	      LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = NULL;
 	      simdlen = 0;
 	    }
+	  else if (pick_lowest_cost_p && first_loop_vinfo)
+	    {
+	      /* Keep trying to roll back vectorization attempts while the
+		 loop_vec_infos they produced were worse than this one.  */
+	      vec<loop_vec_info> &vinfos = first_loop_vinfo->epilogue_vinfos;
+	      while (!vinfos.is_empty ()
+		     && vect_joust_loop_vinfos (loop_vinfo, vinfos.last ()))
+		{
+		  gcc_assert (vect_epilogues);
+		  delete vinfos.pop ();
+		}
+	      if (vinfos.is_empty ()
+		  && vect_joust_loop_vinfos (loop_vinfo, first_loop_vinfo))
+		{
+		  delete first_loop_vinfo;
+		  first_loop_vinfo = opt_loop_vec_info::success (NULL);
+		  LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = NULL;
+		}
+	    }
 
 	  if (first_loop_vinfo == NULL)
 	    {
 	      first_loop_vinfo = loop_vinfo;
 	      lowest_th = LOOP_VINFO_VERSIONING_THRESHOLD (first_loop_vinfo);
 	    }
-	  else if (vect_epilogues)
+	  else if (vect_epilogues
+		   /* For now only allow one epilogue loop.  */
+		   && first_loop_vinfo->epilogue_vinfos.is_empty ())
 	    {
 	      first_loop_vinfo->epilogue_vinfos.safe_push (loop_vinfo);
 	      poly_uint64 th = LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo);
@@ -2501,12 +2628,14 @@ vect_analyze_loop (class loop *loop, vec
 			    && loop->inner == NULL
 			    && PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK)
 			    && LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo)
-			    /* For now only allow one epilogue loop.  */
-			    && first_loop_vinfo->epilogue_vinfos.is_empty ());
+			    /* For now only allow one epilogue loop, but allow
+			       pick_lowest_cost_p to replace it.  */
+			    && (first_loop_vinfo->epilogue_vinfos.is_empty ()
+				|| pick_lowest_cost_p));
 
 	  /* Commit to first_loop_vinfo if we have no reason to try
 	     alternatives.  */
-	  if (!simdlen && !vect_epilogues)
+	  if (!simdlen && !vect_epilogues && !pick_lowest_cost_p)
 	    break;
 	}
       else
@@ -3454,7 +3583,11 @@ vect_estimate_min_profitable_iters (loop
 	       &vec_inside_cost, &vec_epilogue_cost);
 
   vec_outside_cost = (int)(vec_prologue_cost + vec_epilogue_cost);
-  
+
+  /* Stash the costs so that we can compare two loop_vec_infos.  */
+  loop_vinfo->vec_inside_cost = vec_inside_cost;
+  loop_vinfo->vec_outside_cost = vec_outside_cost;
+
   if (dump_enabled_p ())
     {
       dump_printf_loc (MSG_NOTE, vect_location, "Cost model analysis: \n");