From: Richard Sandiford
To: "gcc-patches@gcc.gnu.org"
Subject: [4/6] Optionally pick the cheapest loop_vec_info
Date: Tue, 5 Nov 2019 14:29:31 +0000
This patch adds a mode in which the vectoriser tries each available
base vector mode and picks the one with the lowest cost.  For now
the behaviour is behind a default-off --param, but a later patch
enables it by default for SVE.

The patch keeps the current behaviour of preferring a VF of
loop->simdlen over any larger or smaller VF, regardless of costs
or target preferences.


2019-11-05  Richard Sandiford

gcc/
	* params.def (vect-compare-loop-costs): New param.
	* doc/invoke.texi: Document it.
	* tree-vectorizer.h (_loop_vec_info::vec_outside_cost)
	(_loop_vec_info::vec_inside_cost): New member variables.
	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize them.
	(vect_better_loop_vinfo_p, vect_joust_loop_vinfos): New functions.
	(vect_analyze_loop): When the new parameter allows, try vectorizing
	the loop with each available vector mode and picking the one with
	the lowest cost.
	(vect_estimate_min_profitable_iters): Record the computed costs
	in the loop_vec_info.
Index: gcc/params.def
===================================================================
--- gcc/params.def	2019-10-31 17:15:25.470517368 +0000
+++ gcc/params.def	2019-11-05 14:19:58.781197820 +0000
@@ -661,6 +661,13 @@ DEFPARAM(PARAM_VECT_MAX_PEELING_FOR_ALIG
          "Maximum number of loop peels to enhance alignment of data references in a loop.",
          -1, -1, 64)
 
+DEFPARAM(PARAM_VECT_COMPARE_LOOP_COSTS,
+         "vect-compare-loop-costs",
+         "Whether to try vectorizing a loop using each supported"
+         " combination of vector types and picking the version with the"
+         " lowest cost.",
+         0, 0, 1)
+
 DEFPARAM(PARAM_MAX_CSELIB_MEMORY_LOCATIONS,
          "max-cselib-memory-locations",
          "The maximum memory locations recorded by cselib.",
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	2019-11-04 21:13:57.611756365 +0000
+++ gcc/doc/invoke.texi	2019-11-05 14:19:58.777197850 +0000
@@ -11563,6 +11563,12 @@ doing loop versioning for alias in the v
 The maximum number of loop peels to enhance access alignment
 for vectorizer.  Value -1 means no limit.
 
+@item vect-compare-loop-costs
+Whether to try vectorizing a loop using each supported combination of
+vector types and picking the version with the lowest cost.  This parameter
+has no effect when @option{-fno-vect-cost-model} or
+@option{-fvect-cost-model=unlimited} are used.
+
 @item max-iterations-to-track
 The maximum number of iterations of a loop the brute-force algorithm
 for analysis of the number of iterations of the loop tries to evaluate.
Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2019-11-05 14:19:33.829371745 +0000
+++ gcc/tree-vectorizer.h	2019-11-05 14:19:58.781197820 +0000
@@ -601,6 +601,13 @@ typedef class _loop_vec_info : public ve
   /* Cost of a single scalar iteration.  */
   int single_scalar_iteration_cost;
 
+  /* The cost of the vector prologue and epilogue, including peeled
+     iterations and set-up code.  */
+  int vec_outside_cost;
+
+  /* The cost of the vector loop body.  */
+  int vec_inside_cost;
+
   /* Is the loop vectorizable? */
   bool vectorizable;
 
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2019-11-05 14:19:33.829371745 +0000
+++ gcc/tree-vect-loop.c	2019-11-05 14:19:58.781197820 +0000
@@ -830,6 +830,8 @@ _loop_vec_info::_loop_vec_info (class lo
     scan_map (NULL),
     slp_unrolling_factor (1),
     single_scalar_iteration_cost (0),
+    vec_outside_cost (0),
+    vec_inside_cost (0),
     vectorizable (false),
     can_fully_mask_p (true),
     fully_masked_p (false),
@@ -2373,6 +2375,80 @@ vect_analyze_loop_2 (loop_vec_info loop_
   goto start_over;
 }
 
+/* Return true if vectorizing a loop using NEW_LOOP_VINFO appears
+   to be better than vectorizing it using OLD_LOOP_VINFO.  Assume that
+   OLD_LOOP_VINFO is better unless something specifically indicates
+   otherwise.
+
+   Note that this deliberately isn't a partial order.  */
+
+static bool
+vect_better_loop_vinfo_p (loop_vec_info new_loop_vinfo,
+                          loop_vec_info old_loop_vinfo)
+{
+  struct loop *loop = LOOP_VINFO_LOOP (new_loop_vinfo);
+  gcc_assert (LOOP_VINFO_LOOP (old_loop_vinfo) == loop);
+
+  poly_int64 new_vf = LOOP_VINFO_VECT_FACTOR (new_loop_vinfo);
+  poly_int64 old_vf = LOOP_VINFO_VECT_FACTOR (old_loop_vinfo);
+
+  /* Always prefer a VF of loop->simdlen over any other VF.  */
+  if (loop->simdlen)
+    {
+      bool new_simdlen_p = known_eq (new_vf, loop->simdlen);
+      bool old_simdlen_p = known_eq (old_vf, loop->simdlen);
+      if (new_simdlen_p != old_simdlen_p)
+        return new_simdlen_p;
+    }
+
+  /* Limit the VFs to what is likely to be the maximum number of iterations,
+     to handle cases in which at least one loop_vinfo is fully-masked.  */
+  HOST_WIDE_INT estimated_max_niter = likely_max_stmt_executions_int (loop);
+  if (estimated_max_niter != -1)
+    {
+      if (known_le (estimated_max_niter, new_vf))
+        new_vf = estimated_max_niter;
+      if (known_le (estimated_max_niter, old_vf))
+        old_vf = estimated_max_niter;
+    }
+
+  /* Check whether the (fractional) cost per scalar iteration is lower
+     or higher: new_inside_cost / new_vf vs. old_inside_cost / old_vf.  */
+  poly_widest_int rel_new = (new_loop_vinfo->vec_inside_cost
+                             * poly_widest_int (old_vf));
+  poly_widest_int rel_old = (old_loop_vinfo->vec_inside_cost
+                             * poly_widest_int (new_vf));
+  if (maybe_lt (rel_old, rel_new))
+    return false;
+  if (known_lt (rel_new, rel_old))
+    return true;
+
+  /* If there's nothing to choose between the loop bodies, see whether
+     there's a difference in the prologue and epilogue costs.  */
+  if (new_loop_vinfo->vec_outside_cost != old_loop_vinfo->vec_outside_cost)
+    return new_loop_vinfo->vec_outside_cost < old_loop_vinfo->vec_outside_cost;
+
+  return false;
+}
+
+/* Decide whether to replace OLD_LOOP_VINFO with NEW_LOOP_VINFO.  Return
+   true if we should.  */
+
+static bool
+vect_joust_loop_vinfos (loop_vec_info new_loop_vinfo,
+                        loop_vec_info old_loop_vinfo)
+{
+  if (!vect_better_loop_vinfo_p (new_loop_vinfo, old_loop_vinfo))
+    return false;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+                     "***** Preferring vector mode %s to vector mode %s\n",
+                     GET_MODE_NAME (new_loop_vinfo->vector_mode),
+                     GET_MODE_NAME (old_loop_vinfo->vector_mode));
+  return true;
+}
+
 /* Function vect_analyze_loop.
 
    Apply a set of analyses on LOOP, and create a loop_vec_info struct
@@ -2408,6 +2484,8 @@ vect_analyze_loop (class loop *loop, vec
   machine_mode next_vector_mode = VOIDmode;
   poly_uint64 lowest_th = 0;
   unsigned vectorized_loops = 0;
+  bool pick_lowest_cost_p = (PARAM_VALUE (PARAM_VECT_COMPARE_LOOP_COSTS)
+                             && !unlimited_cost_model (loop));
 
   bool vect_epilogues = false;
   opt_result res = opt_result::success ();
@@ -2428,6 +2506,34 @@ vect_analyze_loop (class loop *loop, vec
 
       bool fatal = false;
 
+      /* When pick_lowest_cost_p is true, we should in principle iterate
+         over all the loop_vec_infos that LOOP_VINFO could replace and
+         try to vectorize LOOP_VINFO under the same conditions.
+         E.g. when trying to replace an epilogue loop, we should vectorize
+         LOOP_VINFO as an epilogue loop with the same VF limit.  When trying
+         to replace the main loop, we should vectorize LOOP_VINFO as a main
+         loop too.
+
+         However, autovectorize_vector_modes is usually sorted as follows:
+
+         - Modes that naturally produce lower VFs usually follow modes that
+           naturally produce higher VFs.
+
+         - When modes naturally produce the same VF, maskable modes
+           usually follow unmaskable ones, so that the maskable mode
+           can be used to vectorize the epilogue of the unmaskable mode.
+
+         This order is preferred because it leads to the maximum
+         epilogue vectorization opportunities.  Targets should only use
+         a different order if they want to make wide modes available while
+         disparaging them relative to earlier, smaller modes.  The assumption
+         in that case is that the wider modes are more expensive in some
+         way that isn't reflected directly in the costs.
+
+         There should therefore be few interesting cases in which
+         LOOP_VINFO fails when treated as an epilogue loop, succeeds when
+         treated as a standalone loop, and ends up being genuinely cheaper
+         than FIRST_LOOP_VINFO.  */
       if (vect_epilogues)
         LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = first_loop_vinfo;
 
@@ -2475,13 +2581,34 @@ vect_analyze_loop (class loop *loop, vec
               LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = NULL;
               simdlen = 0;
             }
+          else if (pick_lowest_cost_p && first_loop_vinfo)
+            {
+              /* Keep trying to roll back vectorization attempts while the
+                 loop_vec_infos they produced were worse than this one.  */
+              vec<loop_vec_info> &vinfos = first_loop_vinfo->epilogue_vinfos;
+              while (!vinfos.is_empty ()
+                     && vect_joust_loop_vinfos (loop_vinfo, vinfos.last ()))
+                {
+                  gcc_assert (vect_epilogues);
+                  delete vinfos.pop ();
+                }
+              if (vinfos.is_empty ()
+                  && vect_joust_loop_vinfos (loop_vinfo, first_loop_vinfo))
+                {
+                  delete first_loop_vinfo;
+                  first_loop_vinfo = opt_loop_vec_info::success (NULL);
+                  LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = NULL;
+                }
+            }
 
           if (first_loop_vinfo == NULL)
             {
               first_loop_vinfo = loop_vinfo;
               lowest_th = LOOP_VINFO_VERSIONING_THRESHOLD (first_loop_vinfo);
             }
-          else if (vect_epilogues)
+          else if (vect_epilogues
+                   /* For now only allow one epilogue loop.  */
+                   && first_loop_vinfo->epilogue_vinfos.is_empty ())
             {
               first_loop_vinfo->epilogue_vinfos.safe_push (loop_vinfo);
               poly_uint64 th = LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo);
@@ -2501,12 +2628,14 @@ vect_analyze_loop (class loop *loop, vec
                             && loop->inner == NULL
                             && PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK)
                             && LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo)
-                            /* For now only allow one epilogue loop.  */
-                            && first_loop_vinfo->epilogue_vinfos.is_empty ());
+                            /* For now only allow one epilogue loop, but allow
+                               pick_lowest_cost_p to replace it.  */
+                            && (first_loop_vinfo->epilogue_vinfos.is_empty ()
+                                || pick_lowest_cost_p));
 
           /* Commit to first_loop_vinfo if we have no reason to try
              alternatives.  */
-          if (!simdlen && !vect_epilogues)
+          if (!simdlen && !vect_epilogues && !pick_lowest_cost_p)
             break;
         }
       else
@@ -3454,7 +3583,11 @@ vect_estimate_min_profitable_iters (loop
                &vec_inside_cost, &vec_epilogue_cost);
 
   vec_outside_cost = (int)(vec_prologue_cost + vec_epilogue_cost);
-  
+
+  /* Stash the costs so that we can compare two loop_vec_infos.  */
+  loop_vinfo->vec_inside_cost = vec_inside_cost;
+  loop_vinfo->vec_outside_cost = vec_outside_cost;
+
   if (dump_enabled_p ())
     {
       dump_printf_loc (MSG_NOTE, vect_location, "Cost model analysis: \n");