From patchwork Tue May 23 15:58:37 2017
Subject: [PATCH 3/5 v3] Vect peeling cost model
From: Robin Dapp
To: Richard Biener
Cc: GCC Patches, "Bin.Cheng", Andreas Krebbel
Date: Tue, 23 May 2017 17:58:37 +0200
Message-Id: <0cb32428-adf9-d610-ee1e-a65ae05c4c93@linux.vnet.ibm.com>

gcc/ChangeLog:
2017-05-23  Robin Dapp

	* tree-vect-data-refs.c (vect_peeling_hash_choose_best_peeling):
	Return peeling info and set costs to zero for unlimited cost
	model.
	(vect_enhance_data_refs_alignment): Also inspect all datarefs
	with unknown misalignment.  Compute costs for unknown
	misalignment, compare them to the costs for known misalignment
	and choose the cheapest for peeling.

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index fe398ea..8cd6edd 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1342,7 +1342,7 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
    choosing an option with the lowest cost (if cost model is enabled) or the
    option that aligns as many accesses as possible.  */
 
-static struct data_reference *
+static struct _vect_peel_extended_info
 vect_peeling_hash_choose_best_peeling (hash_table *peeling_htab,
                                        loop_vec_info loop_vinfo,
                                        unsigned int *npeel,
@@ -1365,11 +1365,13 @@ vect_peeling_hash_choose_best_peeling (hash_table *peeling_hta
       res.peel_info.count = 0;
       peeling_htab->traverse <_vect_peel_extended_info *,
			      vect_peeling_hash_get_most_frequent> (&res);
+      res.inside_cost = 0;
+      res.outside_cost = 0;
     }
 
   *npeel = res.peel_info.npeel;
   *body_cost_vec = res.body_cost_vec;
-  return res.peel_info.dr;
+  return res;
 }
 
 /* Return true if the new peeling NPEEL is supported.  */
@@ -1518,6 +1520,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   enum dr_alignment_support supportable_dr_alignment;
   struct data_reference *dr0 = NULL, *first_store = NULL;
   struct data_reference *dr;
+  struct data_reference *dr0_known_align = NULL;
   unsigned int i, j;
   bool do_peeling = false;
   bool do_versioning = false;
@@ -1525,7 +1528,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   gimple *stmt;
   stmt_vec_info stmt_info;
   unsigned int npeel = 0;
-  bool all_misalignments_unknown = true;
+  bool one_misalignment_known = false;
+  bool one_misalignment_unknown = false;
   unsigned int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   unsigned possible_npeel_number = 1;
   tree vectype;
@@ -1651,11 +1655,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
	      npeel_tmp += nelements;
	    }
 
-	  all_misalignments_unknown = false;
-	  /* Data-ref that was chosen for the case that all the
-	     misalignments are unknown is not relevant anymore, since we
-	     have a data-ref with known alignment.  */
-	  dr0 = NULL;
+	  one_misalignment_known = true;
	}
       else
	{
@@ -1663,35 +1663,32 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
	     peeling for data-ref that has the maximum number of data-refs
	     with the same alignment, unless the target prefers to align
	     stores over load.  */
-	  if (all_misalignments_unknown)
-	    {
-	      unsigned same_align_drs
-		= STMT_VINFO_SAME_ALIGN_REFS (stmt_info).length ();
-	      if (!dr0
-		  || same_align_drs_max < same_align_drs)
-		{
-		  same_align_drs_max = same_align_drs;
-		  dr0 = dr;
-		}
-	      /* For data-refs with the same number of related
-		 accesses prefer the one where the misalign
-		 computation will be invariant in the outermost loop.  */
-	      else if (same_align_drs_max == same_align_drs)
-		{
-		  struct loop *ivloop0, *ivloop;
-		  ivloop0 = outermost_invariant_loop_for_expr
-		    (loop, DR_BASE_ADDRESS (dr0));
-		  ivloop = outermost_invariant_loop_for_expr
-		    (loop, DR_BASE_ADDRESS (dr));
-		  if ((ivloop && !ivloop0)
-		      || (ivloop && ivloop0
-			  && flow_loop_nested_p (ivloop, ivloop0)))
-		    dr0 = dr;
-		}
+	  unsigned same_align_drs
+	    = STMT_VINFO_SAME_ALIGN_REFS (stmt_info).length ();
+	  if (!dr0
+	      || same_align_drs_max < same_align_drs)
+	    {
+	      same_align_drs_max = same_align_drs;
+	      dr0 = dr;
+	    }
+	  /* For data-refs with the same number of related
+	     accesses prefer the one where the misalign
+	     computation will be invariant in the outermost loop.  */
+	  else if (same_align_drs_max == same_align_drs)
+	    {
+	      struct loop *ivloop0, *ivloop;
+	      ivloop0 = outermost_invariant_loop_for_expr
+		(loop, DR_BASE_ADDRESS (dr0));
+	      ivloop = outermost_invariant_loop_for_expr
+		(loop, DR_BASE_ADDRESS (dr));
+	      if ((ivloop && !ivloop0)
+		  || (ivloop && ivloop0
+		      && flow_loop_nested_p (ivloop, ivloop0)))
+		dr0 = dr;
+	    }
 
-	      if (!first_store && DR_IS_WRITE (dr))
-		first_store = dr;
-	    }
+	  if (!first_store && DR_IS_WRITE (dr))
+	    first_store = dr;
 
	  /* If there are both known and unknown misaligned accesses in the
	     loop, we choose peeling amount according to the known
@@ -1702,6 +1699,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
	      if (!first_store && DR_IS_WRITE (dr))
		first_store = dr;
	    }
+
+	  one_misalignment_unknown = true;
	}
     }
   else
@@ -1722,8 +1721,11 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
       || loop->inner)
     do_peeling = false;
 
+  unsigned int unknown_align_inside_cost = INT_MAX;
+  unsigned int unknown_align_outside_cost = INT_MAX;
+
   if (do_peeling
-      && all_misalignments_unknown
+      && one_misalignment_unknown
       && vect_supportable_dr_alignment (dr0, false))
     {
       /* Check if the target requires to prefer stores over loads, i.e., if
@@ -1731,62 +1733,51 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
	 drs with same alignment into account).  */
       if (first_store && DR_IS_READ (dr0))
	{
-	  unsigned int load_inside_cost = 0, load_outside_cost = 0;
-	  unsigned int store_inside_cost = 0, store_outside_cost = 0;
-	  unsigned int load_inside_penalty = 0, load_outside_penalty = 0;
-	  unsigned int store_inside_penalty = 0, store_outside_penalty = 0;
+	  unsigned int load_inside_cost = 0;
+	  unsigned int load_outside_cost = 0;
+	  unsigned int store_inside_cost = 0;
+	  unsigned int store_outside_cost = 0;
	  stmt_vector_for_cost dummy;
	  dummy.create (2);
+	  vect_get_peeling_costs_all_drs (dr0,
+					  &load_inside_cost,
+					  &load_outside_cost,
+					  &dummy, vf / 2, vf);
+	  dummy.release ();
 
-	  vect_get_data_access_cost (dr0, &load_inside_cost, &load_outside_cost,
-				     &dummy);
-	  vect_get_data_access_cost (first_store, &store_inside_cost,
-				     &store_outside_cost, &dummy);
-
+	  dummy.create (2);
+	  vect_get_peeling_costs_all_drs (first_store,
+					  &store_inside_cost,
+					  &store_outside_cost,
+					  &dummy, vf / 2, vf);
	  dummy.release ();
 
-	  /* Calculate the penalty for leaving FIRST_STORE unaligned (by
-	     aligning the load DR0).  */
-	  load_inside_penalty = store_inside_cost;
-	  load_outside_penalty = store_outside_cost;
-	  for (i = 0;
-	       STMT_VINFO_SAME_ALIGN_REFS (vinfo_for_stmt (
-					  DR_STMT (first_store))).iterate (i, &dr);
-	       i++)
-	    if (DR_IS_READ (dr))
-	      {
-		load_inside_penalty += load_inside_cost;
-		load_outside_penalty += load_outside_cost;
-	      }
-	    else
-	      {
-		load_inside_penalty += store_inside_cost;
-		load_outside_penalty += store_outside_cost;
-	      }
-
-	  /* Calculate the penalty for leaving DR0 unaligned (by
-	     aligning the FIRST_STORE).  */
-	  store_inside_penalty = load_inside_cost;
-	  store_outside_penalty = load_outside_cost;
-	  for (i = 0;
-	       STMT_VINFO_SAME_ALIGN_REFS (vinfo_for_stmt (
-					  DR_STMT (dr0))).iterate (i, &dr);
-	       i++)
-	    if (DR_IS_READ (dr))
-	      {
-		store_inside_penalty += load_inside_cost;
-		store_outside_penalty += load_outside_cost;
-	      }
-	    else
-	      {
-		store_inside_penalty += store_inside_cost;
-		store_outside_penalty += store_outside_cost;
-	      }
-
-	  if (load_inside_penalty > store_inside_penalty
-	      || (load_inside_penalty == store_inside_penalty
-		  && load_outside_penalty > store_outside_penalty))
-	    dr0 = first_store;
+	  if (load_inside_cost > store_inside_cost
+	      || (load_inside_cost == store_inside_cost
+		  && load_outside_cost > store_outside_cost))
+	    {
+	      dr0 = first_store;
+	      unknown_align_inside_cost = store_inside_cost;
+	      unknown_align_outside_cost = store_outside_cost;
+	    }
+	  else
+	    {
+	      unknown_align_inside_cost = load_inside_cost;
+	      unknown_align_outside_cost = load_outside_cost;
+	    }
+
+	  stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
+	  prologue_cost_vec.create (2);
+	  epilogue_cost_vec.create (2);
+
+	  int dummy2;
+	  unknown_align_outside_cost += vect_get_known_peeling_cost
+	    (loop_vinfo, vf / 2, &dummy2,
+	     &LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+	     &prologue_cost_vec, &epilogue_cost_vec);
+
+	  prologue_cost_vec.release ();
+	  epilogue_cost_vec.release ();
	}
 
       /* Use peeling only if it may help to align other accesses in the loop or
@@ -1804,22 +1795,35 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
	do_peeling = false;
     }
 
-  if (do_peeling && !dr0)
+  struct _vect_peel_extended_info peel_for_known_alignment;
+  peel_for_known_alignment.inside_cost = INT_MAX;
+  peel_for_known_alignment.outside_cost = INT_MAX;
+  peel_for_known_alignment.peel_info.count = 0;
+  peel_for_known_alignment.peel_info.dr = NULL;
+
+  if (do_peeling && one_misalignment_known)
     {
       /* Peeling is possible, but there is no data access that is not supported
	 unless aligned.  So we try to choose the best possible peeling.  */
 
-      /* We should get here only if there are drs with known misalignment.  */
-      gcc_assert (!all_misalignments_unknown);
-
       /* Choose the best peeling from the hash table.  */
-      dr0 = vect_peeling_hash_choose_best_peeling (&peeling_htab,
-						   loop_vinfo, &npeel,
-						   &body_cost_vec);
-      if (!dr0 || !npeel)
-	do_peeling = false;
+      peel_for_known_alignment = vect_peeling_hash_choose_best_peeling
+	(&peeling_htab, loop_vinfo, &npeel, &body_cost_vec);
+      dr0_known_align = peel_for_known_alignment.peel_info.dr;
+    }
+
+  /* Compare costs of peeling for known and unknown alignment.  */
+  if (dr0_known_align != NULL
+      && unknown_align_inside_cost >= peel_for_known_alignment.inside_cost)
+    {
+      dr0 = dr0_known_align;
+      if (!npeel)
+	do_peeling = false;
     }
 
+  if (dr0 == NULL)
+    do_peeling = false;
+
   if (do_peeling)
     {
       stmt = DR_STMT (dr0);