From patchwork Wed Sep 2 03:26:33 2015
X-Patchwork-Submitter: Bin Cheng
X-Patchwork-Id: 513274
From: "Bin Cheng"
To: gcc-patches@gcc.gnu.org
Subject: [PATCH PR66388] Add sizetype cand for BIV of smaller type if it's used as index of memory ref
Date: Wed, 2 Sep 2015 11:26:33 +0800
Message-ID: <000401d0e52f$2c1917d0$844b4770$@arm.com>

Hi,

This patch is a new approach to fixing PR66388.  IVOPTs today computes an
iv_use with an iv_cand that has at least the same type precision as the use.
On 64-bit platforms like AArch64, this results in a different iv_cand being
created for each address type iv_use, and register pressure increases.  In
fact, the BIV should be used for all iv_uses in some of these cases.  It is
a latent bug, but it has recently gotten worse because of the overflow
changes.

The original approach at
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01484.html can fix the issue,
except that it conflicts with IV elimination; it seems impossible to me to
reconcile the two.  This new approach fixes the issue by adding a sizetype
iv_cand for BIVs directly.  In cases where the original BIV is preferred,
the sizetype iv_cand will be chosen.  As for code generation, the sizetype
iv_cand has the same effect as the original BIV.  Actually, it's better,
because the BIV needs to be explicitly extended to sizetype to be used in an
address expression on most targets.
One shortcoming of this approach is that it may introduce more iv
candidates.  To minimize the impact, this patch does sophisticated code
analysis and adds a sizetype candidate for a BIV only if the BIV is used as
an index.  Moreover, it avoids adding a candidate of the original type if
the BIV is only used as an index.  Statistics from compiling spec2k6 show
that the increase in candidate number is modest and can be ignored.

There are two more patches following, fixing corner cases revealed by this
one.  Together they bring an obvious performance improvement for spec2k6/int
on aarch64:

Spec2k6/int
  400.perlbench     3.44%
  445.gobmk        -0.86%
  456.hmmer        14.83%
  458.sjeng         2.49%
  462.libquantum   -0.79%
  GEOMEAN           1.68%

There is also about a 0.36% improvement for spec2k6/fp, mostly because of
436.cactusADM.  I believe it can be further improved, but that should be
another patch.

I also collected benchmark data for x86_64.  Spec2k6/fp is not affected.  As
for spec2k6/int, though the geomean is improved slightly, 400.perlbench is
regressed by ~3%.  I can see that BIVs are chosen for some loops instead of
address candidates.  Generally, the loop header is simplified because iv
elimination with the BIV is simpler; the number of instructions in the loop
body isn't changed.  I suspect the regression comes from different
addressing modes.  With the BIV, a complex addressing mode like
[base + index << scale + disp] is used, rather than [base + disp].  I guess
the former has more micro-ops and is thus more expensive.  This guess can be
confirmed by manually suppressing the complex addressing mode with a higher
address cost.

Now the problem becomes why the overall cost of the BIV is computed as lower
while its actual cost is higher.  I noticed that for most affected loops,
the loop header is bloated by iv elimination using the old address
candidate.  The bloated loop header results in a much higher cost than the
BIV, so the BIV is preferred.  I also noticed that the bloated loop header
generally can be simplified (I have a following patch for this).
After applying the local patch, the old address candidate is chosen and most
of the regression is recovered.  My conclusion is that the bloated loop
header should be blamed for the regression, and that it can be resolved.

Bootstrapped and tested on x86_64 and aarch64.  It fixes the failure of
gcc.target/i386/pr49781-1.c, with no new breakage.

So what do you think?

Thanks,
bin

2015-08-31  Bin Cheng

	* tree-affine.c (aff_combination_expand): New parameters.
	(tree_to_aff_combination_expand): Ditto.
	* tree-affine.h (aff_combination_expand): New declaration.
	(tree_to_aff_combination_expand): Ditto.
	* tree-ssa-loop-ivopts.c (struct iv, iv_cand): New fields.
	(dump_iv): Dump no_overflow information.
	(alloc_iv): Initialize new field for struct iv.
	(struct expand_data): New struct for affine combination expanding.
	(stop_expand): New callback func for affine combination expanding.
	(find_deriving_biv_for_iv, record_biv_for_address_use): New functions.
	(idx_find_step): Call new functions above.
	(find_depends, add_candidate): New parameter.
	(add_iv_candidate_for_biv): Add sizetype cand for BIV.
	(get_computation_aff): Simplify conversion of cand for BIV.
	(get_computation_cost_at): Step cand's base if necessary.

Index: gcc/tree-affine.c
===================================================================
--- gcc/tree-affine.c	(revision 227163)
+++ gcc/tree-affine.c	(working copy)
@@ -625,11 +656,14 @@ struct name_expansion
 };
 
 /* Expands SSA names in COMB recursively.  CACHE is used to cache the
-   results.  */
+   results.  If callback function STOP is specified, this function stops
+   expanding when it returns TRUE.  DATA points to private structure
+   used by the callback function.  */
 
 void
 aff_combination_expand (aff_tree *comb ATTRIBUTE_UNUSED,
-			hash_map<tree, name_expansion *> **cache)
+			hash_map<tree, name_expansion *> **cache,
+			bool (*stop) (void *, void *), void *data)
 {
   unsigned i;
   aff_tree to_add, current, curre;
@@ -654,6 +688,11 @@ aff_combination_expand (aff_tree *comb ATTRIBUTE_U
       name = TREE_OPERAND (e, 0);
       if (TREE_CODE (name) != SSA_NAME)
	continue;
+
+      /* Don't expand further if STOP returns TRUE.  */
+      if (stop != NULL && (*stop) (data, name))
+	continue;
+
       def = SSA_NAME_DEF_STMT (name);
       if (!is_gimple_assign (def) || gimple_assign_lhs (def) != name)
	continue;
@@ -702,7 +741,8 @@ aff_combination_expand (aff_tree *comb ATTRIBUTE_U
	  if (e != name)
	    rhs = fold_convert (type, rhs);
	}
-      tree_to_aff_combination_expand (rhs, comb->type, &current, cache);
+      tree_to_aff_combination_expand (rhs, comb->type,
+				      &current, cache, stop, data);
       exp->expansion = current;
       exp->in_progress = 0;
     }
@@ -735,14 +775,19 @@ aff_combination_expand (aff_tree *comb ATTRIBUTE_U
      a1 = a0 + a0;
      a2 = a1 + a1;
      a3 = a2 + a2;
-     ...  */
+     ...
+
+   If callback function STOP is specified, this function stops expanding
+   when it returns TRUE.  DATA points to private structure used by the
+   callback function.  */
 
 void
 tree_to_aff_combination_expand (tree expr, tree type, aff_tree *comb,
-				hash_map<tree, name_expansion *> **cache)
+				hash_map<tree, name_expansion *> **cache,
+				bool (*stop) (void *, void *), void *data)
 {
   tree_to_aff_combination (expr, type, comb);
-  aff_combination_expand (comb, cache);
+  aff_combination_expand (comb, cache, stop, data);
 }
 
 /* Frees memory occupied by struct name_expansion in *VALUE.  Callback for
Index: gcc/tree-affine.h
===================================================================
--- gcc/tree-affine.h	(revision 227163)
+++ gcc/tree-affine.h	(working copy)
@@ -77,9 +77,12 @@ void tree_to_aff_combination (tree, tree, aff_tree
 tree aff_combination_to_tree (aff_tree *);
 void unshare_aff_combination (aff_tree *);
 bool aff_combination_constant_multiple_p (aff_tree *, aff_tree *, widest_int *);
-void aff_combination_expand (aff_tree *, hash_map<tree, name_expansion *> **);
+void aff_combination_expand (aff_tree *, hash_map<tree, name_expansion *> **,
+			     bool (*)(void *, void *) = NULL, void * = NULL);
 void tree_to_aff_combination_expand (tree, tree, aff_tree *,
-				     hash_map<tree, name_expansion *> **);
+				     hash_map<tree, name_expansion *> **,
+				     bool (*)(void *, void *) = NULL,
+				     void * = NULL);
 tree get_inner_reference_aff (tree, aff_tree *, widest_int *);
 void free_affine_expand_cache (hash_map<tree, name_expansion *> **);
 bool aff_comb_cannot_overlap_p (aff_tree *, const widest_int &,
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 227163)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -147,6 +147,8 @@ struct iv
   bool biv_p;		/* Is it a biv?  */
   bool have_use_for;	/* Do we already have a use for it?  */
   bool no_overflow;	/* True if the iv doesn't overflow.  */
+  bool have_address_use;/* For biv, indicate if it's used in any address
+			   type use.  */
 };
 
 /* Per-ssa version information (induction variable descriptions, etc.).  */
@@ -251,6 +253,8 @@ struct iv_cand
			   where it is incremented.  */
   bitmap depends_on;	/* The list of invariants that are used in step of the
			   biv.  */
+  struct iv *orig_iv;	/* The original iv if this cand is added from biv with
+			   smaller type.  */
 };
 
 /* Loop invariant expression hashtable entry.  */
@@ -529,6 +533,9 @@ dump_iv (FILE *file, struct iv *iv, bool dump_name
 
   if (iv->biv_p)
     fprintf (file, "  is a biv\n");
+
+  if (iv->no_overflow)
+    fprintf (file, "  iv doesn't overflow wrto loop niter\n");
 }
 
 /* Dumps information about the USE to FILE.  */
@@ -1013,6 +1020,7 @@ alloc_iv (struct ivopts_data *data, tree base, tre
   iv->use_id = 0;
   iv->ssa_name = NULL_TREE;
   iv->no_overflow = no_overflow;
+  iv->have_address_use = false;
 
   return iv;
 }
@@ -1621,6 +1629,106 @@ expr_invariant_in_loop_p (struct loop *loop, tree
   return true;
 }
 
+/* Local data for affine combination expanding.  */
+
+struct expand_data
+{
+  struct ivopts_data *data;
+  struct iv *biv;
+};
+
+/* Callback function to tree_to_aff_combination_expand.
+
+   Return true if biv or loop invariant are encountered, false otherwise.  */
+
+static bool
+stop_expand (void *d1, void *d2)
+{
+  struct expand_data *exp_data = (struct expand_data *) d1;
+  tree ssa_name = (tree) d2;
+  struct ivopts_data *data;
+  struct iv *iv;
+
+  if (!exp_data || !ssa_name || TREE_CODE (ssa_name) != SSA_NAME)
+    return false;
+
+  data = exp_data->data;
+  gcc_assert (data != NULL);
+  iv = get_iv (data, ssa_name);
+  if (!iv || integer_zerop (iv->step))
+    return true;
+
+  if (iv->biv_p)
+    {
+      exp_data->biv = iv;
+      return true;
+    }
+
+  return false;
+}
+
+/* Return biv from which the IV is derived.  */
+
+static struct iv *
+find_deriving_biv_for_iv (struct ivopts_data *data, struct iv *iv)
+{
+  aff_tree aff;
+  struct expand_data exp_data;
+
+  if (!iv->ssa_name || TREE_CODE (iv->ssa_name) != SSA_NAME)
+    return iv;
+
+  /* Expand IV's ssa_name till the deriving biv is found.  */
+  exp_data.data = data;
+  exp_data.biv = NULL;
+  tree_to_aff_combination_expand (iv->ssa_name, TREE_TYPE (iv->ssa_name),
+				  &aff, &data->name_expansion_cache,
+				  stop_expand, &exp_data);
+  return exp_data.biv;
+}
+
+/* Record BIV, its predecessor and successor that they are used in
+   address type uses.  */
+
+static void
+record_biv_for_address_use (struct ivopts_data *data, struct iv *biv)
+{
+  unsigned i;
+  tree type, base_1, base_2;
+  bitmap_iterator bi;
+
+  if (!biv || !biv->biv_p || integer_zerop (biv->step)
+      || biv->have_address_use || !biv->no_overflow)
+    return;
+
+  type = TREE_TYPE (biv->base);
+  if (!INTEGRAL_TYPE_P (type))
+    return;
+
+  biv->have_address_use = true;
+  base_1 = fold_build2 (PLUS_EXPR, type, biv->base, biv->step);
+  EXECUTE_IF_SET_IN_BITMAP (data->relevant, 0, i, bi)
+    {
+      struct iv *iv = ver_info (data, i)->iv;
+
+      if (!iv || !iv->biv_p || integer_zerop (iv->step)
+	  || iv->have_address_use || !iv->no_overflow)
+	continue;
+
+      if (type != TREE_TYPE (iv->base)
+	  || !INTEGRAL_TYPE_P (TREE_TYPE (iv->base)))
+	continue;
+
+      if (!operand_equal_p (biv->step, iv->step, 0))
+	continue;
+
+      base_2 = fold_build2 (PLUS_EXPR, type, iv->base, iv->step);
+      if (operand_equal_p (base_1, iv->base, 0)
+	  || operand_equal_p (base_2, biv->base, 0))
+	iv->have_address_use = true;
+    }
+}
+
 /* Cumulates the steps of indices into DATA and replaces their values with the
    initial ones.  Returns false when the value of the index cannot be
    determined.  Callback for for_each_index.  */
@@ -1711,6 +1819,10 @@ idx_find_step (tree base, tree *idx, void *data)
   step = fold_build2 (MULT_EXPR, sizetype, step, iv_step);
   dta->step = fold_build2 (PLUS_EXPR, sizetype, dta->step, step);
 
+  if (!iv->biv_p)
+    iv = find_deriving_biv_for_iv (dta->ivopts_data, iv);
+
+  record_biv_for_address_use (dta->ivopts_data, iv);
   return true;
 }
@@ -2603,7 +2715,8 @@ find_depends (tree *expr_p, int *ws ATTRIBUTE_UNUS
 static struct iv_cand *
 add_candidate_1 (struct ivopts_data *data,
		 tree base, tree step, bool important, enum iv_position pos,
-		 struct iv_use *use, gimple incremented_at)
+		 struct iv_use *use, gimple incremented_at,
+		 struct iv *orig_iv = NULL)
 {
   unsigned i;
   struct iv_cand *cand = NULL;
@@ -2685,6 +2798,7 @@ add_candidate_1 (struct ivopts_data *data,
       else
	cand->ainc_use = NULL;
 
+      cand->orig_iv = orig_iv;
       if (dump_file && (dump_flags & TDF_DETAILS))
	dump_cand (dump_file, cand);
     }
@@ -2793,15 +2907,17 @@ add_autoinc_candidates (struct ivopts_data *data,
 
 static void
 add_candidate (struct ivopts_data *data,
-	       tree base, tree step, bool important, struct iv_use *use)
+	       tree base, tree step, bool important, struct iv_use *use,
+	       struct iv *orig_iv = NULL)
 {
   gcc_assert (use == NULL || use->sub_id == 0);
 
   if (ip_normal_pos (data->current_loop))
-    add_candidate_1 (data, base, step, important, IP_NORMAL, use, NULL);
+    add_candidate_1 (data, base, step, important,
+		     IP_NORMAL, use, NULL, orig_iv);
   if (ip_end_pos (data->current_loop)
       && allow_ip_end_pos_p (data->current_loop))
-    add_candidate_1 (data, base, step, important, IP_END, use, NULL);
+    add_candidate_1 (data, base, step, important, IP_END, use, NULL, orig_iv);
 }
 
 /* Adds standard iv candidates.  */
@@ -2836,8 +2952,24 @@ add_iv_candidate_for_biv (struct ivopts_data *data
   tree def;
   struct iv_cand *cand;
 
-  add_candidate (data, iv->base, iv->step, true, NULL);
+  /* Check if this biv is used in address type use.  */
+  if (iv->no_overflow && iv->have_address_use
+      && INTEGRAL_TYPE_P (TREE_TYPE (iv->base))
+      && TYPE_PRECISION (TREE_TYPE (iv->base)) < TYPE_PRECISION (sizetype))
+    {
+      tree type = unsigned_type_for (sizetype);
+      tree base = fold_convert (type, iv->base);
+      tree step = fold_convert (type, iv->step);
+      /* Add iv cand of same precision as index part in TARGET_MEM_REF.  */
+      add_candidate (data, base, step, true, NULL, iv);
+      /* Add iv cand of the original type only if it has nonlinear use.  */
+      if (iv->have_use_for)
+	add_candidate (data, iv->base, iv->step, true, NULL);
+    }
+  else
+    add_candidate (data, iv->base, iv->step, true, NULL);
 
   /* The same, but with initial value zero.  */
   if (POINTER_TYPE_P (TREE_TYPE (iv->base)))
     add_candidate (data, size_int (0), iv->step, true, NULL);
@@ -3358,6 +3490,28 @@ get_computation_aff (struct loop *loop,
   /* If the conversion is not noop, perform it.  */
   if (TYPE_PRECISION (utype) < TYPE_PRECISION (ctype))
     {
+      if (cand->orig_iv != NULL && CONVERT_EXPR_P (cbase)
+	  && (CONVERT_EXPR_P (cstep) || TREE_CODE (cstep) == INTEGER_CST))
+	{
+	  tree inner_base, inner_step, inner_type;
+	  inner_base = TREE_OPERAND (cbase, 0);
+	  if (CONVERT_EXPR_P (cstep))
+	    inner_step = TREE_OPERAND (cstep, 0);
+	  else
+	    inner_step = cstep;
+
+	  inner_type = TREE_TYPE (inner_base);
+	  /* If candidate is added from a biv whose type is smaller than
+	     ctype, we know both candidate and the biv won't overflow.
+	     In this case, it's safe to skip the conversion in candidate.
+	     As an example, (unsigned short)((unsigned long)A) equals to
+	     (unsigned short)A, if A has a type no larger than short.  */
+	  if (TYPE_PRECISION (inner_type) <= TYPE_PRECISION (uutype))
+	    {
+	      cbase = inner_base;
+	      cstep = inner_step;
+	    }
+	}
       cstep = fold_convert (uutype, cstep);
       cbase = fold_convert (uutype, cbase);
       var = fold_convert (uutype, var);
@@ -4525,6 +4681,13 @@ get_computation_cost_at (struct ivopts_data *data,
	  (ratio, mem_mode,
	   TYPE_ADDR_SPACE (TREE_TYPE (utype))))
     {
+      if (cstepi == 0 && stmt_is_after_inc)
+	{
+	  if (POINTER_TYPE_P (ctype))
+	    cbase = fold_build2 (POINTER_PLUS_EXPR, ctype, cbase, cstep);
+	  else
+	    cbase = fold_build2 (PLUS_EXPR, ctype, cbase, cstep);
+	}
       cbase
	= fold_build2 (MULT_EXPR, ctype, cbase, build_int_cst (ctype, ratio));
       cost = difference_cost (data,