From patchwork Tue Jul 20 15:31:17 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 1507627 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=MQnFHNge; dkim-atps=neutral Received: from sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4GTjPl0jw3z9sRN for ; Wed, 21 Jul 2021 01:32:09 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 12A3A3985036 for ; Tue, 20 Jul 2021 15:32:06 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 12A3A3985036 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1626795126; bh=1qhbvpYCZmPe7OPrcuQCZxXE+NycGD9cELUk97BM9/k=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=MQnFHNgegzdAPqp9N1yTQYOY0z6wcjSf+3Lwwxq3oeDYmQ49DhjJBVqfhKxQmIZNt D/IkaH2XBgLX1s1vIwjfXffw8kYfeET5MnRzTcgPDumAaSPBvX8oZhLUBLRxmeu0Mm K0bqWRyOsYgrIOkxvuUyHwE6K7k1DNZHi9iKkhLU= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id E0F9E3858C39 for ; Tue, 20 Jul 2021 15:31:19 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E0F9E3858C39 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 79A9E31B for ; Tue, 20 Jul 2021 08:31:19 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 05A953F694 for ; Tue, 20 Jul 2021 08:31:18 -0700 (PDT) To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com Subject: [PATCH] unroll: Avoid unnecessary tail loops for constant niters Date: Tue, 20 Jul 2021 16:31:17 +0100 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Richard Sandiford via Gcc-patches From: Richard Sandiford Reply-To: Richard Sandiford Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" unroll and jam can decide to unroll the outer loop of a nest like: for (int j = 0; j < n; ++j) for (int i = 0; i < n; ++i) x[i] += __builtin_expf (y[j][i]); It then uses a tail loop to handle any left-over iterations. However, the code is structured so that this tail loop is always used. If n is a multiple of the unroll factor UF, the final UF iterations will use the tail loop rather than the unrolled loop. “Fixing” that for variable loop counts would mean introducing another runtime test: a branch around the tail loop if there are no more iterations. There's at least an argument that the overhead of doing that test might not pay for itself. But we use this structure even if the iteration count is provably a multiple of UF at compile time. E.g. with s/n/100/ and an unroll factor of 2, the first 98 iterations use the unrolled loop and the final 2 iterations use the original loop. This patch makes the unroller avoid a tail loop in that case. The end result seemed easier to follow if variables were declared at the point of initialisation, so that it's more obvious which ones are meaningful even when there's no tail loop. Tested on aarch64-linux-gnu so far, will test on x86_64-linux-gnu too. OK to install if testing passes? Richard gcc/ * tree-ssa-loop-manip.c (determine_exit_conditions): Return a null exit condition if no tail loop is needed, and if the original exit condition should therefore be kept as-is. (tree_transform_and_unroll_loop): Handle that case here too. gcc/testsuite/ * gcc.dg/unroll-9.c: New test/ --- gcc/testsuite/gcc.dg/unroll-9.c | 12 ++ gcc/tree-ssa-loop-manip.c | 306 +++++++++++++++++--------------- 2 files changed, 176 insertions(+), 142 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/unroll-9.c diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c index 28ae1316fa0..41f9872ca10 100644 --- a/gcc/tree-ssa-loop-manip.c +++ b/gcc/tree-ssa-loop-manip.c @@ -997,8 +997,10 @@ can_unroll_loop_p (class loop *loop, unsigned factor, /* Determines the conditions that control execution of LOOP unrolled FACTOR times. DESC is number of iterations of LOOP. ENTER_COND is set to condition that must be true if the main loop can be entered. + If the loop does not always iterate an exact multiple of FACTOR times, EXIT_BASE, EXIT_STEP, EXIT_CMP and EXIT_BOUND are set to values describing - how the exit from the unrolled loop should be controlled. */ + how the exit from the unrolled loop should be controlled. Otherwise, + the trees are set to null and EXIT_CMP is set to ERROR_MARK. */ static void determine_exit_conditions (class loop *loop, class tree_niter_desc *desc, @@ -1079,6 +1081,16 @@ determine_exit_conditions (class loop *loop, class tree_niter_desc *desc, assum = fold_build2 (cmp, boolean_type_node, base, bound); cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, assum, cond); + if (integer_nonzerop (cond) + && integer_zerop (desc->may_be_zero)) + { + /* Convert the latch count to an iteration count. */ + tree niter = fold_build2 (PLUS_EXPR, type, desc->niter, + build_one_cst (type)); + if (multiple_of_p (type, niter, bigstep)) + return; + } + cond = force_gimple_operand (unshare_expr (cond), &stmts, false, NULL_TREE); if (stmts) gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts); @@ -1234,137 +1246,138 @@ tree_transform_and_unroll_loop (class loop *loop, unsigned factor, transform_callback transform, void *data) { - gcond *exit_if; - tree ctr_before, ctr_after; - tree enter_main_cond, exit_base, exit_step, exit_bound; - enum tree_code exit_cmp; - gphi *phi_old_loop, *phi_new_loop, *phi_rest; - gphi_iterator psi_old_loop, psi_new_loop; - tree init, next, new_init; - class loop *new_loop; - basic_block rest, exit_bb; - edge old_entry, new_entry, old_latch, precond_edge, new_exit; - edge new_nonexit, e; - gimple_stmt_iterator bsi; - use_operand_p op; - bool ok; - unsigned i; - profile_probability prob, prob_entry, scale_unrolled; - profile_count freq_e, freq_h; gcov_type new_est_niter = niter_for_unrolled_loop (loop, factor); unsigned irr = loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP; - auto_vec to_remove; + enum tree_code exit_cmp; + tree enter_main_cond, exit_base, exit_step, exit_bound; determine_exit_conditions (loop, desc, factor, &enter_main_cond, &exit_base, &exit_step, &exit_cmp, &exit_bound); + bool single_loop_p = !exit_base; /* Let us assume that the unrolled loop is quite likely to be entered. */ + profile_probability prob_entry; if (integer_nonzerop (enter_main_cond)) prob_entry = profile_probability::always (); else prob_entry = profile_probability::guessed_always () .apply_scale (PROB_UNROLLED_LOOP_ENTERED, 100); - /* The values for scales should keep profile consistent, and somewhat close - to correct. - - TODO: The current value of SCALE_REST makes it appear that the loop that - is created by splitting the remaining iterations of the unrolled loop is - executed the same number of times as the original loop, and with the same - frequencies, which is obviously wrong. This does not appear to cause - problems, so we do not bother with fixing it for now. To make the profile - correct, we would need to change the probability of the exit edge of the - loop, and recompute the distribution of frequencies in its body because - of this change (scale the frequencies of blocks before and after the exit - by appropriate factors). */ - scale_unrolled = prob_entry; - - new_loop = loop_version (loop, enter_main_cond, NULL, prob_entry, - prob_entry.invert (), scale_unrolled, - profile_probability::guessed_always (), - true); - gcc_assert (new_loop != NULL); - update_ssa (TODO_update_ssa); - - /* Prepare the cfg and update the phi nodes. Move the loop exit to the - loop latch (and make its condition dummy, for the moment). */ - rest = loop_preheader_edge (new_loop)->src; - precond_edge = single_pred_edge (rest); - split_edge (loop_latch_edge (loop)); - exit_bb = single_pred (loop->latch); - - /* Since the exit edge will be removed, the frequency of all the blocks - in the loop that are dominated by it must be scaled by - 1 / (1 - exit->probability). */ - if (exit->probability.initialized_p ()) - scale_dominated_blocks_in_loop (loop, exit->src, - /* We are scaling up here so probability - does not fit. */ - loop->header->count, - loop->header->count - - loop->header->count.apply_probability - (exit->probability)); - - bsi = gsi_last_bb (exit_bb); - exit_if = gimple_build_cond (EQ_EXPR, integer_zero_node, - integer_zero_node, - NULL_TREE, NULL_TREE); - - gsi_insert_after (&bsi, exit_if, GSI_NEW_STMT); - new_exit = make_edge (exit_bb, rest, EDGE_FALSE_VALUE | irr); - rescan_loop_exit (new_exit, true, false); - - /* Set the probability of new exit to the same of the old one. Fix - the frequency of the latch block, by scaling it back by - 1 - exit->probability. */ - new_exit->probability = exit->probability; - new_nonexit = single_pred_edge (loop->latch); - new_nonexit->probability = exit->probability.invert (); - new_nonexit->flags = EDGE_TRUE_VALUE; - if (new_nonexit->probability.initialized_p ()) - scale_bbs_frequencies (&loop->latch, 1, new_nonexit->probability); - - old_entry = loop_preheader_edge (loop); - new_entry = loop_preheader_edge (new_loop); - old_latch = loop_latch_edge (loop); - for (psi_old_loop = gsi_start_phis (loop->header), - psi_new_loop = gsi_start_phis (new_loop->header); - !gsi_end_p (psi_old_loop); - gsi_next (&psi_old_loop), gsi_next (&psi_new_loop)) + gcond *exit_if = nullptr; + class loop *new_loop = nullptr; + basic_block rest; + edge new_exit; + if (!single_loop_p) { - phi_old_loop = psi_old_loop.phi (); - phi_new_loop = psi_new_loop.phi (); - - init = PHI_ARG_DEF_FROM_EDGE (phi_old_loop, old_entry); - op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_new_loop, new_entry); - gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op))); - next = PHI_ARG_DEF_FROM_EDGE (phi_old_loop, old_latch); - - /* Prefer using original variable as a base for the new ssa name. - This is necessary for virtual ops, and useful in order to avoid - losing debug info for real ops. */ - if (TREE_CODE (next) == SSA_NAME - && useless_type_conversion_p (TREE_TYPE (next), - TREE_TYPE (init))) - new_init = copy_ssa_name (next); - else if (TREE_CODE (init) == SSA_NAME - && useless_type_conversion_p (TREE_TYPE (init), - TREE_TYPE (next))) - new_init = copy_ssa_name (init); - else if (useless_type_conversion_p (TREE_TYPE (next), TREE_TYPE (init))) - new_init = make_temp_ssa_name (TREE_TYPE (next), NULL, "unrinittmp"); - else - new_init = make_temp_ssa_name (TREE_TYPE (init), NULL, "unrinittmp"); + /* The values for scales should keep profile consistent, and somewhat + close to correct. + + TODO: The current value of SCALE_REST makes it appear that the loop + that is created by splitting the remaining iterations of the unrolled + loop is executed the same number of times as the original loop, and + with the same frequencies, which is obviously wrong. This does not + appear to cause problems, so we do not bother with fixing it for now. + To make the profile correct, we would need to change the probability + of the exit edge of the loop, and recompute the distribution of + frequencies in its body because of this change (scale the frequencies + of blocks before and after the exit by appropriate factors). */ + profile_probability scale_unrolled = prob_entry; + new_loop = loop_version (loop, enter_main_cond, NULL, prob_entry, + prob_entry.invert (), scale_unrolled, + profile_probability::guessed_always (), + true); + gcc_assert (new_loop != NULL); + update_ssa (TODO_update_ssa); + + /* Prepare the cfg and update the phi nodes. Move the loop exit to the + loop latch (and make its condition dummy, for the moment). */ + rest = loop_preheader_edge (new_loop)->src; + edge precond_edge = single_pred_edge (rest); + split_edge (loop_latch_edge (loop)); + basic_block exit_bb = single_pred (loop->latch); + + /* Since the exit edge will be removed, the frequency of all the blocks + in the loop that are dominated by it must be scaled by + 1 / (1 - exit->probability). */ + if (exit->probability.initialized_p ()) + scale_dominated_blocks_in_loop (loop, exit->src, + /* We are scaling up here so + probability does not fit. */ + loop->header->count, + loop->header->count + - loop->header->count.apply_probability + (exit->probability)); + + gimple_stmt_iterator bsi = gsi_last_bb (exit_bb); + exit_if = gimple_build_cond (EQ_EXPR, integer_zero_node, + integer_zero_node, + NULL_TREE, NULL_TREE); + + gsi_insert_after (&bsi, exit_if, GSI_NEW_STMT); + new_exit = make_edge (exit_bb, rest, EDGE_FALSE_VALUE | irr); + rescan_loop_exit (new_exit, true, false); + + /* Set the probability of new exit to the same of the old one. Fix + the frequency of the latch block, by scaling it back by + 1 - exit->probability. */ + new_exit->probability = exit->probability; + edge new_nonexit = single_pred_edge (loop->latch); + new_nonexit->probability = exit->probability.invert (); + new_nonexit->flags = EDGE_TRUE_VALUE; + if (new_nonexit->probability.initialized_p ()) + scale_bbs_frequencies (&loop->latch, 1, new_nonexit->probability); + + edge old_entry = loop_preheader_edge (loop); + edge new_entry = loop_preheader_edge (new_loop); + edge old_latch = loop_latch_edge (loop); + for (gphi_iterator psi_old_loop = gsi_start_phis (loop->header), + psi_new_loop = gsi_start_phis (new_loop->header); + !gsi_end_p (psi_old_loop); + gsi_next (&psi_old_loop), gsi_next (&psi_new_loop)) + { + gphi *phi_old_loop = psi_old_loop.phi (); + gphi *phi_new_loop = psi_new_loop.phi (); + + tree init = PHI_ARG_DEF_FROM_EDGE (phi_old_loop, old_entry); + use_operand_p op + = PHI_ARG_DEF_PTR_FROM_EDGE (phi_new_loop, new_entry); + gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op))); + tree next = PHI_ARG_DEF_FROM_EDGE (phi_old_loop, old_latch); + + /* Prefer using original variable as a base for the new ssa name. + This is necessary for virtual ops, and useful in order to avoid + losing debug info for real ops. */ + tree new_init; + if (TREE_CODE (next) == SSA_NAME + && useless_type_conversion_p (TREE_TYPE (next), + TREE_TYPE (init))) + new_init = copy_ssa_name (next); + else if (TREE_CODE (init) == SSA_NAME + && useless_type_conversion_p (TREE_TYPE (init), + TREE_TYPE (next))) + new_init = copy_ssa_name (init); + else if (useless_type_conversion_p (TREE_TYPE (next), + TREE_TYPE (init))) + new_init = make_temp_ssa_name (TREE_TYPE (next), NULL, + "unrinittmp"); + else + new_init = make_temp_ssa_name (TREE_TYPE (init), NULL, + "unrinittmp"); - phi_rest = create_phi_node (new_init, rest); + gphi *phi_rest = create_phi_node (new_init, rest); + add_phi_arg (phi_rest, init, precond_edge, UNKNOWN_LOCATION); + add_phi_arg (phi_rest, next, new_exit, UNKNOWN_LOCATION); + SET_USE (op, new_init); + } - add_phi_arg (phi_rest, init, precond_edge, UNKNOWN_LOCATION); - add_phi_arg (phi_rest, next, new_exit, UNKNOWN_LOCATION); - SET_USE (op, new_init); + remove_path (exit); + } + else + { + new_exit = exit; + rest = exit->dest; } - - remove_path (exit); /* Transform the loop. */ if (transform) @@ -1376,57 +1389,66 @@ tree_transform_and_unroll_loop (class loop *loop, unsigned factor, bitmap_ones (wont_exit); bitmap_clear_bit (wont_exit, factor - 1); - ok = gimple_duplicate_loop_to_header_edge + auto_vec to_remove; + bool ok = gimple_duplicate_loop_to_header_edge (loop, loop_latch_edge (loop), factor - 1, wont_exit, new_exit, &to_remove, DLTHE_FLAG_UPDATE_FREQ); gcc_assert (ok); - FOR_EACH_VEC_ELT (to_remove, i, e) + for (edge e : to_remove) { ok = remove_path (e); gcc_assert (ok); } update_ssa (TODO_update_ssa); - /* Ensure that the frequencies in the loop match the new estimated - number of iterations, and change the probability of the new - exit edge. */ - - freq_h = loop->header->count; - freq_e = (loop_preheader_edge (loop))->count (); - if (freq_h.nonzero_p ()) + if (!single_loop_p) { - /* Avoid dropping loop body profile counter to 0 because of zero count - in loop's preheader. */ - if (freq_h.nonzero_p () && !(freq_e == profile_count::zero ())) - freq_e = freq_e.force_nonzero (); - scale_loop_frequencies (loop, freq_e.probability_in (freq_h)); + /* Ensure that the frequencies in the loop match the new estimated + number of iterations, and change the probability of the new + exit edge. */ + + profile_count freq_h = loop->header->count; + profile_count freq_e = (loop_preheader_edge (loop))->count (); + if (freq_h.nonzero_p ()) + { + /* Avoid dropping loop body profile counter to 0 because of zero + count in loop's preheader. */ + if (freq_h.nonzero_p () && !(freq_e == profile_count::zero ())) + freq_e = freq_e.force_nonzero (); + scale_loop_frequencies (loop, freq_e.probability_in (freq_h)); + } } - exit_bb = single_pred (loop->latch); + basic_block exit_bb = single_pred (loop->latch); new_exit = find_edge (exit_bb, rest); new_exit->probability = profile_probability::always () .apply_scale (1, new_est_niter + 1); - rest->count += new_exit->count (); + if (!single_loop_p) + rest->count += new_exit->count (); - new_nonexit = single_pred_edge (loop->latch); - prob = new_nonexit->probability; + edge new_nonexit = single_pred_edge (loop->latch); + profile_probability prob = new_nonexit->probability; new_nonexit->probability = new_exit->probability.invert (); prob = new_nonexit->probability / prob; if (prob.initialized_p ()) scale_bbs_frequencies (&loop->latch, 1, prob); - /* Finally create the new counter for number of iterations and add the new - exit instruction. */ - bsi = gsi_last_nondebug_bb (exit_bb); - exit_if = as_a (gsi_stmt (bsi)); - create_iv (exit_base, exit_step, NULL_TREE, loop, - &bsi, false, &ctr_before, &ctr_after); - gimple_cond_set_code (exit_if, exit_cmp); - gimple_cond_set_lhs (exit_if, ctr_after); - gimple_cond_set_rhs (exit_if, exit_bound); - update_stmt (exit_if); + if (!single_loop_p) + { + /* Finally create the new counter for number of iterations and add + the new exit instruction. */ + tree ctr_before, ctr_after; + gimple_stmt_iterator bsi = gsi_last_nondebug_bb (exit_bb); + exit_if = as_a (gsi_stmt (bsi)); + create_iv (exit_base, exit_step, NULL_TREE, loop, + &bsi, false, &ctr_before, &ctr_after); + gimple_cond_set_code (exit_if, exit_cmp); + gimple_cond_set_lhs (exit_if, ctr_after); + gimple_cond_set_rhs (exit_if, exit_bound); + update_stmt (exit_if); + } checking_verify_flow_info (); checking_verify_loop_structure (); diff --git a/gcc/testsuite/gcc.dg/unroll-9.c b/gcc/testsuite/gcc.dg/unroll-9.c new file mode 100644 index 00000000000..2d65ec3691d --- /dev/null +++ b/gcc/testsuite/gcc.dg/unroll-9.c @@ -0,0 +1,12 @@ +/* { dg-options "-O3 -fdump-tree-unrolljam -fno-math-errno" } */ + +void +f (float *restrict x, float y[100][100]) +{ + for (int j = 0; j < 100; ++j) + for (int i = 0; i < 100; ++i) + x[i] += __builtin_expf (y[j][i]); +} + +/* The loop should be unrolled 2 times, without a tail loop. */ +/* { dg-final { scan-tree-dump-times "__builtin_expf" 2 "unrolljam" } } */