From patchwork Thu Nov 2 20:50:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ajit Agarwal X-Patchwork-Id: 1858704 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=TkSgE1VU; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SLwy32mZ6z1yQx for ; Fri, 3 Nov 2023 07:50:51 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6A3F23858C41 for ; Thu, 2 Nov 2023 20:50:49 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 7CE9C3858D37 for ; Thu, 2 Nov 2023 20:50:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 7CE9C3858D37 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 7CE9C3858D37 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=148.163.158.5 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698958238; cv=none; b=uKx1uGUS0PFWZeF4LchumSz5v1DhtLHnOtYL9GdP9go+/T4Q2ePvONq88CfpYKaoHIdVo5NpXl0HyVl9oW06L+AGD1JYNjZbAsYRAk1M4qHHUBYUrdbPRcCVK3l+2hP7gGOqekyj9pkbyBXbkCUG3k6KGGXwwVheTcOFkHJ7IW8= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698958238; c=relaxed/simple; bh=7xbrHqE0Dp/gukceHf6EAYIp36SZUOMiM1hncbTkz9o=; h=DKIM-Signature:Message-ID:Date:MIME-Version:To:From:Subject; b=fGdcvIDj+pLpoq65ZaKlhGKxiYFfGgbcc7CwQA5ClYKPIWnxzCjeJWrHoj1K2nSZT3M4SAy/qwW3MzVIJ+Mfj38Hf8dMT7/1ef6g3w+oKG3yT3tmmYiuOXDHViqwg0T2h4hmxsGLLI4oJ5SHE63gUS3j5zX573o71ftllIilV9Q= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from pps.filterd (m0353723.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 3A2Kihxn006960; Thu, 2 Nov 2023 20:50:27 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=message-id : date : mime-version : to : cc : from : subject : content-type : content-transfer-encoding; s=pp1; bh=QypyT/uJ50ilxO0AgFifu8GfXEuoBAXcIZzoTu3i08w=; b=TkSgE1VUUUkdZuSu9OfjaguUyv7HLiUBJN/L20d9YmvV6b/zxiTJt6zauKfgOu6Q6uJy S5SxlXJBBTSCe85tuMdo2M0kxcimNLzcOQYLbV5mxa+zkujsCE6olT1wkkwWJmLjww3y IZoBWC99ztxOuiZGvzO3q2l1Sz6+rOvCkvkxBTdNvJZ0hefAIaUa98ezhxMg5R4d3mM4 Q+iyzvBcFdobLc9oMPVOA3XjD0PFFQDsqDWWluD3jQfiZZ3wyHwwshmjZhUR4ree3Hho 7oHwINxHTYx/c9ekHVww/MF9CWeK4ZrM7ib5/iKDfF+rIXy3kSAHtSAX4nQPbnWk9+6y VA== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3u4jk38r9c-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 02 Nov 2023 20:50:27 +0000 Received: from m0353723.ppops.net (m0353723.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 3A2Kigk9006937; Thu, 2 Nov 2023 20:50:26 GMT Received: from ppma13.dal12v.mail.ibm.com (dd.9e.1632.ip4.static.sl-reverse.com [50.22.158.221]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3u4jk38r95-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 02 Nov 2023 20:50:26 +0000 Received: from pps.filterd (ppma13.dal12v.mail.ibm.com [127.0.0.1]) by ppma13.dal12v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 3A2JThnQ011291; Thu, 2 Nov 2023 20:50:26 GMT Received: from smtprelay05.dal12v.mail.ibm.com ([172.16.1.7]) by ppma13.dal12v.mail.ibm.com (PPS) with ESMTPS id 3u1eukh0ek-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 02 Nov 2023 20:50:26 +0000 Received: from smtpav01.wdc07v.mail.ibm.com (smtpav01.wdc07v.mail.ibm.com [10.39.53.228]) by smtprelay05.dal12v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 3A2KoObA55116180 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 2 Nov 2023 20:50:25 GMT Received: from smtpav01.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id D0DF558055; Thu, 2 Nov 2023 20:50:24 +0000 (GMT) Received: from smtpav01.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id BAA1A58059; Thu, 2 Nov 2023 20:50:22 +0000 (GMT) Received: from [9.43.97.62] (unknown [9.43.97.62]) by smtpav01.wdc07v.mail.ibm.com (Postfix) with ESMTP; Thu, 2 Nov 2023 20:50:22 +0000 (GMT) Message-ID: Date: Fri, 3 Nov 2023 02:20:21 +0530 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Content-Language: en-US To: gcc-patches Cc: Jeff Law , Richard Biener , Segher Boessenkool , Peter Bergner , "Kewen.Lin" From: Ajit Agarwal Subject: [PATCH] tree-optimization: Add register pressure heuristics X-TM-AS-GCONF: 00 X-Proofpoint-GUID: 1VO9S9y_HxPL4kNftxqn38_6e1ld47_8 X-Proofpoint-ORIG-GUID: voQhy6W7__B3N2WFBOXmsWE9UwNfNNgT X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.987,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2023-11-02_10,2023-11-02_03,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 priorityscore=1501 mlxlogscore=999 bulkscore=0 phishscore=0 lowpriorityscore=0 suspectscore=0 spamscore=0 mlxscore=0 malwarescore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2310240000 definitions=main-2311020168 X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Hello All: Currently code sinking heuristics are based on profile data like basic block count and sink frequency threshold. We have removed such heuristics and added register pressure heuristics based on live-in and live-out of early blocks and immediate dominator of use blocks of the same loop nesting depth. Such heuristics reduces register pressure when code sinking is done with same loop nesting depth. High register pressure region is the region where there are live-in of early blocks that has been modified by the early block. If there are modification of the variables in best block that are live-in in early block that are live-out of best block. Bootstrapped and regtested on powerpc64-linux-gnu. Thanks & Regards Ajit tree-optimization: Add register pressure heuristics Currently code sinking heuristics are based on profile data like basic block count and sink frequency threshold. We have removed such heuristics to add register pressure heuristics based on live-in and live-out of early blocks and immediate dominator of use blocks. High register pressure region is the region where there are live-in of early blocks that has been modified by the early block. If there are modification of the variables in best block that are live-in in early block that are live-out of best block. 2023-11-03 Ajit Kumar Agarwal gcc/ChangeLog: * tree-ssa-sink.cc (statement_sink_location): Add tree_live_info_p as paramters. (sink_code_in_bb): Ditto. (select_best_block): Add register pressure heuristics to select the best blocks in the immediate dominator for same loop nest depth. (execute): Add live range analysis. (additional_var_map): New function. * tree-ssa-live.cc (set_var_live_on_entry): Add virtual operand tests on ssa_names. (verify_live_on_entry): Ditto. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/ssa-sink-21.c: New test. * gcc.dg/tree-ssa/ssa-sink-22.c: New test. --- gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 ++++ gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 +++++ gcc/tree-ssa-live.cc | 11 ++- gcc/tree-ssa-sink.cc | 93 ++++++++++++++------- 4 files changed, 104 insertions(+), 34 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c new file mode 100644 index 00000000000..d3b79ca5803 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-sink-stats" } */ +void bar(); +int j; +void foo(int a, int b, int c, int d, int e, int f) +{ + int l; + l = a + b + c + d +e + f; + if (a != 5) + { + bar(); + j = l; + } +} +/* { dg-final { scan-tree-dump {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c new file mode 100644 index 00000000000..84e7938c54f --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-sink-stats" } */ +void bar(); +int j, x; +void foo(int a, int b, int c, int d, int e, int f) +{ + int l; + l = a + b + c + d +e + f; + if (a != 5) + { + bar(); + if (b != 3) + x = 3; + else + x = 5; + j = l; + } +} +/* { dg-final { scan-tree-dump {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */ diff --git a/gcc/tree-ssa-live.cc b/gcc/tree-ssa-live.cc index f06daf23035..998fe588278 100644 --- a/gcc/tree-ssa-live.cc +++ b/gcc/tree-ssa-live.cc @@ -1141,7 +1141,8 @@ set_var_live_on_entry (tree ssa_name, tree_live_info_p live) def_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun); /* An undefined local variable does not need to be very alive. */ - if (ssa_undefined_value_p (ssa_name, false)) + if (virtual_operand_p (ssa_name) + || ssa_undefined_value_p (ssa_name, false)) return; /* Visit each use of SSA_NAME and if it isn't in the same block as the def, @@ -1540,7 +1541,6 @@ debug (tree_live_info_d *ptr) /* Verify that the info in LIVE matches the current cfg. */ - static void verify_live_on_entry (tree_live_info_p live) { @@ -1569,11 +1569,13 @@ verify_live_on_entry (tree_live_info_p live) tree d = NULL_TREE; bitmap loe; var = partition_to_var (map, i); + if (var == NULL_TREE) + continue; stmt = SSA_NAME_DEF_STMT (var); tmp = gimple_bb (stmt); + if (SSA_NAME_VAR (var)) d = ssa_default_def (cfun, SSA_NAME_VAR (var)); - loe = live_on_entry (live, e->dest); if (loe && bitmap_bit_p (loe, i)) { @@ -1614,7 +1616,8 @@ verify_live_on_entry (tree_live_info_p live) { /* An undefined local variable does not need to be very alive. */ - if (ssa_undefined_value_p (var, false)) + if (virtual_operand_p (var) + || ssa_undefined_value_p (var, false)) continue; /* The only way this var shouldn't be marked live on entry is diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc index a360c5cdd6e..d0c9ef1ab86 100644 --- a/gcc/tree-ssa-sink.cc +++ b/gcc/tree-ssa-sink.cc @@ -176,6 +176,9 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts) tree, return the best basic block between them (inclusive) to place statements. + The best basic block should be an immediate dominator of + best basic block if we've moved to same loop nest. + We want the most control dependent block in the shallowest loop nest. If the resulting block is in a shallower loop nest, then use it. Else @@ -191,24 +194,23 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts) static basic_block select_best_block (basic_block early_bb, basic_block late_bb, - gimple *stmt) + gimple *stmt, + tree_live_info_p &live) { basic_block best_bb = late_bb; basic_block temp_bb = late_bb; - int threshold; while (temp_bb != early_bb) { /* If we've moved into a lower loop nest, then that becomes our best block. */ - if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb)) + if (bb_loop_depth (temp_bb) <= bb_loop_depth (best_bb)) best_bb = temp_bb; /* Walk up the dominator tree, hopefully we'll find a shallower loop nest. */ temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb); } - /* Placing a statement before a setjmp-like function would be invalid (it cannot be reevaluated when execution follows an abnormal edge). If we selected a block with abnormal predecessors, just punt. */ @@ -233,24 +235,36 @@ select_best_block (basic_block early_bb, && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, best_bb)) return early_bb; - /* Get the sinking threshold. If the statement to be moved has memory - operands, then increase the threshold by 7% as those are even more - profitable to avoid, clamping at 100%. */ - threshold = param_sink_frequency_threshold; - if (gimple_vuse (stmt) || gimple_vdef (stmt)) - { - threshold += 7; - if (threshold > 100) - threshold = 100; - } + int best_bb_liveout_cnt + = bitmap_count_bits (&live->liveout[best_bb->index]); + int early_bb_liveout_cnt + = bitmap_count_bits (&live->liveout[early_bb->index]); + int early_livein_cnt + = bitmap_count_bits (&live->livein[early_bb->index]); + + /* High register pressure region is the region where there are live-in of + early blocks that has been modified by the early block. If there are + modification of the variables in best block that are live-in in early + block that are live-out of best block. */ + bool live_in_rgn = (early_livein_cnt != 0 + && early_bb_liveout_cnt <= early_livein_cnt); + + bool high_reg_pressure_rgn + = ((best_bb_liveout_cnt != 0 || live_in_rgn) + && best_bb_liveout_cnt <= early_bb_liveout_cnt); /* If BEST_BB is at the same nesting level, then require it to have - significantly lower execution frequency to avoid gratuitous movement. */ + significantly high register pressure region to avoid gratuitous + movement. */ if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb) - /* If result of comparsion is unknown, prefer EARLY_BB. - Thus use !(...>=..) rather than (...<...) */ - && !(best_bb->count * 100 >= early_bb->count * threshold)) - return best_bb; + && high_reg_pressure_rgn) + { + /* Avoid sinking to immediate dominator if the statement to be moved + has memory operand and same loop nest. */ + if (best_bb != late_bb && gimple_vuse (stmt)) + return late_bb; + return best_bb; + } /* No better block found, so return EARLY_BB, which happens to be the statement's original block. */ @@ -265,7 +279,8 @@ select_best_block (basic_block early_bb, static bool statement_sink_location (gimple *stmt, basic_block frombb, gimple_stmt_iterator *togsi, bool *zero_uses_p, - virtual_operand_live &vop_live) + virtual_operand_live &vop_live, + tree_live_info_p &live) { gimple *use; use_operand_p one_use = NULL_USE_OPERAND_P; @@ -413,7 +428,7 @@ statement_sink_location (gimple *stmt, basic_block frombb, if (!dominated_by_p (CDI_DOMINATORS, commondom, frombb)) return false; - commondom = select_best_block (frombb, commondom, stmt); + commondom = select_best_block (frombb, commondom, stmt, live); if (commondom == frombb) return false; @@ -430,19 +445,17 @@ statement_sink_location (gimple *stmt, basic_block frombb, continue; break; } + use = USE_STMT (one_use); if (gimple_code (use) != GIMPLE_PHI) { - sinkbb = select_best_block (frombb, gimple_bb (use), stmt); + sinkbb = select_best_block (frombb, gimple_bb (use), stmt, live); if (sinkbb == frombb) return false; - if (sinkbb == gimple_bb (use)) - *togsi = gsi_for_stmt (use); - else - *togsi = gsi_after_labels (sinkbb); + *togsi = gsi_after_labels (sinkbb); return true; } @@ -454,7 +467,7 @@ statement_sink_location (gimple *stmt, basic_block frombb, if (!sinkbb) return false; - sinkbb = select_best_block (frombb, sinkbb, stmt); + sinkbb = select_best_block (frombb, sinkbb, stmt, live); if (!sinkbb || sinkbb == frombb) return false; @@ -643,7 +656,8 @@ sink_common_stores_to_bb (basic_block bb) /* Perform code sinking on BB */ static unsigned -sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live) +sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live, + tree_live_info_p &live) { gimple_stmt_iterator gsi; edge_iterator ei; @@ -670,7 +684,8 @@ sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live) gimple_stmt_iterator togsi; bool zero_uses_p; - if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p, vop_live)) + if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p, + vop_live, live)) { gimple_stmt_iterator saved = gsi; if (!gsi_end_p (gsi)) @@ -814,6 +829,15 @@ private: bool unsplit_edges; }; // class pass_sink_code +void additional_var_map (var_map &map) +{ + map->bmp_bbs = BITMAP_ALLOC (NULL); + map->outofssa_p = false; + basic_block bb; + FOR_EACH_BB_FN (bb, cfun) + bitmap_set_bit (map->bmp_bbs, bb->index); +} + unsigned int pass_sink_code::execute (function *fun) { @@ -827,12 +851,21 @@ pass_sink_code::execute (function *fun) calculate_dominance_info (CDI_DOMINATORS); virtual_operand_live vop_live; + var_map map = init_var_map (num_ssa_names); + additional_var_map (map); + tree_live_info_p live = calculate_live_ranges (map, true); int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun)); int n = inverted_rev_post_order_compute (fun, rpo); + for (int i = 0; i < n; ++i) - todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live); + todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live, live); free (rpo); + if (live) + delete_tree_live_info (live); + + if (map) + delete_var_map (map); statistics_counter_event (fun, "Sunk statements", sink_stats.sunk); statistics_counter_event (fun, "Commoned stores", sink_stats.commoned);