From patchwork Mon Feb 4 19:06:57 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Aaron Sawdey
X-Patchwork-Id: 1036205
To: gcc-patches List, Segher Boessenkool, Bill Schmidt, David Edelsohn
From: Aaron Sawdey
Subject: [PATCH, rs6000] PR target/89112 put branch probabilities on branches generated by inline expansion
Date: Mon, 4 Feb 2019 13:06:57 -0600
Message-Id: <21af6222-2dae-a21d-9eb8-8674c048d030@linux.ibm.com>

This is the second part of the fix for 89112, fixing the conditions
that caused it to happen. This patch adds REG_BR_PROB notes to the
branches generated by inline expansion of memcmp and strncmp.
This prevents any of the code from being marked as cold and moved to
the end of the function, which is what caused the long branches in
89112. With this patch, the test case for 89112 does not have any long
branches within the expansion of memcmp, and the code for each memcmp
is contiguous.

OK for trunk and 8 backport if bootstrap/regtest passes?

Thanks!

   Aaron

2019-02-04  Aaron Sawdey

        * config/rs6000/rs6000-string.c (do_ifelse, expand_cmp_vec_sequence,
        expand_compare_loop, expand_block_compare_gpr,
        expand_strncmp_align_check, expand_strncmp_gpr_sequence): add branch
        probability.

Index: gcc/config/rs6000/rs6000-string.c
===================================================================
--- gcc/config/rs6000/rs6000-string.c (revision 268522)
+++ gcc/config/rs6000/rs6000-string.c (working copy)
@@ -35,6 +35,8 @@
 #include "expr.h"
 #include "output.h"
 #include "target.h"
+#include "profile-count.h"
+#include "predict.h"

 /* Expand a block clear operation, and return 1 if successful.  Return 0
    if we should let the compiler generate normal code.
@@ -369,6 +371,7 @@
    B is the second thing to be compared.
    CR is the condition code reg input, or NULL_RTX.
    TRUE_LABEL is the label to branch to if the condition is true.
+   P is the estimated branch probability for the branch.

    The return value is the CR used for the comparison.
    If CR is null_rtx, then a new register of CMPMODE is generated.
@@ -377,7 +380,7 @@

 static void
 do_ifelse (machine_mode cmpmode, rtx_code comparison,
-           rtx a, rtx b, rtx cr, rtx true_label)
+           rtx a, rtx b, rtx cr, rtx true_label, profile_probability p)
 {
   gcc_assert ((a == NULL_RTX && b == NULL_RTX && cr != NULL_RTX)
               || (a != NULL_RTX && b != NULL_RTX));
@@ -395,7 +398,8 @@
   rtx cmp_rtx = gen_rtx_fmt_ee (comparison, VOIDmode, cr, const0_rtx);

   rtx ifelse = gen_rtx_IF_THEN_ELSE (VOIDmode, cmp_rtx, label_ref, pc_rtx);
-  rtx j = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse));
+  rtx_insn *j = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse));
+  add_reg_br_prob_note (j, p);
   JUMP_LABEL (j) = true_label;
   LABEL_NUSES (true_label) += 1;
 }
@@ -781,7 +785,8 @@
           rtx lab_ref = gen_rtx_LABEL_REF (VOIDmode, dst_label);
           rtx ifelse = gen_rtx_IF_THEN_ELSE (VOIDmode, cmp_rtx,
                                              lab_ref, pc_rtx);
-          rtx j2 = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse));
+          rtx_insn *j2 = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse));
+          add_reg_br_prob_note (j2, profile_probability::likely ());
           JUMP_LABEL (j2) = dst_label;
           LABEL_NUSES (dst_label) += 1;

@@ -1036,7 +1041,7 @@

   /* Difference found is stored here before jump to diff_label.  */
   rtx diff = gen_reg_rtx (word_mode);
-  rtx j;
+  rtx_insn *j;

   /* Example of generated code for 35 bytes aligned 1 byte.

@@ -1120,11 +1125,11 @@
       /* Check for > max_bytes bytes.  We want to bail out as quickly as
          possible if we have to go over to memcmp.  */
       do_ifelse (CCmode, GT, bytes_rtx, GEN_INT (max_bytes),
-                 NULL_RTX, library_call_label);
+                 NULL_RTX, library_call_label, profile_probability::even ());

       /* Check for < loop_bytes bytes.  */
       do_ifelse (CCmode, LT, bytes_rtx, GEN_INT (loop_bytes),
-                 NULL_RTX, cleanup_label);
+                 NULL_RTX, cleanup_label, profile_probability::even ());

       /* Loop compare bytes and iterations if bytes>max_bytes.  */
       rtx mb_reg = gen_reg_rtx (word_mode);
@@ -1165,7 +1170,7 @@
         {
           rtx lab_after = gen_label_rtx ();
           do_ifelse (CCmode, LE, bytes_rtx, GEN_INT (max_bytes),
-                     NULL_RTX, lab_after);
+                     NULL_RTX, lab_after, profile_probability::even ());
           emit_move_insn (loop_cmp, mb_reg);
           emit_move_insn (iter, mi_reg);
           emit_label (lab_after);
@@ -1236,7 +1241,7 @@
         }

       do_ifelse (GET_MODE (dcond), NE, NULL_RTX, NULL_RTX,
-                 dcond, diff_label);
+                 dcond, diff_label, profile_probability::unlikely ());

       if (TARGET_P9_MISC)
         {
@@ -1260,6 +1265,7 @@
           else
             j = emit_jump_insn (gen_bdnztf_si (loop_top_label, ctr, ctr,
                                                eqrtx, dcond));
+          add_reg_br_prob_note (j, profile_probability::likely ());
           JUMP_LABEL (j) = loop_top_label;
           LABEL_NUSES (loop_top_label) += 1;
         }
@@ -1272,9 +1278,11 @@
      code.  If we exit here with a nonzero diff, it is because
      the second word differed.  */
   if (TARGET_P9_MISC)
-    do_ifelse (CCUNSmode, NE, NULL_RTX, NULL_RTX, dcond, diff_label);
+    do_ifelse (CCUNSmode, NE, NULL_RTX, NULL_RTX, dcond,
+               diff_label, profile_probability::unlikely ());
   else
-    do_ifelse (CCmode, NE, diff, const0_rtx, NULL_RTX, diff_label);
+    do_ifelse (CCmode, NE, diff, const0_rtx, NULL_RTX,
+               diff_label, profile_probability::unlikely ());

   if (library_call_label != NULL && bytes_is_const && bytes > max_bytes)
     {
@@ -1317,7 +1325,7 @@
          loop with a branch to cleanup_label.  */
       emit_move_insn (target, const0_rtx);
       do_ifelse (CCmode, EQ, cmp_rem, const0_rtx,
-                 NULL_RTX, final_label);
+                 NULL_RTX, final_label, profile_probability::unlikely ());
     }

   rtx final_cleanup = gen_label_rtx ();
@@ -1327,9 +1335,12 @@
     {
       /* If remainder length < word length, branch to final
          cleanup compare.  */
+
       if (!bytes_is_const)
-        do_ifelse (CCmode, LT, cmp_rem, GEN_INT (load_mode_size),
-                   NULL_RTX, final_cleanup);
+        {
+          do_ifelse (CCmode, LT, cmp_rem, GEN_INT (load_mode_size),
+                     NULL_RTX, final_cleanup, profile_probability::even ());
+        }

       /* load and compare 8B */
       do_load_for_compare_from_addr (load_mode, d1_1,
@@ -1354,7 +1365,7 @@
         }

       do_ifelse (GET_MODE (dcond), NE, NULL_RTX, NULL_RTX,
-                 dcond, diff_label);
+                 dcond, diff_label, profile_probability::even ());
       do_add3 (src1_addr, src1_addr, GEN_INT (load_mode_size));
       do_add3 (src2_addr, src2_addr, GEN_INT (load_mode_size));

@@ -1363,10 +1374,12 @@
       if (bytes_is_const)
         bytes_remaining -= load_mode_size;
       else
-        /* See if remaining length is now zero.  We previously set
-           target to 0 so we can just jump to the end.  */
-        do_ifelse (CCmode, EQ, cmp_rem, const0_rtx,
-                   NULL_RTX, final_label);
+        {
+          /* See if remaining length is now zero.  We previously set
+             target to 0 so we can just jump to the end.  */
+          do_ifelse (CCmode, EQ, cmp_rem, const0_rtx, NULL_RTX,
+                     final_label, profile_probability::unlikely ());
+        }

     }

@@ -1450,7 +1463,7 @@
          than one loop iteration, in which case go do the overlap
          load compare path.  */
       do_ifelse (CCmode, GT, bytes_rtx, GEN_INT (loop_bytes),
-                 NULL_RTX, nonconst_overlap);
+                 NULL_RTX, nonconst_overlap, profile_probability::even ());

       rtx rem4k = gen_reg_rtx (word_mode);
       rtx dist1 = gen_reg_rtx (word_mode);
@@ -1460,12 +1473,14 @@
         emit_insn (gen_andsi3 (dist1, src1_addr, GEN_INT (0xfff)));
       else
         emit_insn (gen_anddi3 (dist1, src1_addr, GEN_INT (0xfff)));
-      do_ifelse (CCmode, LE, dist1, rem4k, NULL_RTX, handle4k_label);
+      do_ifelse (CCmode, LE, dist1, rem4k, NULL_RTX,
+                 handle4k_label, profile_probability::very_unlikely ());
       if (word_mode == SImode)
         emit_insn (gen_andsi3 (dist2, src2_addr, GEN_INT (0xfff)));
       else
         emit_insn (gen_anddi3 (dist2, src2_addr, GEN_INT (0xfff)));
-      do_ifelse (CCmode, LE, dist2, rem4k, NULL_RTX, handle4k_label);
+      do_ifelse (CCmode, LE, dist2, rem4k, NULL_RTX,
+                 handle4k_label, profile_probability::very_unlikely ());

       /* We don't have a 4k boundary to deal with, so do
          a load/shift/compare and jump to diff.  */
@@ -1817,7 +1832,8 @@
           rtx ne_rtx = gen_rtx_NE (VOIDmode, cr, const0_rtx);
           rtx ifelse = gen_rtx_IF_THEN_ELSE (VOIDmode, ne_rtx,
                                              fin_ref, pc_rtx);
-          rtx j = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse));
+          rtx_insn *j = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse));
+          add_reg_br_prob_note (j, profile_probability::unlikely ());
           JUMP_LABEL (j) = final_label;
           LABEL_NUSES (final_label) += 1;
         }
@@ -2095,7 +2111,8 @@

   rtx ifelse = gen_rtx_IF_THEN_ELSE (VOIDmode, cmp_rtx,
                                      lab_ref, pc_rtx);
-  rtx j = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse));
+  rtx_insn *j = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse));
+  add_reg_br_prob_note (j, profile_probability::unlikely ());
   JUMP_LABEL (j) = strncmp_label;
   LABEL_NUSES (strncmp_label) += 1;
 }
@@ -2265,7 +2282,8 @@

           rtx ifelse = gen_rtx_IF_THEN_ELSE (VOIDmode, cmp_rtx,
                                              lab_ref, pc_rtx);
-          rtx j = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse));
+          rtx_insn *j = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse));
+          add_reg_br_prob_note (j, profile_probability::unlikely ());
           JUMP_LABEL (j) = final_move_label;
           LABEL_NUSES (final_move_label) += 1;

@@ -2282,7 +2300,8 @@

               rtx ifelse0 = gen_rtx_IF_THEN_ELSE (VOIDmode, cmp0eq_rtx,
                                                   lab_ref, pc_rtx);
-              rtx j0 = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse0));
+              rtx_insn *j0 = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse0));
+              add_reg_br_prob_note (j0, profile_probability::unlikely ());
               JUMP_LABEL (j0) = final_move_label;
               LABEL_NUSES (final_move_label) += 1;
             }
@@ -2325,7 +2344,8 @@

           rtx ifelse = gen_rtx_IF_THEN_ELSE (VOIDmode, cmp_rtx,
                                              lab_ref, pc_rtx);
-          rtx j = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse));
+          rtx_insn *j = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse));
+          add_reg_br_prob_note (j, profile_probability::unlikely ());
           JUMP_LABEL (j) = dst_label;
           LABEL_NUSES (dst_label) += 1;
         }