From patchwork Sun Jun 4 21:41:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Law X-Patchwork-Id: 1790136 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=NXSfs/Rr; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4QZ9DJ2Ssxz20Q8 for ; Mon, 5 Jun 2023 07:41:34 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id EE59E3857BB3 for ; Sun, 4 Jun 2023 21:41:31 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org EE59E3857BB3 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1685914892; bh=nMzJDRtv1+gpDzu1nYDh1CP2w9tsnumgsHGWzEHnWEs=; h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=NXSfs/RrMHMKl0gQWQnGymWr1t/O7YT1Y7AkHtNoWf+wILfhg+zucvcUJv20P6sVw cf+6ga2Bux6dFNbhHHaxMsW4hnwBGV9ZITIK90XnJ5xdeEgnUxH8BoDlXHKdxJN2Rn VzTwPu2iQh5GIrki5+fXBqF72jj/GGV/W6C3ftWc= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-oi1-x22b.google.com (mail-oi1-x22b.google.com [IPv6:2607:f8b0:4864:20::22b]) by sourceware.org (Postfix) with ESMTPS id 05AD338582BC for ; Sun, 4 Jun 2023 21:41:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 05AD338582BC Received: by mail-oi1-x22b.google.com with SMTP id 5614622812f47-39aa8a055e0so919733b6e.0 for ; Sun, 04 Jun 2023 14:41:09 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1685914869; x=1688506869; h=subject:to:from:content-language:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=nMzJDRtv1+gpDzu1nYDh1CP2w9tsnumgsHGWzEHnWEs=; b=GsyZadc+Z43VGpHL8lJqjBFEc67RQE9nxq5ITb5auQnHpHHYhjkv3nRpYbszLhC6a/ LuydP0IBP5mbuW9CIf8qxkIfeqYFRSqqQHXsigjPMIx4EVrVVGB+WGKfJzTylakIPpja dzsToEGxTPLDkSjeOqxYGHNddvKelOthV8pT4C0GTe3u2H7xzOmKUQbRuyC/RwPCAVnn jud7MP1YT+rOkCH/ETMeS3sWaFeIEH3t03g7AuDcSBHp1/WKJCI9Y2dxuD8YFRkCvBRb KhZ3fVfm9IYS7il67xH3dHGM03/7MwZdc4iodjrGA4Zm73cOMipI4I9B986gQTlfLjp8 YEZw== X-Gm-Message-State: AC+VfDxs6QyqZGoRLmbTOPszDSOxGOnLxjQs7qdPyTIN7fpGzrLeDCEr kLNWBU2D4xkiXb/k9uCM30VpxQER+lE= X-Google-Smtp-Source: ACHHUZ7zKKTrR8TBGARt7mNobw5bUOCABu1LuDG2JVuE+s8+IX02trMDMdOXyPslX292xIO/l3HmKQ== X-Received: by 2002:a05:6358:c1d:b0:127:96d6:be18 with SMTP id f29-20020a0563580c1d00b0012796d6be18mr24569820rwj.21.1685914868343; Sun, 04 Jun 2023 14:41:08 -0700 (PDT) Received: from ?IPV6:2601:681:8d00:265::f0a? ([2601:681:8d00:265::f0a]) by smtp.gmail.com with ESMTPSA id m9-20020a17090a5a4900b0024e37e0a67dsm4515858pji.20.2023.06.04.14.41.07 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 04 Jun 2023 14:41:07 -0700 (PDT) Message-ID: Date: Sun, 4 Jun 2023 15:41:06 -0600 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.9.1 Content-Language: en-US To: "gcc-patches@gcc.gnu.org" Subject: [RFA] Improve strcmp expansion when one input is a constant string. X-Spam-Status: No, score=-8.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Jeff Law via Gcc-patches From: Jeff Law Reply-To: Jeff Law Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" While investigating a RISC-V backend patch from Jivan I noticed a regression in terms of dynamic instruction counts for the omnetpp benchmark in spec2017. https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620577.html The code we we with Jivan's patch at expansion time looks like this for each character in the input string: (insn 6 5 7 (set (reg:SI 137) (zero_extend:SI (mem:QI (reg/v/f:DI 135 [ x ]) [0 MEM [(void *)x_2(D)]+0 S1 A8]))) "j.c":5:11 -1 (nil)) (insn 7 6 8 (set (reg:DI 138) (sign_extend:DI (plus:SI (reg:SI 137) (const_int -108 [0xffffffffffffff94])))) "j.c":5:11 -1 (nil)) (insn 8 7 9 (set (reg:SI 136) (subreg/s/u:SI (reg:DI 138) 0)) "j.c":5:11 -1 (expr_list:REG_EQUAL (plus:SI (reg:SI 137) (const_int -108 [0xffffffffffffff94])) (nil))) (insn 9 8 10 (set (reg:DI 139) (sign_extend:DI (reg:SI 136))) "j.c":5:11 -1 (nil)) (jump_insn 10 9 11 (set (pc) (if_then_else (ne (reg:DI 139) (const_int 0 [0])) (label_ref 64) (pc))) "j.c":5:11 -1 (nil)) Ignore insn 9. fwprop will turn it into a trivial copy from r138->r139 which will ultimately propagate away. All the paths eventually transfer to control to the label in question, either by jumping or falling thru on the last character. After a bit of cleanup by fwprop & friends we have: > (insn 6 3 7 2 (set (reg:SI 137 [ MEM [(void *)x_2(D)] ]) > (zero_extend:SI (mem:QI (reg/v/f:DI 135 [ x ]) [0 MEM [(void *)x_2(D)]+0 S1 A8]))) "j.c":5:11 114 {zero_extendqisi2} > (nil)) > (insn 7 6 8 2 (set (reg:DI 138) > (sign_extend:DI (plus:SI (reg:SI 137 [ MEM [(void *)x_2(D)] ]) > (const_int -108 [0xffffffffffffff94])))) "j.c":5:11 6 {addsi3_extended} > (expr_list:REG_DEAD (reg:SI 137 [ MEM [(void *)x_2(D)] ]) > (nil))) > (insn 8 7 10 2 (set (reg:SI 136 [ MEM [(void *)x_2(D)]+11 ]) > (subreg/s/u:SI (reg:DI 138) 0)) "j.c":5:11 180 {*movsi_internal} > (nil)) > (jump_insn 10 8 73 2 (set (pc) > (if_then_else (ne (reg:DI 138) > (const_int 0 [0])) > (label_ref 64) > (pc))) "j.c":5:11 243 {*branchdi} > (expr_list:REG_DEAD (reg:DI 138) > (int_list:REG_BR_PROB 536870916 (nil))) > -> 64) insn 8 is the result of wanting the ultimate result of the strcmp to be an "int" type (SImode). Note that (reg 136) is the result of the strcmp. It gets set in each fragment of code that compares one element in the string. It's also live after the strcmp sequence. As a result combine isn't going to be able to clean this up. Note how (reg 136) births while (reg 138) is live and even though (reg 136) is a copy of (reg 138), IRA doesn't have the necessary code to determine that the regs do not conflict. As a result (reg 136) and (reg 138) must be allocated different hard registers and we get code like this: > lbu a5,0(a0) # 6 [c=28 l=4] zero_extendqisi2/1 > addiw a5,a5,-108 # 7 [c=8 l=4] addsi3_extended/1 > mv a4,a5 # 8 [c=4 l=4] *movsi_internal/0 > bne a5,zero,.L2 # 10 [c=4 l=4] *branchdi Note the annoying "mv". Rather than do a conversion for each character, we could do each step in word_mode and do the conversion once at the end of the whole sequence. So for each character we expand to: > (insn 6 5 7 (set (reg:DI 138) > (zero_extend:DI (mem:QI (reg/v/f:DI 135 [ x ]) [0 MEM [(void *)x_2(D)]+0 S1 A8]))) "j.c":5:11 -1 > (nil)) > > (insn 7 6 8 (set (reg:DI 137) > (plus:DI (reg:DI 138) > (const_int -108 [0xffffffffffffff94]))) "j.c":5:11 -1 > (nil)) > > (jump_insn 8 7 9 (set (pc) > (if_then_else (ne (reg:DI 137) > (const_int 0 [0])) > (label_ref 41) > (pc))) "j.c":5:11 -1 > (nil)) Good. Then at the end of the sequence we have: > (code_label 41 40 42 2 (nil) [0 uses]) > > (insn 42 41 43 (set (reg:SI 136) > (subreg:SI (reg:DI 137) 0)) "j.c":5:11 -1 > (nil)) Which seems like exactly what we want. At the assembly level we get: lbu a5,0(a0) # 6 [c=28 l=4] zero_extendqidi2/1 addi a0,a5,-108 # 7 [c=4 l=4] adddi3/1 bne a0,zero,.L2 # 8 [c=4 l=4] *branchdi [ ... ] At the end of the sequence we realize the narrowing subreg followed by an extnesion isn't necessary and just remove them. The ultimate result is omnetpp goes from a small regression to a small overall improvement with Jivan's patch. Bootstrapped and regression tested on x86. Also built and run spec2017 on riscv64. OK for the trunk? Jeff diff --git a/gcc/builtins.cc b/gcc/builtins.cc index 8400adaf5b4..f2e0d3b7d7f 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -7135,6 +7135,9 @@ inline_string_cmp (rtx target, tree var_str, const char *const_str, scalar_int_mode unit_mode = as_a TYPE_MODE (unit_type_node); + /* We do the intermediate steps in WORD_MODE, then convert + to mode at the end of the sequence. */ + rtx intermediate_result = gen_reg_rtx (word_mode); start_sequence (); for (unsigned HOST_WIDE_INT i = 0; i < length; i++) @@ -7145,25 +7148,27 @@ inline_string_cmp (rtx target, tree var_str, const char *const_str, rtx op0 = (const_str_n == 1) ? const_rtx : var_rtx; rtx op1 = (const_str_n == 1) ? var_rtx : const_rtx; - op0 = convert_modes (mode, unit_mode, op0, 1); - op1 = convert_modes (mode, unit_mode, op1, 1); - rtx diff = expand_simple_binop (mode, MINUS, op0, op1, - result, 1, OPTAB_WIDEN); + op0 = convert_modes (word_mode, unit_mode, op0, 1); + op1 = convert_modes (word_mode, unit_mode, op1, 1); + rtx diff = expand_simple_binop (word_mode, MINUS, op0, op1, + intermediate_result, 1, OPTAB_WIDEN); - /* Force the difference into result register. We cannot reassign - result here ("result = diff") or we may end up returning - uninitialized result when expand_simple_binop allocates a new - pseudo-register for returning. */ - if (diff != result) - emit_move_insn (result, diff); + /* Force the difference into intermediate result register. We cannot + reassign the intermediate result here ("intermediate_result = diff") + or we may end up returning uninitialized result when + expand_simple_binop allocates a new pseudo-register for returning. */ + if (diff != intermediate_result) + emit_move_insn (intermediate_result, diff); if (i < length - 1) - emit_cmp_and_jump_insns (result, CONST0_RTX (mode), NE, NULL_RTX, - mode, true, ne_label); + emit_cmp_and_jump_insns (intermediate_result, CONST0_RTX (word_mode), + NE, NULL_RTX, word_mode, true, ne_label); offset += GET_MODE_SIZE (unit_mode); } emit_label (ne_label); + /* Now convert the intermediate result to the final result. */ + emit_move_insn (result, gen_lowpart (mode, intermediate_result)); rtx_insn *insns = get_insns (); end_sequence (); emit_insn (insns);