From patchwork Mon Aug 17 10:42:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roger Sayle X-Patchwork-Id: 1346116 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=nextmovesoftware.com header.i=@nextmovesoftware.com header.a=rsa-sha256 header.s=default header.b=raI1s4lL; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4BVVxQ6PSQz9sRK for ; Mon, 17 Aug 2020 20:42:49 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 3309D3857C47; Mon, 17 Aug 2020 10:42:46 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id 354A63858D37 for ; Mon, 17 Aug 2020 10:42:43 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 354A63858D37 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=roger@nextmovesoftware.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=5riTSx6isMCcI7jGIeZaQOAVZOmhpH3fWDVUsfKSEsE=; b=raI1s4lLUK5sjHQ35tHUNL4Om+ /Qrpz3WIq00Vl3sH+J/4iOuJxf1iBmHUAeoeVWFejGHs9qS0HkR6NCa9JrnvEPrEM0/txViBguxdu f/BXlU60dVe/cqV8iI4WGME89uRXEVHhxOb0ktIW9v4cfxScUj5RPYcZAXMQbRjQ7x7nX7uuse8p/ 1UdV5E1S6S6GQLxkyg3SHdEyZQSUogAyEo84vBstuzCvwJhBzZsqiUJD6NE+RE47ZJEuTN2BTQRUC dKclyouwLjrQxU0BWyA/Y6qT53j8LwXdOos62lIxbVS3phS1ZvkePnIg0sDQ9h3W8iO9N9NUhe5pn Vhzsp0GA==; Received: from host86-137-89-56.range86-137.btcentralplus.com ([86.137.89.56]:57178 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1k7cbC-0006wo-LJ; Mon, 17 Aug 2020 06:42:42 -0400 From: "Roger Sayle" To: "'GCC Patches'" Subject: [PATCH] x86_64: PR rtl-optimization/92180: class_likely_spilled vs. cant_combine_insn. Date: Mon, 17 Aug 2020 11:42:40 +0100 Message-ID: <006d01d67483$22813840$6783a8c0$@nextmovesoftware.com> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 16.0 Thread-Index: AdZ0gc1xWGBCPp1/QxepAymZ3Hiiqg== Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: 'Uros Bizjak' Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" This patch catches a missed optimization opportunity where GCC currently generates worse code than LLVM. The issue, as nicely analyzed in bugzilla, boils down to the following three insns in combine: (insn 6 5 7 2 (parallel [ (set (reg:DI 85) (ashift:DI (reg:DI 85) (const_int 32 [0x20]))) (clobber (reg:CC 17 flags)) ]) "pr92180.c":4:10 564 {*ashldi3_1} (expr_list:REG_UNUSED (reg:CC 17 flags) (nil))) (insn 7 6 14 2 (parallel [ (set (reg:DI 84) (ior:DI (reg:DI 84) (reg:DI 85))) (clobber (reg:CC 17 flags)) ]) "pr92180.c":4:10 454 {*iordi_1} (expr_list:REG_DEAD (reg:DI 85) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil)))) (insn 14 7 15 2 (set (reg/i:SI 0 ax) (subreg:SI (reg:DI 84) 0)) "pr92180.c":5:1 67 {*movsi_internal} (expr_list:REG_DEAD (reg:DI 84) (nil))) Normally, combine/simplify-rtx would notice that insns 6 and 7 (which update highpart bits) are unnecessary as the final insn 14 only requires to lowpart bits. The complication is that insn 14 sets a hard register in targetm.class_likely_spilled_p which prevents combine from performing its simplifications, and removing the redundant instructions. At first glance a fix would appear to require changes to combine, potentially affecting code generation on all small register class targets... An alternate (and I think clever) solution is to spot that this problematic situation can be avoided by the backend. At RTL expansion time, the middle-end has a clear separation between pseudos and hard registers, so the RTL initially contains: (insn 9 8 10 2 (set (reg:SI 86) (subreg:SI (reg:DI 82 [ _1 ]) 0)) "pr92180.c":6:10 -1 (nil)) (insn 10 9 14 2 (set (reg:SI 83 [ ]) (reg:SI 86)) "pr92180.c":6:10 -1 (nil)) (insn 14 10 15 2 (set (reg/i:SI 0 ax) (reg:SI 83 [ ])) "pr92180.c":7:1 -1 (nil)) which can be optimized without problems by combine; it is only the intervening passes (initially fwprop1) that propagate computations into sets of hard registers, and disable those opportunities. The solution proposed here is to have the x86 backend/recog prevent early RTL passes composing instructions (that set likely_spilled hard registers) that they (combine) can't simplify, until after reload. We allow sets from pseudo registers, immediate constants and memory accesses, but anything more complicated is performed via a temporary pseudo. Not only does this simplify things for the register allocator, but any remaining register-to-register moves are easily cleaned up by the late optimization passes after reload, such as peephole2 and cprop_hardreg. This patch has been tested on x86_64-pc-linux-gnu with a "make bootstrap" and a "make -k check" with no new failures. Ok for mainline? 2020-08-17 Roger Sayle gcc/ChangeLog PR rtl-optimization/92180 * config/i386/i386.c (ix86_hardreg_mov_ok): New function to determine whether (set DST SRC) should be allowed at this point. * config/i386/i386-protos.h (ix86_hardreg_mov_ok): Prototype here. * config/i386/i386-expand.c (ix86_expand_move): Check whether this is a complex set of a likely spilled hard register, and if so place the value in a pseudo, and load the hard reg from it. * config/i386/i386.md (*movdi_internal, *movsi_internal, *movhi_internal, *movqi_internal): Make these instructions conditional on ix86_hardreg_mov_ok. (*lea): Make this define_insn_and_split conditional on ix86_hardreg_mov_ok. gcc/testsuite/ChangeLog PR rtl-optimization/92180 * gcc.target/i386/pr92180.c: New test. Thanks in advance, Roger --- Roger Sayle NextMove Software Cambridge, UK diff --git a/gcc/testsuite/gcc.target/i386/pr92180.c b/gcc/testsuite/gcc.target/i386/pr92180.c new file mode 100644 index 0000000..0d5c4a2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr92180.c @@ -0,0 +1,9 @@ +/* PR rtl-optimization/92180 */ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +unsigned int foo() { + return __builtin_ia32_rdtsc(); +} + +/* { dg-final { scan-assembler-not "sal" } } */ diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index f441ba9..e6e4433 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -190,6 +190,17 @@ ix86_expand_move (machine_mode mode, rtx operands[]) op0 = operands[0]; op1 = operands[1]; + /* Avoid complex sets of likely spilled hard registers before reload. */ + if (!ix86_hardreg_mov_ok (op0, op1)) + { + tmp = gen_reg_rtx (mode); + operands[0] = tmp; + ix86_expand_move (mode, operands); + operands[0] = op0; + operands[1] = tmp; + op1 = tmp; + } + switch (GET_CODE (op1)) { case CONST: diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index b6088f2..a10bc56 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -161,6 +161,7 @@ extern rtx ix86_find_base_term (rtx); extern bool ix86_check_movabs (rtx, int); extern bool ix86_check_no_addr_space (rtx); extern void ix86_split_idivmod (machine_mode, rtx[], bool); +extern bool ix86_hardreg_mov_ok (rtx, rtx); extern rtx assign_386_stack_local (machine_mode, enum ix86_stack_slot); extern int ix86_attr_length_immediate_default (rtx_insn *, bool); diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 10eb2dd..06a4d18 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -18511,6 +18511,22 @@ ix86_class_likely_spilled_p (reg_class_t rclass) return false; } +/* Return true if a set of DST by the expression SRC should be allowed. + This prevents complex sets of likely_spilled hard regs before reload. */ + +bool +ix86_hardreg_mov_ok (rtx dst, rtx src) +{ + /* Avoid complex sets of likely_spilled hard registers before reload. */ + if (REG_P (dst) && HARD_REGISTER_P (dst) + && !REG_P (src) && !MEM_P (src) + && !x86_64_immediate_operand (src, GET_MODE (dst)) + && ix86_class_likely_spilled_p (REGNO_REG_CLASS (REGNO (dst))) + && !reload_completed) + return false; + return true; +} + /* If we are copying between registers from different register sets (e.g. FP and integer), we may need a memory location. diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 9d4e669..ab2fdc3 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -2077,7 +2077,8 @@ "=r ,o ,r,r ,r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,m,?r ,?*Yd,?r,?*v,?*y,?*x,*k,*k ,*r,*m,*k") (match_operand:DI 1 "general_operand" "riFo,riF,Z,rem,i,re,C ,*y,m ,*y,*y,r ,C ,*v,m ,*v,v,*Yd,r ,*v,r ,*x ,*y ,*r,*km,*k,*k,CBC"))] - "!(MEM_P (operands[0]) && MEM_P (operands[1]))" + "!(MEM_P (operands[0]) && MEM_P (operands[1])) + && ix86_hardreg_mov_ok (operands[0], operands[1])" { switch (get_attr_type (insn)) { @@ -2297,7 +2298,8 @@ "=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k") (match_operand:SI 1 "general_operand" "g ,re,C ,*y,m ,*y,*y,r ,C ,*v,m ,*v,*v,r ,*r,*km,*k ,CBC"))] - "!(MEM_P (operands[0]) && MEM_P (operands[1]))" + "!(MEM_P (operands[0]) && MEM_P (operands[1])) + && ix86_hardreg_mov_ok (operands[0], operands[1])" { switch (get_attr_type (insn)) { @@ -2405,7 +2407,8 @@ (define_insn "*movhi_internal" [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,k,k ,r,m,k") (match_operand:HI 1 "general_operand" "r ,rn,rm,rn,r,km,k,k,CBC"))] - "!(MEM_P (operands[0]) && MEM_P (operands[1]))" + "!(MEM_P (operands[0]) && MEM_P (operands[1])) + && ix86_hardreg_mov_ok (operands[0], operands[1])" { switch (get_attr_type (insn)) { @@ -2494,7 +2497,8 @@ "=Q,R,r,q,q,r,r ,?r,m ,k,k,r,m,k,k,k") (match_operand:QI 1 "general_operand" "Q ,R,r,n,m,q,rn, m,qn,r,k,k,k,m,C,BC"))] - "!(MEM_P (operands[0]) && MEM_P (operands[1]))" + "!(MEM_P (operands[0]) && MEM_P (operands[1])) + && ix86_hardreg_mov_ok (operands[0], operands[1])" { char buf[128]; const char *ops; @@ -5149,7 +5153,7 @@ (define_insn_and_split "*lea" [(set (match_operand:SWI48 0 "register_operand" "=r") (match_operand:SWI48 1 "address_no_seg_operand" "Ts"))] - "" + "ix86_hardreg_mov_ok (operands[0], operands[1])" { if (SImode_address_operand (operands[1], VOIDmode)) {