From patchwork Wed Sep 20 18:58:17 2017
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 816402
Date: Wed, 20 Sep 2017 20:58:17 +0200
From: Jakub Jelinek
To: Uros Bizjak
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] Improve x86_64 *movqi_internal at -Os (PR target/82260)
Message-ID: <20170920185817.GX1701@tucnak>

Hi!

As mentioned in the PR, in some cases (one or both *movqi_internal operands
are %dil/%sil/%bpl/%spl and neither is %r*b) the movl variant is smaller, and
in cases where movb and movl are the same size, we might as well choose the
faster instruction.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-09-20  Jakub Jelinek

        PR target/82260
        * config/i386/i386.md (*movqi_internal): Replace =q <- q alternative
        with =Q <- Q, =R <- R and =r <- r alternatives, only enable the
        latter two for 64-bit, renumber alternatives, for -Os imov =q <- n
        alternative always use QI mode, for -Os imov =R <- R alternative
        always use SI mode, for imov =Q <- Q or =r <- r alternatives
        ignore -Os.

        * gcc.target/i386/pr82260-1.c: New test.
        * gcc.target/i386/pr82260-2.c: New test.

        Jakub

--- gcc/config/i386/i386.md.jj	2017-09-20 10:46:08.000000000 +0200
+++ gcc/config/i386/i386.md	2017-09-20 15:15:19.365142707 +0200
@@ -2571,9 +2571,9 @@ (define_insn "*movhi_internal"
 
 (define_insn "*movqi_internal"
   [(set (match_operand:QI 0 "nonimmediate_operand"
-                        "=q,q ,q ,r,r ,?r,m ,k,k,r,m,k")
+                        "=Q,R,r,q,q,r,r ,?r,m ,k,k,r,m,k")
         (match_operand:QI 1 "general_operand"
-                        "q ,qn,qm,q,rn,qm,qn,r,k,k,k,m"))]
+                        "Q ,R,r,n,m,q,rn, m,qn,r,k,k,k,m"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   static char buf[128];
@@ -2589,17 +2589,17 @@ (define_insn "*movqi_internal"
     case TYPE_MSKMOV:
       switch (which_alternative)
         {
-        case 7:
+        case 9:
           ops = "kmov%s\t{%%k1, %%0|%%0, %%k1}";
           break;
-        case 9:
+        case 11:
           ops = "kmov%s\t{%%1, %%k0|%%k0, %%1}";
           break;
-        case 10:
-        case 11:
+        case 12:
+        case 13:
           gcc_assert (TARGET_AVX512DQ);
           /* FALLTHRU */
-        case 8:
+        case 10:
           ops = "kmov%s\t{%%1, %%0|%%0, %%1}";
           break;
         default:
@@ -2619,51 +2619,67 @@ (define_insn "*movqi_internal"
     }
 }
   [(set (attr "isa")
-     (if_then_else (eq_attr "alternative" "10,11")
-       (const_string "avx512dq")
-       (const_string "*")))
+     (cond [(eq_attr "alternative" "1,2")
+              (const_string "x64")
+            (eq_attr "alternative" "12,13")
+              (const_string "avx512dq")
+           ]
+           (const_string "*")))
    (set (attr "type")
-     (cond [(eq_attr "alternative" "7,8,9,10,11")
+     (cond [(eq_attr "alternative" "9,10,11,12,13")
               (const_string "mskmov")
-            (and (eq_attr "alternative" "5")
+            (and (eq_attr "alternative" "7")
                  (not (match_operand:QI 1 "aligned_operand")))
               (const_string "imovx")
             (match_test "optimize_function_for_size_p (cfun)")
               (const_string "imov")
-            (and (eq_attr "alternative" "3")
+            (and (eq_attr "alternative" "5")
                  (ior (not (match_test "TARGET_PARTIAL_REG_STALL"))
                       (not (match_test "TARGET_QIMODE_MATH"))))
               (const_string "imov")
-            (eq_attr "alternative" "3,5")
+            (eq_attr "alternative" "5,7")
               (const_string "imovx")
             (and (match_test "TARGET_MOVX")
-                 (eq_attr "alternative" "2"))
+                 (eq_attr "alternative" "4"))
               (const_string "imovx")
            ]
            (const_string "imov")))
    (set (attr "prefix")
-     (if_then_else (eq_attr "alternative" "7,8,9")
+     (if_then_else (eq_attr "alternative" "9,10,11")
        (const_string "vex")
        (const_string "orig")))
    (set (attr "mode")
-      (cond [(eq_attr "alternative" "3,4,5")
+      (cond [(eq_attr "alternative" "5,6,7")
                (const_string "SI")
-             (eq_attr "alternative" "6")
+             (eq_attr "alternative" "8")
                (const_string "QI")
-             (and (eq_attr "alternative" "7,8,9")
+             (and (eq_attr "alternative" "9,10,11")
                   (not (match_test "TARGET_AVX512DQ")))
                (const_string "HI")
              (eq_attr "type" "imovx")
                (const_string "SI")
+             ;; For -Os, 8-bit immediates are always shorter than 32-bit
+             ;; ones.
+             (and (eq_attr "type" "imov")
+                  (and (eq_attr "alternative" "3")
+                       (match_test "optimize_function_for_size_p (cfun)")))
+               (const_string "QI")
+             ;; For -Os, movl where one or both operands are NON_Q_REGS
+             ;; and both are LEGACY_REGS is shorter than movb.
+             ;; Otherwise movb and movl sizes are the same, so decide purely
+             ;; based on speed factors.
+             (and (eq_attr "type" "imov")
+                  (and (eq_attr "alternative" "1")
+                       (match_test "optimize_function_for_size_p (cfun)")))
+               (const_string "SI")
              (and (eq_attr "type" "imov")
-                  (and (eq_attr "alternative" "0,1")
+                  (and (eq_attr "alternative" "0,1,2,3")
                        (and (match_test "TARGET_PARTIAL_REG_DEPENDENCY")
-                            (and (not (match_test "optimize_function_for_size_p (cfun)"))
-                                 (not (match_test "TARGET_PARTIAL_REG_STALL"))))))
+                            (not (match_test "TARGET_PARTIAL_REG_STALL")))))
                (const_string "SI")
              ;; Avoid partial register stalls when not using QImode arithmetic
              (and (eq_attr "type" "imov")
-                  (and (eq_attr "alternative" "0,1")
+                  (and (eq_attr "alternative" "0,1,2,3")
                        (and (match_test "TARGET_PARTIAL_REG_STALL")
                             (not (match_test "TARGET_QIMODE_MATH")))))
                (const_string "SI")
--- gcc/testsuite/gcc.target/i386/pr82260-1.c.jj	2017-09-20 15:27:43.696823321 +0200
+++ gcc/testsuite/gcc.target/i386/pr82260-1.c	2017-09-20 15:34:45.913536355 +0200
@@ -0,0 +1,26 @@
+/* PR target/82260 */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-Os -mtune=generic -masm=att" } */
+/* movl %esi, %ecx is shorter than movb %sil, %cl.  While
+   movl %edx, %ecx is the same size as movb %dl, %cl and
+   movl %r8d, %ecx is the same size as movb %r8b, %cl, movl
+   is faster on contemporary CPUs.  */
+/* { dg-final { scan-assembler-not {\mmovb\M} } } */
+
+int
+foo (int x, int c)
+{
+  return x >> c;
+}
+
+int
+bar (int x, int y, int z)
+{
+  return x >> z;
+}
+
+int
+baz (int x, int y, int z, int u, int v)
+{
+  return x >> v;
+}
--- gcc/testsuite/gcc.target/i386/pr82260-2.c.jj	2017-09-20 15:30:51.690469279 +0200
+++ gcc/testsuite/gcc.target/i386/pr82260-2.c	2017-09-20 15:35:31.358967291 +0200
@@ -0,0 +1,25 @@
+/* PR target/82260 */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-Os -mtune=generic -masm=att -mtune-ctrl=^partial_reg_dependency" } */
+/* { dg-final { scan-assembler-not {\mmovb\t%sil, %cl} } } */
+/* { dg-final { scan-assembler {\mmovl\t%esi, %ecx} } } */
+/* { dg-final { scan-assembler {\mmovb\t%dl, %cl} } } */
+/* { dg-final { scan-assembler {\mmovb\t%r8b, %cl} } } */
+
+int
+foo (int x, int c)
+{
+  return x >> c;
+}
+
+int
+bar (int x, int y, int z)
+{
+  return x >> z;
+}
+
+int
+baz (int x, int y, int z, int u, int v)
+{
+  return x >> v;
+}
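
For reference, the size argument behind the patch can be checked with a small C sketch. This is only an illustration and not GCC code: mov_reg_reg_size is a hypothetical helper, and its register numbers follow the hardware encoding (0 = ax, 1 = cx, 2 = dx, 3 = bx, 4 = sp, 5 = bp, 6 = si, 7 = di, 8-15 = r8-r15). It assumes a plain register-to-register mov in 64-bit mode, i.e. opcode + ModRM plus a REX prefix when one is required.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper for illustration only (not part of GCC).
   Returns the encoded size in bytes of a register-to-register mov in
   64-bit mode: optional REX prefix + 1 opcode byte + 1 ModRM byte.
   DST and SRC are hardware register numbers 0-15; BYTE_OP selects
   movb (true) or movl (false).  */
int
mov_reg_reg_size (int dst, int src, bool byte_op)
{
  assert (dst >= 0 && dst <= 15 && src >= 0 && src <= 15);

  /* r8-r15 always need a REX prefix, whatever the operand size.  */
  bool rex = dst >= 8 || src >= 8;

  /* Without REX, byte register numbers 4-7 select %ah/%ch/%dh/%bh,
     so %spl/%bpl/%sil/%dil are only reachable with a REX prefix.  */
  if (byte_op && ((dst >= 4 && dst <= 7) || (src >= 4 && src <= 7)))
    rex = true;

  return (rex ? 1 : 0) + 2;
}
```

With this model, mov_reg_reg_size (1, 6, true) for movb %sil, %cl gives 3 bytes while mov_reg_reg_size (1, 6, false) for movl %esi, %ecx gives 2, and the %dl and %r8b cases come out the same size for both variants, which is what the comment in pr82260-1.c states.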