From patchwork Sat Jun 3 22:45:23 2023
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 1790007
From: "Roger Sayle"
Cc: "'Uros Bizjak'"
Subject: [x86 PATCH] Add support for stc, clc and cmc instructions in i386.md
Date: Sat, 3 Jun 2023 23:45:23 +0100
Message-ID: <00d701d9966d$16552220$42ff6660$@nextmovesoftware.com>
List-Id: Gcc-patches mailing list

This patch is the latest revision of my patch to add support for the
STC (set carry flag), CLC (clear carry flag) and CMC (complement
carry flag) instructions to the i386 backend, incorporating Uros'
previous feedback.  The significant changes are (i) the inclusion
of CMC, (ii) the use of UNSPEC for the patterns, and (iii) the use
of a new X86_TUNE_SLOW_STC tuning flag to select alternate
implementations on pentium4 (which has a notoriously slow STC) when
not optimizing for size.

An example of the use of the stc instruction is:

unsigned int foo (unsigned int a, unsigned int b, unsigned int *c)
{
  return __builtin_ia32_addcarryx_u32 (1, a, b, c);
}

which previously generated:

        movl    $1, %eax
        addb    $-1, %al
        adcl    %esi, %edi
        setc    %al
        movl    %edi, (%rdx)
        movzbl  %al, %eax
        ret

and with this patch now generates:

        stc
        adcl    %esi, %edi
        setc    %al
        movl    %edi, (%rdx)
        movzbl  %al, %eax
        ret

An example of the use of the cmc instruction (where the carry from
a first adc is inverted/complemented as input to a second adc) is:

unsigned int bar (unsigned int a, unsigned int b,
                  unsigned int c, unsigned int d)
{
  unsigned int c1 = __builtin_ia32_addcarryx_u32 (1, a, b, &o1);
  return __builtin_ia32_addcarryx_u32 (c1 ^ 1, c, d, &o2);
}

which previously generated:

        movl    $1, %eax
        addb    $-1, %al
        adcl    %esi, %edi
        setnc   %al
        movl    %edi, o1(%rip)
        addb    $-1, %al
        adcl    %ecx, %edx
        setc    %al
        movl    %edx, o2(%rip)
        movzbl  %al, %eax
        ret

and now generates:

        stc
        adcl    %esi, %edi
        cmc
        movl    %edi, o1(%rip)
        adcl    %ecx, %edx
        setc    %al
        movl    %edx, o2(%rip)
        movzbl  %al, %eax
        ret

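For anyone wanting to sanity-check the arithmetic in the bar example
without an x86 toolchain, here is a minimal portable sketch.  It is
only a model: adc32_model and bar_model are hypothetical names that
are not part of the patch, and the model assumes that any nonzero
carry-in is treated as CF=1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable model of a 32-bit add-with-carry; this is
   NOT the GCC builtin.  Assumption: a nonzero carry_in means CF=1.  */
unsigned char
adc32_model (unsigned char carry_in, uint32_t a, uint32_t b,
             uint32_t *sum_out)
{
  /* Do the addition in 64 bits so the carry lands in bit 32.  */
  uint64_t wide = (uint64_t) a + (uint64_t) b + (carry_in != 0);
  *sum_out = (uint32_t) wide;
  return (unsigned char) (wide >> 32);
}

/* Model of bar: the first add has a constant carry-in of 1 (the stc
   case); its carry-out is complemented (the cmc case) before being
   fed to the second add.  */
unsigned char
bar_model (uint32_t a, uint32_t b, uint32_t c, uint32_t d,
           uint32_t *o1, uint32_t *o2)
{
  unsigned char c1 = adc32_model (1, a, b, o1);
  return adc32_model (c1 ^ 1, c, d, o2);
}

/* Small self-check of the model on hand-computed values.  */
void
adc32_self_check (void)
{
  uint32_t s1, s2;
  /* 0xffffffff + 0 + 1 wraps to 0 with carry-out 1.  */
  assert (adc32_model (1, 0xffffffffu, 0, &s1) == 1 && s1 == 0);
  /* 1 + 2 + 1 = 4, no carry; complemented carry 1 gives 3 + 4 + 1.  */
  assert (bar_model (1, 2, 3, 4, &s1, &s2) == 0 && s1 == 4 && s2 == 8);
}
```

Calling adc32_self_check () from a test harness exercises both the
constant carry-in and the complemented-carry paths.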
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2023-06-03  Roger Sayle

gcc/ChangeLog
        * config/i386/i386-expand.cc (ix86_expand_builtin) : Use new
        x86_stc or negqi_ccc_1 instructions to set the carry flag.
        * config/i386/i386.h (TARGET_SLOW_STC): New define.
        * config/i386/i386.md (UNSPEC_CLC): New UNSPEC for clc.
        (UNSPEC_STC): New UNSPEC for stc.
        (UNSPEC_CMC): New UNSPEC for cmc.
        (*x86_clc): New define_insn.
        (*x86_clc_xor): New define_insn for pentium4 without -Os.
        (x86_stc): New define_insn.
        (define_split): Convert x86_stc into alternate implementation
        on pentium4.
        (x86_cmc): New define_insn.
        (*x86_cmc_1): New define_insn_and_split to recognize cmc pattern.
        (*setcc_qi_negqi_ccc_1_): New define_insn_and_split to
        recognize (and eliminate) the carry flag being copied to itself.
        (*setcc_qi_negqi_ccc_2_): Likewise.
        (neg_ccc_1): Renamed from *neg_ccc_1 for gen function.
        * config/i386/x86-tune.def (X86_TUNE_SLOW_STC): New tuning flag.

gcc/testsuite/ChangeLog
        * gcc.target/i386/cmc-1.c: New test case.
        * gcc.target/i386/stc-1.c: Likewise.


Thanks,
Roger

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 5d21810..9e02fdd 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -13948,8 +13948,6 @@ rdseed_step:
       arg3 = CALL_EXPR_ARG (exp, 3); /* unsigned int *sum_out.  */
 
       op1 = expand_normal (arg0);
-      if (!integer_zerop (arg0))
-        op1 = copy_to_mode_reg (QImode, convert_to_mode (QImode, op1, 1));
 
       op2 = expand_normal (arg1);
       if (!register_operand (op2, mode0))
@@ -13967,7 +13965,7 @@ rdseed_step:
         }
 
       op0 = gen_reg_rtx (mode0);
-      if (integer_zerop (arg0))
+      if (op1 == const0_rtx)
         {
           /* If arg0 is 0, optimize right away into add or sub
              instruction that sets CCCmode flags.  */
@@ -13977,7 +13975,14 @@ rdseed_step:
       else
         {
           /* Generate CF from input operand.  */
-          emit_insn (gen_addqi3_cconly_overflow (op1, constm1_rtx));
+          if (!CONST_INT_P (op1))
+            {
+              op1 = convert_to_mode (QImode, op1, 1);
+              op1 = copy_to_mode_reg (QImode, op1);
+              emit_insn (gen_negqi_ccc_1 (op1, op1));
+            }
+          else
+            emit_insn (gen_x86_stc ());
 
           /* Generate instruction that consumes CF.  */
           op1 = gen_rtx_REG (CCCmode, FLAGS_REG);
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index c7439f8..5ac9c78 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -448,6 +448,7 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
         ix86_tune_features[X86_TUNE_V2DF_REDUCTION_PREFER_HADDPD]
 #define TARGET_DEST_FALSE_DEP_FOR_GLC \
         ix86_tune_features[X86_TUNE_DEST_FALSE_DEP_FOR_GLC]
+#define TARGET_SLOW_STC ix86_tune_features[X86_TUNE_SLOW_STC]
 
 /* Feature tests against the various architecture variations.  */
 enum ix86_arch_indices {
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e6ebc46..482548c 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -114,6 +114,9 @@
   UNSPEC_INSN_FALSE_DEP
   UNSPEC_SBB
   UNSPEC_CC_NE
+  UNSPEC_CLC
+  UNSPEC_STC
+  UNSPEC_CMC
 
   ;; For SSE/MMX support:
   UNSPEC_FIX_NOTRUNC
@@ -1999,6 +2002,64 @@
   [(set_attr "type" "ssecomi")
    (set_attr "prefix" "evex")
    (set_attr "mode" "HF")])
+
+;; Clear carry flag.
+(define_insn "*x86_clc"
+  [(set (reg:CCC FLAGS_REG) (unspec:CCC [(const_int 0)] UNSPEC_CLC))]
+  "!TARGET_SLOW_STC || optimize_function_for_size_p (cfun)"
+  "clc"
+  [(set_attr "length" "1")
+   (set_attr "length_immediate" "0")
+   (set_attr "modrm" "0")])
+
+(define_insn "*x86_clc_xor"
+  [(set (reg:CCC FLAGS_REG) (unspec:CCC [(const_int 0)] UNSPEC_CLC))
+   (clobber (match_scratch:SI 0 "=r"))]
+  "TARGET_SLOW_STC && !optimize_function_for_size_p (cfun)"
+  "xor{l}\t%0, %0"
+  [(set_attr "type" "alu1")
+   (set_attr "mode" "SI")
+   (set_attr "length_immediate" "0")])
+
+;; Set carry flag.
+(define_insn "x86_stc"
+  [(set (reg:CCC FLAGS_REG) (unspec:CCC [(const_int 0)] UNSPEC_STC))]
+  ""
+  "stc"
+  [(set_attr "length" "1")
+   (set_attr "length_immediate" "0")
+   (set_attr "modrm" "0")])
+
+;; On Pentium 4, set the carry flag using mov $1,%al;neg %al.
+(define_split
+  [(set (reg:CCC FLAGS_REG) (unspec:CCC [(const_int 0)] UNSPEC_STC))]
+  "TARGET_SLOW_STC
+   && !optimize_insn_for_size_p ()
+   && can_create_pseudo_p ()"
+  [(set (match_dup 0) (const_int 1))
+   (parallel
+     [(set (reg:CCC FLAGS_REG)
+           (unspec:CCC [(match_dup 0) (const_int 0)] UNSPEC_CC_NE))
+      (set (match_dup 0) (neg:QI (match_dup 0)))])]
+  "operands[0] = gen_reg_rtx (QImode);")
+
+;; Complement carry flag.
+(define_insn "*x86_cmc"
+  [(set (reg:CCC FLAGS_REG) (unspec:CCC [(reg:CCC FLAGS_REG)] UNSPEC_CMC))]
+  "!TARGET_SLOW_STC || optimize_function_for_size_p (cfun)"
+  "cmc"
+  [(set_attr "length" "1")
+   (set_attr "length_immediate" "0")
+   (set_attr "modrm" "0")])
+
+(define_insn_and_split "*x86_cmc_1"
+  [(set (reg:CCC FLAGS_REG)
+        (unspec:CCC [(geu:QI (reg:CCC FLAGS_REG) (const_int 0))
+                     (const_int 0)] UNSPEC_CC_NE))]
+  "!TARGET_SLOW_STC || optimize_function_for_size_p (cfun)"
+  "#"
+  "&& 1"
+  [(set (reg:CCC FLAGS_REG) (unspec:CCC [(reg:CCC FLAGS_REG)] UNSPEC_CMC))])
 
 ;; Push/pop instructions.
 
@@ -8107,6 +8168,25 @@
   "#"
   "&& 1"
   [(const_int 0)])
+
+;; Set the carry flag from the carry flag.
+(define_insn_and_split "*setcc_qi_negqi_ccc_1_"
+  [(set (reg:CCC FLAGS_REG)
+        (ltu:CCC (reg:CC_CCC FLAGS_REG) (const_int 0)))]
+  "ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)])
+
+;; Set the carry flag from the carry flag.
+(define_insn_and_split "*setcc_qi_negqi_ccc_2_"
+  [(set (reg:CCC FLAGS_REG)
+        (unspec:CCC [(ltu:QI (reg:CC_CCC FLAGS_REG) (const_int 0))
+                     (const_int 0)] UNSPEC_CC_NE))]
+  "ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)])
 
 ;; Overflow setting add instructions
 
@@ -11942,7 +12022,7 @@
   [(set_attr "type" "negnot")
    (set_attr "mode" "SI")])
 
-(define_insn "*neg_ccc_1"
+(define_insn "neg_ccc_1"
   [(set (reg:CCC FLAGS_REG)
         (unspec:CCC
           [(match_operand:SWI 1 "nonimmediate_operand" "0")
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index e1c72cd..c3229d2 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -698,3 +698,7 @@ DEF_TUNE (X86_TUNE_PROMOTE_QI_REGS, "promote_qi_regs", m_NONE)
 /* X86_TUNE_EMIT_VZEROUPPER: This enables vzeroupper instruction insertion
    before a transfer of control flow out of the function.  */
 DEF_TUNE (X86_TUNE_EMIT_VZEROUPPER, "emit_vzeroupper", ~m_KNL)
+
+/* X86_TUNE_SLOW_STC: This disables use of stc, clc and cmc carry flag
+   modifications on architectures where these operations are slow.  */
+DEF_TUNE (X86_TUNE_SLOW_STC, "slow_stc", m_PENT4)
diff --git a/gcc/testsuite/gcc.target/i386/cmc-1.c b/gcc/testsuite/gcc.target/i386/cmc-1.c
new file mode 100644
index 0000000..58e922a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/cmc-1.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int o1;
+unsigned int o2;
+
+unsigned int foo_xor (unsigned int a, unsigned int b,
+                      unsigned int c, unsigned int d)
+{
+  unsigned int c1 = __builtin_ia32_addcarryx_u32 (1, a, b, &o1);
+  return __builtin_ia32_addcarryx_u32 (c1 ^ 1, c, d, &o2);
+}
+
+unsigned int foo_sub (unsigned int a, unsigned int b,
+                      unsigned int c, unsigned int d)
+{
+  unsigned int c1 = __builtin_ia32_addcarryx_u32 (1, a, b, &o1);
+  return __builtin_ia32_addcarryx_u32 (1 - c1, c, d, &o2);
+}
+
+unsigned int foo_eqz (unsigned int a, unsigned int b,
+                      unsigned int c, unsigned int d)
+{
+  unsigned int c1 = __builtin_ia32_addcarryx_u32 (1, a, b, &o1);
+  return __builtin_ia32_addcarryx_u32 (c1 == 0, c, d, &o2);
+}
+
+/* { dg-final { scan-assembler "cmc" } } */
diff --git a/gcc/testsuite/gcc.target/i386/stc-1.c b/gcc/testsuite/gcc.target/i386/stc-1.c
new file mode 100644
index 0000000..857c939
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/stc-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int foo (unsigned int a, unsigned int b, unsigned int *c)
+{
+  return __builtin_ia32_addcarryx_u32 (1, a, b, c);
+}
+
+unsigned int bar (unsigned int b, unsigned int *c)
+{
+  return __builtin_ia32_addcarryx_u32 (1, 2, b, c);
+}
+
+unsigned int baz (unsigned int a, unsigned int *c)
+{
+  return __builtin_ia32_addcarryx_u32 (1, a, 3, c);
+}
+
+/* { dg-final { scan-assembler "stc" } } */
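
A note on the carry-setting sequences touched by the patch above: the
old "addb $-1, %al" and the neg-based replacement (gen_negqi_ccc_1,
and the "mov $1,%al; neg %al" split under TARGET_SLOW_STC) should
both leave CF set exactly when the QImode operand is nonzero.  The
sketch below models only the carry flag as I understand the x86
definitions of ADD and NEG; the function names are hypothetical and
nothing here is taken from the GCC sources.

```c
#include <stdint.h>

/* CF after "negb x": NEG clears CF for a zero operand and sets it
   otherwise.  */
int
cf_after_negb (uint8_t x)
{
  return x != 0;
}

/* CF after "addb $-1, x": the carry out of bit 7 of x + 0xff.  */
int
cf_after_addb_m1 (uint8_t x)
{
  return ((unsigned int) x + 0xffu) > 0xffu;
}

/* Return 1 if the two idioms agree for every 8-bit input.  */
int
carry_idioms_agree (void)
{
  for (unsigned int v = 0; v < 256; v++)
    if (cf_after_negb ((uint8_t) v) != cf_after_addb_m1 ((uint8_t) v))
      return 0;
  return 1;
}
```

With a constant input of 1, as in the split, both models give CF=1,
which matches what a plain stc would produce.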