From: Jan Beulich
Date: Fri, 16 Jun 2023 08:22:16 +0200
Subject: [PATCH v2] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F
To: gcc-patches@gcc.gnu.org
Cc: Kirill Yukhin, Hongtao Liu

There's no reason to constrain this to AVX512VL, unless instructed to by
-mprefer-vector-width=: the wider (512-bit) operation is unusable for
narrower operands only when the possible memory source is a
non-broadcast one. In that case a full-width load would read past the
bytes actually making up the operand, whereas an embedded broadcast
reads just a single element. This way even the scalar copysign3 can
benefit from the operation being a single instruction (leaving aside
moves which the compiler decides to insert for unclear reasons, and
leaving aside the fact that bcst_mem_operand() is too restrictive for a
broadcast to be embedded right into VPTERNLOG*).
Along with this, also request value duplication in
ix86_expand_copysign()'s call to ix86_build_signbit_mask(), eliminating
excess space allocated in .rodata.* and filled with zeros which are
never read.

gcc/

        * config/i386/i386-expand.cc (ix86_expand_copysign): Request
        value duplication by ix86_build_signbit_mask() when AVX512F and
        not HFmode.
        * config/i386/sse.md (*_vternlog_all): Convert to
        2-alternative form. Adjust "mode" attribute. Add "enabled"
        attribute.
        (*_vpternlog_1): Also permit when TARGET_AVX512F
        && !TARGET_PREFER_AVX256.
        (*_vpternlog_2): Likewise.
        (*_vpternlog_3): Likewise.
---
I guess the underlying pattern, going along the lines of what
one_cmpl2 uses, can be applied elsewhere as well.

HFmode could use embedded broadcast too for copysign and the like, but
that would need to be V2HF -> V8HF (for which I don't think there are
any existing patterns).
---
v2: Respect -mprefer-vector-width=.

--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -2266,7 +2266,7 @@ ix86_expand_copysign (rtx operands[])
   else
     dest = NULL_RTX;
   op1 = lowpart_subreg (vmode, force_reg (mode, operands[2]), mode);
-  mask = ix86_build_signbit_mask (vmode, 0, 0);
+  mask = ix86_build_signbit_mask (vmode, TARGET_AVX512F && mode != HFmode, 0);
 
   if (CONST_DOUBLE_P (operands[1]))
     {
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -12597,11 +12597,11 @@
    (set_attr "mode" "")])
 
 (define_insn "*_vternlog_all"
-  [(set (match_operand:V 0 "register_operand" "=v")
+  [(set (match_operand:V 0 "register_operand" "=v,v")
         (unspec:V
-          [(match_operand:V 1 "register_operand" "0")
-           (match_operand:V 2 "register_operand" "v")
-           (match_operand:V 3 "bcst_vector_operand" "vmBr")
+          [(match_operand:V 1 "register_operand" "0,0")
+           (match_operand:V 2 "register_operand" "v,v")
+           (match_operand:V 3 "bcst_vector_operand" "vBr,m")
            (match_operand:SI 4 "const_0_to_255_operand")]
           UNSPEC_VTERNLOG))]
   "TARGET_AVX512F
@@ -12609,10 +12609,22 @@
     it's not real AVX512FP16 instruction.  */
    && (GET_MODE_SIZE (GET_MODE_INNER (mode)) >= 4
        || GET_CODE (operands[3]) != VEC_DUPLICATE)"
-  "vpternlog\t{%4, %3, %2, %0|%0, %2, %3, %4}"
+{
+  if (TARGET_AVX512VL)
+    return "vpternlog\t{%4, %3, %2, %0|%0, %2, %3, %4}";
+  else
+    return "vpternlog\t{%4, %g3, %g2, %g0|%g0, %g2, %g3, %4}";
+}
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
-   (set_attr "mode" "")])
+   (set (attr "mode")
+        (if_then_else (match_test "TARGET_AVX512VL")
+                      (const_string "")
+                      (const_string "XI")))
+   (set (attr "enabled")
+        (if_then_else (eq_attr "alternative" "1")
+                      (symbol_ref " == 64 || TARGET_AVX512VL")
+                      (const_string "*")))])
 
 ;; There must be lots of other combinations like
 ;;
@@ -12641,7 +12653,8 @@
           (any_logic2:V
             (match_operand:V 3 "regmem_or_bitnot_regmem_operand")
             (match_operand:V 4 "regmem_or_bitnot_regmem_operand"))))]
-  "( == 64 || TARGET_AVX512VL)
+  "( == 64 || TARGET_AVX512VL
+    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
    && ix86_pre_reload_split ()
    && (rtx_equal_p (STRIP_UNARY (operands[1]),
                     STRIP_UNARY (operands[4]))
@@ -12725,7 +12738,8 @@
               (match_operand:V 2 "regmem_or_bitnot_regmem_operand"))
             (match_operand:V 3 "regmem_or_bitnot_regmem_operand"))
           (match_operand:V 4 "regmem_or_bitnot_regmem_operand")))]
-  "( == 64 || TARGET_AVX512VL)
+  "( == 64 || TARGET_AVX512VL
+    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
    && ix86_pre_reload_split ()
    && (rtx_equal_p (STRIP_UNARY (operands[1]),
                     STRIP_UNARY (operands[4]))
@@ -12808,7 +12822,8 @@
             (match_operand:V 1 "regmem_or_bitnot_regmem_operand")
             (match_operand:V 2 "regmem_or_bitnot_regmem_operand"))
           (match_operand:V 3 "regmem_or_bitnot_regmem_operand")))]
-  "( == 64 || TARGET_AVX512VL)
+  "( == 64 || TARGET_AVX512VL
+    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"