From patchwork Tue Jun 20 07:06:57 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Beulich
X-Patchwork-Id: 1796931
Message-ID: <169ca252-3828-b466-4d47-a8fe720ec4ef@suse.com>
Date: Tue, 20 Jun 2023 09:06:57 +0200
Subject: [PATCH v3] x86: make VPTERNLOG* usable on less than 512-bit
 operands with just AVX512F
To: "gcc-patches@gcc.gnu.org"
Cc: Kirill Yukhin, Hongtao Liu
List-Id: Gcc-patches mailing list
From: Jan Beulich

There's no reason to constrain this to AVX512VL, unless instructed so by
-mprefer-vector-width=, as the wider operation is unusable for more
narrow operands only when the possible memory source is a non-broadcast
one. This way even the scalar copysign3 can benefit from the operation
being a single-insn one (leaving aside moves which the compiler decides
to insert for unclear reasons, and leaving aside the fact that
bcst_mem_operand() is too restrictive for broadcast to be embedded right
into VPTERNLOG*).

While there also bring *_vternlog_all's insn condition in sync with that
of the three splitters.

Along with this also request value duplication in
ix86_expand_copysign()'s call to ix86_build_signbit_mask(), eliminating
excess space allocation in .rodata.*, filled with zeros which are never
read.

gcc/

	* config/i386/i386-expand.cc (ix86_expand_copysign): Request
	value duplication by ix86_build_signbit_mask() when AVX512F and
	not HFmode.
	* config/i386/sse.md (*_vternlog_all): Convert to 2-alternative
	form. Adjust "mode" attribute. Add "enabled" attribute.
	(*_vpternlog_1): Also permit when TARGET_AVX512F &&
	!TARGET_PREFER_AVX256.
	(*_vpternlog_2): Likewise.
	(*_vpternlog_3): Likewise.

gcc/testsuite/

	* gcc.target/i386/avx512f-copysign.c: New test.
---
I haven't been able to find documentation on the dejagnu(?) regex syntax
(?:...). With ordinary (...) failing (producing twice as many matches), I
could only derive this from other scan-assembler patterns.

I guess the underlying pattern, going along the lines of what
one_cmpl2 uses, can be applied elsewhere as well.

HFmode could use embedded broadcast too for copysign and the like, but
that would need to be V2HF -> V8HF (for which I don't think there are
any existing patterns).
---
v3: Adjust insn conditional as well. Add testcase.
v2: Respect -mprefer-vector-width=.
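
The widening argument from the description can be sketched in plain C. This is only a software model under stated assumptions, not compiler code: a 512-bit register is taken to be eight 64-bit lanes, and ternlog_lane(), ternlog_regs() and ternlog_bcst() are hypothetical helper names. Since the operation is purely bitwise and lane-local, whatever sits in the upper lanes cannot affect the low ones; a broadcast source reads a single element, so it never touches memory beyond the narrow object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZMM_LANES 8 /* model: 512 bits as eight 64-bit lanes */

/* One 64-bit lane of a ternary-logic op: imm8 bit (a<<2 | b<<1 | c)
   gives the result for each combination of input bits.  */
static uint64_t ternlog_lane(uint64_t a, uint64_t b, uint64_t c, uint8_t imm8)
{
    uint64_t r = 0;
    for (unsigned idx = 0; idx < 8; idx++) {
        if (!((imm8 >> idx) & 1))
            continue;
        r |= ((idx & 4) ? a : ~a) & ((idx & 2) ? b : ~b) & ((idx & 1) ? c : ~c);
    }
    return r;
}

/* Register-register form on the first nlanes lanes; dst is also source 1.  */
static void ternlog_regs(uint64_t *dst, const uint64_t *src2,
                         const uint64_t *src3, uint8_t imm8, size_t nlanes)
{
    assert(nlanes <= ZMM_LANES);
    for (size_t i = 0; i < nlanes; i++)
        dst[i] = ternlog_lane(dst[i], src2[i], src3[i], imm8);
}

/* Embedded-broadcast form: exactly one element is read through elem,
   no matter how many lanes are processed.  */
static void ternlog_bcst(uint64_t *dst, const uint64_t *src2,
                         const uint64_t *elem, uint8_t imm8, size_t nlanes)
{
    assert(nlanes <= ZMM_LANES);
    for (size_t i = 0; i < nlanes; i++)
        dst[i] = ternlog_lane(dst[i], src2[i], *elem, imm8);
}
```

Running ternlog_regs() over all eight lanes and over just the first two yields identical low lanes. A full-width non-broadcast memory source would instead have to read all 64 bytes, which is the one case the "enabled" attribute keeps restricted.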
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -2266,7 +2266,7 @@ ix86_expand_copysign (rtx operands[])
   else
     dest = NULL_RTX;
   op1 = lowpart_subreg (vmode, force_reg (mode, operands[2]), mode);
-  mask = ix86_build_signbit_mask (vmode, 0, 0);
+  mask = ix86_build_signbit_mask (vmode, TARGET_AVX512F && mode != HFmode, 0);
 
   if (CONST_DOUBLE_P (operands[1]))
     {
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -12399,22 +12399,35 @@
    (set_attr "mode" "")])
 
 (define_insn "*_vternlog_all"
-  [(set (match_operand:V 0 "register_operand" "=v")
+  [(set (match_operand:V 0 "register_operand" "=v,v")
         (unspec:V
-          [(match_operand:V 1 "register_operand" "0")
-           (match_operand:V 2 "register_operand" "v")
-           (match_operand:V 3 "bcst_vector_operand" "vmBr")
+          [(match_operand:V 1 "register_operand" "0,0")
+           (match_operand:V 2 "register_operand" "v,v")
+           (match_operand:V 3 "bcst_vector_operand" "vBr,m")
            (match_operand:SI 4 "const_0_to_255_operand")]
           UNSPEC_VTERNLOG))]
-  "TARGET_AVX512F
+  "( == 64 || TARGET_AVX512VL
+    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
   /* Disallow embeded broadcast for vector HFmode since
      it's not real AVX512FP16 instruction.  */
   && (GET_MODE_SIZE (GET_MODE_INNER (mode)) >= 4
      || GET_CODE (operands[3]) != VEC_DUPLICATE)"
-  "vpternlog\t{%4, %3, %2, %0|%0, %2, %3, %4}"
+{
+  if (TARGET_AVX512VL)
+    return "vpternlog\t{%4, %3, %2, %0|%0, %2, %3, %4}";
+  else
+    return "vpternlog\t{%4, %g3, %g2, %g0|%g0, %g2, %g3, %4}";
+}
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
-   (set_attr "mode" "")])
+   (set (attr "mode")
+        (if_then_else (match_test "TARGET_AVX512VL")
+                      (const_string "")
+                      (const_string "XI")))
+   (set (attr "enabled")
+        (if_then_else (eq_attr "alternative" "1")
+                      (symbol_ref " == 64 || TARGET_AVX512VL")
+                      (const_string "*")))])
 
 ;; There must be lots of other combinations like
 ;;
@@ -12443,7 +12456,8 @@
           (any_logic2:V
             (match_operand:V 3 "regmem_or_bitnot_regmem_operand")
             (match_operand:V 4 "regmem_or_bitnot_regmem_operand"))))]
-  "( == 64 || TARGET_AVX512VL)
+  "( == 64 || TARGET_AVX512VL
+    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
    && ix86_pre_reload_split ()
    && (rtx_equal_p (STRIP_UNARY (operands[1]),
                     STRIP_UNARY (operands[4]))
@@ -12527,7 +12541,8 @@
               (match_operand:V 2 "regmem_or_bitnot_regmem_operand"))
             (match_operand:V 3 "regmem_or_bitnot_regmem_operand"))
           (match_operand:V 4 "regmem_or_bitnot_regmem_operand")))]
-  "( == 64 || TARGET_AVX512VL)
+  "( == 64 || TARGET_AVX512VL
+    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
    && ix86_pre_reload_split ()
    && (rtx_equal_p (STRIP_UNARY (operands[1]),
                     STRIP_UNARY (operands[4]))
@@ -12610,7 +12625,8 @@
             (match_operand:V 1 "regmem_or_bitnot_regmem_operand")
             (match_operand:V 2 "regmem_or_bitnot_regmem_operand"))
           (match_operand:V 3 "regmem_or_bitnot_regmem_operand")))]
-  "( == 64 || TARGET_AVX512VL)
+  "( == 64 || TARGET_AVX512VL
+    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-copysign.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -mno-avx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$(?:216|228|0xd8|0xe4)," 5 } } */
+
+double cs_df (double x, double y)
+{
+  return __builtin_copysign (x, y);
+}
+
+float cs_sf (float x, float y)
+{
+  return __builtin_copysignf (x, y);
+}
+
+typedef double __attribute__ ((vector_size (16))) v2df;
+typedef double __attribute__ ((vector_size (32))) v4df;
+typedef double __attribute__ ((vector_size (64))) v8df;
+
+v2df cs_v2df (v2df x, v2df y)
+{
+  return __builtin_ia32_copysignpd (x, y);
+}
+
+v4df cs_v4df (v4df x, v4df y)
+{
+  return __builtin_ia32_copysignpd256 (x, y);
+}
+
+v8df cs_v8df (v8df x, v8df y)
+{
+  return __builtin_ia32_copysignpd512 (x, y);
+}
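
For reference, here is a sketch of where the immediates accepted by the scan-assembler pattern (216/0xd8 and 228/0xe4) come from. It assumes the usual VPTERNLOG encoding, where the bits of operand 1 (the destination), operand 2 and operand 3 form a 3-bit index into imm8, and it assumes the sign mask ends up as operand 3; which of the two selects the compiler emits depends on how it places the other two operands. The helper names (ternlog_imm, sel_c_b_a, sel_c_a_b, ternlog64) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Boolean function of three bits: a from operand 1 (the destination),
   b from operand 2, c from operand 3.  */
typedef int (*bool3_fn)(int a, int b, int c);

/* Build the immediate: bit (a<<2 | b<<1 | c) of imm8 holds f(a, b, c).  */
static uint8_t ternlog_imm(bool3_fn f)
{
    assert(f);
    uint8_t imm = 0;
    for (int idx = 0; idx < 8; idx++)
        if (f((idx >> 2) & 1, (idx >> 1) & 1, idx & 1))
            imm |= (uint8_t)(1u << idx);
    return imm;
}

/* Bitwise selects with the mask in operand 3 (assumed placement).  */
static int sel_c_b_a(int a, int b, int c) { return c ? b : a; } /* 0xd8 = 216 */
static int sel_c_a_b(int a, int b, int c) { return c ? a : b; } /* 0xe4 = 228 */

/* Bit-by-bit reference model of the operation on one 64-bit lane.  */
static uint64_t ternlog64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm8)
{
    uint64_t r = 0;
    for (int bit = 0; bit < 64; bit++) {
        unsigned idx = (unsigned)((((a >> bit) & 1) << 2)
                                  | (((b >> bit) & 1) << 1)
                                  | ((c >> bit) & 1));
        r |= (uint64_t)((imm8 >> idx) & 1) << bit;
    }
    return r;
}
```

With c holding only the sign bit, ternlog64 (x, y, c, 0xd8) takes the magnitude from x and the sign from y, i.e. copysign (x, y) on the raw bit patterns; 0xe4 is the same select with the first two operands swapped.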