From patchwork Mon Aug 2 06:31:11 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1512277 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=QzoL/Www; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4GdSsv1Nn2z9sRK for ; Mon, 2 Aug 2021 16:34:59 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id C8B2A3835C11 for ; Mon, 2 Aug 2021 06:34:56 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org C8B2A3835C11 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1627886096; bh=zks2MtP0KEsRTaf68jyOj8Tnnjpxnw9NLgUzDySDEd4=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=QzoL/WwwXPq6zacd8+AI2CwdQxB3jTbXEXscZC1SfrYXasgE8grS1ZmsqQ2Grm+4K FX4Xa/miGZ9Gs596gjEIZt7QRCLylJ1XgftwnCgtAuDokftYWYNYIK/+Q5+2KLuN0d STkYcZz0YZAmjJt2QTIxq/To5OgUQukzFAA3ti3o= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by sourceware.org (Postfix) with ESMTPS id E611C3839833 for ; Mon, 2 Aug 2021 06:31:30 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E611C3839833 X-IronPort-AV: E=McAfee;i="6200,9189,10063"; a="193678131" X-IronPort-AV: E=Sophos;i="5.84,288,1620716400"; d="scan'208";a="193678131" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Aug 2021 23:31:21 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.84,288,1620716400"; d="scan'208";a="418585764" Received: from scymds01.sc.intel.com ([10.148.94.138]) by orsmga006.jf.intel.com with ESMTP; 01 Aug 2021 23:31:21 -0700 Received: from shliclel219.sh.intel.com (shliclel219.sh.intel.com [10.239.236.219]) by scymds01.sc.intel.com with ESMTP id 1726VH1M022130; Sun, 1 Aug 2021 23:31:19 -0700 To: gcc-patches@gcc.gnu.org Subject: [PATCH 1/6] Update hf soft-fp from glibc. Date: Mon, 2 Aug 2021 14:31:11 +0800 Message-Id: <20210802063116.999830-2-hongtao.liu@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210802063116.999830-1-hongtao.liu@intel.com> References: <20210802063116.999830-1-hongtao.liu@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-11.8 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: liuhongt via Gcc-patches From: liuhongt Reply-To: liuhongt Cc: joseph@codesourcery.com Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" libgcc/ChangeLog * soft-fp/eqhf2.c: New file. * soft-fp/extendhfdf2.c: New file. * soft-fp/extendhfsf2.c: New file. * soft-fp/extendhfxf2.c: New file. * soft-fp/half.h (FP_CMP_EQ_H): New marco. * soft-fp/truncdfhf2.c: New file * soft-fp/truncsfhf2.c: New file * soft-fp/truncxfhf2.c: New file --- libgcc/soft-fp/eqhf2.c | 49 +++++++++++++++++++++++++++++++++ libgcc/soft-fp/extendhfdf2.c | 53 ++++++++++++++++++++++++++++++++++++ libgcc/soft-fp/extendhfsf2.c | 49 +++++++++++++++++++++++++++++++++ libgcc/soft-fp/half.h | 1 + libgcc/soft-fp/truncdfhf2.c | 52 +++++++++++++++++++++++++++++++++++ libgcc/soft-fp/truncsfhf2.c | 48 ++++++++++++++++++++++++++++++++ 6 files changed, 252 insertions(+) create mode 100644 libgcc/soft-fp/eqhf2.c create mode 100644 libgcc/soft-fp/extendhfdf2.c create mode 100644 libgcc/soft-fp/extendhfsf2.c create mode 100644 libgcc/soft-fp/truncdfhf2.c create mode 100644 libgcc/soft-fp/truncsfhf2.c diff --git a/libgcc/soft-fp/eqhf2.c b/libgcc/soft-fp/eqhf2.c new file mode 100644 index 00000000000..6d6634e5c54 --- /dev/null +++ b/libgcc/soft-fp/eqhf2.c @@ -0,0 +1,49 @@ +/* Software floating-point emulation. + Return 0 iff a == b, 1 otherwise + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + In addition to the permissions in the GNU Lesser General Public + License, the Free Software Foundation gives you unlimited + permission to link the compiled version of this file into + combinations with other programs, and to distribute those + combinations without any restriction coming from the use of this + file. (The Lesser General Public License restrictions do apply in + other respects; for example, they cover modification of the file, + and distribution when not linked into a combine executable.) + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "soft-fp.h" +#include "half.h" + +CMPtype +__eqhf2 (HFtype a, HFtype b) +{ + FP_DECL_EX; + FP_DECL_H (A); + FP_DECL_H (B); + CMPtype r; + + FP_INIT_EXCEPTIONS; + FP_UNPACK_RAW_H (A, a); + FP_UNPACK_RAW_H (B, b); + FP_CMP_EQ_H (r, A, B, 1); + FP_HANDLE_EXCEPTIONS; + + return r; +} + +strong_alias (__eqhf2, __nehf2); diff --git a/libgcc/soft-fp/extendhfdf2.c b/libgcc/soft-fp/extendhfdf2.c new file mode 100644 index 00000000000..337ba791d48 --- /dev/null +++ b/libgcc/soft-fp/extendhfdf2.c @@ -0,0 +1,53 @@ +/* Software floating-point emulation. + Return an IEEE half converted to IEEE double + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + In addition to the permissions in the GNU Lesser General Public + License, the Free Software Foundation gives you unlimited + permission to link the compiled version of this file into + combinations with other programs, and to distribute those + combinations without any restriction coming from the use of this + file. (The Lesser General Public License restrictions do apply in + other respects; for example, they cover modification of the file, + and distribution when not linked into a combine executable.) + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define FP_NO_EXACT_UNDERFLOW +#include "soft-fp.h" +#include "half.h" +#include "double.h" + +DFtype +__extendhfdf2 (HFtype a) +{ + FP_DECL_EX; + FP_DECL_H (A); + FP_DECL_D (R); + DFtype r; + + FP_INIT_EXCEPTIONS; + FP_UNPACK_RAW_H (A, a); +#if _FP_W_TYPE_SIZE < _FP_FRACBITS_D + FP_EXTEND (D, H, 2, 1, R, A); +#else + FP_EXTEND (D, H, 1, 1, R, A); +#endif + FP_PACK_RAW_D (r, R); + FP_HANDLE_EXCEPTIONS; + + return r; +} diff --git a/libgcc/soft-fp/extendhfsf2.c b/libgcc/soft-fp/extendhfsf2.c new file mode 100644 index 00000000000..a02f46d9a99 --- /dev/null +++ b/libgcc/soft-fp/extendhfsf2.c @@ -0,0 +1,49 @@ +/* Software floating-point emulation. + Return an IEEE half converted to IEEE single + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + In addition to the permissions in the GNU Lesser General Public + License, the Free Software Foundation gives you unlimited + permission to link the compiled version of this file into + combinations with other programs, and to distribute those + combinations without any restriction coming from the use of this + file. (The Lesser General Public License restrictions do apply in + other respects; for example, they cover modification of the file, + and distribution when not linked into a combine executable.) + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define FP_NO_EXACT_UNDERFLOW +#include "soft-fp.h" +#include "half.h" +#include "single.h" + +SFtype +__extendhfsf2 (HFtype a) +{ + FP_DECL_EX; + FP_DECL_H (A); + FP_DECL_S (R); + SFtype r; + + FP_INIT_EXCEPTIONS; + FP_UNPACK_RAW_H (A, a); + FP_EXTEND (S, H, 1, 1, R, A); + FP_PACK_RAW_S (r, R); + FP_HANDLE_EXCEPTIONS; + + return r; +} diff --git a/libgcc/soft-fp/half.h b/libgcc/soft-fp/half.h index c7823ac61d3..4108f5cb3c2 100644 --- a/libgcc/soft-fp/half.h +++ b/libgcc/soft-fp/half.h @@ -167,4 +167,5 @@ union _FP_UNION_H #define _FP_FRAC_HIGH_RAW_H(X) _FP_FRAC_HIGH_1 (X) #define _FP_FRAC_HIGH_DW_H(X) _FP_FRAC_HIGH_1 (X) +#define FP_CMP_EQ_H(r, X, Y, ex) _FP_CMP_EQ (H, 1, (r), X, Y, (ex)) #endif /* !SOFT_FP_HALF_H */ diff --git a/libgcc/soft-fp/truncdfhf2.c b/libgcc/soft-fp/truncdfhf2.c new file mode 100644 index 00000000000..8bcb2787692 --- /dev/null +++ b/libgcc/soft-fp/truncdfhf2.c @@ -0,0 +1,52 @@ +/* Software floating-point emulation. + Truncate IEEE double into IEEE half. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + In addition to the permissions in the GNU Lesser General Public + License, the Free Software Foundation gives you unlimited + permission to link the compiled version of this file into + combinations with other programs, and to distribute those + combinations without any restriction coming from the use of this + file. (The Lesser General Public License restrictions do apply in + other respects; for example, they cover modification of the file, + and distribution when not linked into a combine executable.) + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "soft-fp.h" +#include "half.h" +#include "double.h" + +HFtype +__truncdfhf2 (DFtype a) +{ + FP_DECL_EX; + FP_DECL_D (A); + FP_DECL_H (R); + HFtype r; + + FP_INIT_ROUNDMODE; + FP_UNPACK_SEMIRAW_D (A, a); +#if _FP_W_TYPE_SIZE < _FP_FRACBITS_D + FP_TRUNC (H, D, 1, 2, R, A); +#else + FP_TRUNC (H, D, 1, 1, R, A); +#endif + FP_PACK_SEMIRAW_H (r, R); + FP_HANDLE_EXCEPTIONS; + + return r; +} diff --git a/libgcc/soft-fp/truncsfhf2.c b/libgcc/soft-fp/truncsfhf2.c new file mode 100644 index 00000000000..25bee29f7f5 --- /dev/null +++ b/libgcc/soft-fp/truncsfhf2.c @@ -0,0 +1,48 @@ +/* Software floating-point emulation. + Truncate IEEE single into IEEE half. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + In addition to the permissions in the GNU Lesser General Public + License, the Free Software Foundation gives you unlimited + permission to link the compiled version of this file into + combinations with other programs, and to distribute those + combinations without any restriction coming from the use of this + file. (The Lesser General Public License restrictions do apply in + other respects; for example, they cover modification of the file, + and distribution when not linked into a combine executable.) + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "soft-fp.h" +#include "half.h" +#include "single.h" + +HFtype +__truncsfhf2 (SFtype a) +{ + FP_DECL_EX; + FP_DECL_S (A); + FP_DECL_H (R); + HFtype r; + + FP_INIT_ROUNDMODE; + FP_UNPACK_SEMIRAW_S (A, a); + FP_TRUNC (H, S, 1, 1, R, A); + FP_PACK_SEMIRAW_H (r, R); + FP_HANDLE_EXCEPTIONS; + + return r; +} From patchwork Mon Aug 2 06:31:12 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1512274 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=wruyQcER; dkim-atps=neutral Received: from sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4GdSpM6kstz9sRf for ; Mon, 2 Aug 2021 16:31:54 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 678843835C39 for ; Mon, 2 Aug 2021 06:31:50 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 678843835C39 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1627885910; bh=efK3+N780ipFXvUoHU+fpLZXy/H2Fsyf91ynL9gBC/M=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=wruyQcERzDLDuNadP+sCphSFhMyGUWyTS+zjg3MUNurG2tIgbJjn2tvbtvyiZcgg8 DKiNfEgnLB5Px4LD99APl0APDndBpnBIy14OEOrfxncu9/tbeaMR4GnV8eX5byKY3+ NmqSpPVZp7JpsKqmxHgIO3H4XFhMcpUQe4zm+AtM= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by sourceware.org (Postfix) with ESMTPS id DDE363861823 for ; Mon, 2 Aug 2021 06:31:25 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DDE363861823 X-IronPort-AV: E=McAfee;i="6200,9189,10063"; a="235322742" X-IronPort-AV: E=Sophos;i="5.84,288,1620716400"; d="scan'208";a="235322742" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Aug 2021 23:31:23 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.84,288,1620716400"; d="scan'208";a="419201069" Received: from scymds01.sc.intel.com ([10.148.94.138]) by orsmga003.jf.intel.com with ESMTP; 01 Aug 2021 23:31:23 -0700 Received: from shliclel219.sh.intel.com (shliclel219.sh.intel.com [10.239.236.219]) by scymds01.sc.intel.com with ESMTP id 1726VH1N022130; Sun, 1 Aug 2021 23:31:21 -0700 To: gcc-patches@gcc.gnu.org Subject: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above. Date: Mon, 2 Aug 2021 14:31:12 +0800 Message-Id: <20210802063116.999830-3-hongtao.liu@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210802063116.999830-1-hongtao.liu@intel.com> References: <20210802063116.999830-1-hongtao.liu@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-11.8 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, KAM_SHORT, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: liuhongt via Gcc-patches From: liuhongt Reply-To: liuhongt Cc: joseph@codesourcery.com Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" gcc/ChangeLog: * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode. * config/i386/i386.c (enum x86_64_reg_class): Add X86_64_SSEHF_CLASS. (merge_classes): Handle X86_64_SSEHF_CLASS. (examine_argument): Ditto. (construct_container): Ditto. (classify_argument): Ditto, and set HFmode/HCmode to X86_64_SSEHF_CLASS. (function_value_32): Return _FLoat16/Complex Float16 by %xmm0. (function_value_64): Return _Float16/Complex Float16 by SSE register. (ix86_print_operand): Handle CONST_DOUBLE HFmode. (ix86_secondary_reload): Require gpr as intermediate register to store _Float16 from sse register when sse4 is not available. (ix86_libgcc_floating_mode_supported_p): Enable _FLoat16 under sse2. (ix86_scalar_mode_supported_p): Ditto. (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined. * config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode. (VALID_INT_MODE_P): Add HFmode and HCmode. * config/i386/i386.md (*pushhf_rex64): New define_insn. (*pushhf): Ditto. (*movhf_internal): Ditto. * doc/extend.texi (Half-Precision Floating Point): Documemt _Float16 for x86. * emit-rtl.c (validate_subreg): Allow (subreg:SI (reg:HF) 0) which is used by extract_bit_field but not backends. gcc/lto/ChangeLog: * lto-lang.c (lto_type_for_mode): Return float16_type_node when mode == TYPE_MODE (float16_type_node). gcc/testsuite/ChangeLog * gcc.target/i386/sse2-float16-1.c: New test. * gcc.target/i386/sse2-float16-2.c: Ditto. * gcc.target/i386/sse2-float16-3.c: Ditto. * gcc.target/i386/float16-5.c: Ditto. --- gcc/config/i386/i386-modes.def | 1 + gcc/config/i386/i386.c | 91 +++++++++++++- gcc/config/i386/i386.h | 3 +- gcc/config/i386/i386.md | 118 +++++++++++++++++- gcc/doc/extend.texi | 13 ++ gcc/emit-rtl.c | 5 + gcc/lto/lto-lang.c | 3 + gcc/testsuite/gcc.target/i386/float16-5.c | 12 ++ .../gcc.target/i386/sse2-float16-1.c | 8 ++ .../gcc.target/i386/sse2-float16-2.c | 16 +++ .../gcc.target/i386/sse2-float16-3.c | 12 ++ 11 files changed, 274 insertions(+), 8 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-1.c create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-2.c create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-3.c diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def index 4e7014be034..9232f59a925 100644 --- a/gcc/config/i386/i386-modes.def +++ b/gcc/config/i386/i386-modes.def @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3. If not see FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format); FLOAT_MODE (TF, 16, ieee_quad_format); +FLOAT_MODE (HF, 2, ieee_half_format); /* In ILP32 mode, XFmode has size 12 and alignment 4. In LP64 mode, XFmode has size and alignment 16. */ diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index ff96134fb37..7979e240426 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -387,6 +387,7 @@ enum x86_64_reg_class X86_64_INTEGER_CLASS, X86_64_INTEGERSI_CLASS, X86_64_SSE_CLASS, + X86_64_SSEHF_CLASS, X86_64_SSESF_CLASS, X86_64_SSEDF_CLASS, X86_64_SSEUP_CLASS, @@ -2023,8 +2024,10 @@ merge_classes (enum x86_64_reg_class class1, enum x86_64_reg_class class2) return X86_64_MEMORY_CLASS; /* Rule #4: If one of the classes is INTEGER, the result is INTEGER. */ - if ((class1 == X86_64_INTEGERSI_CLASS && class2 == X86_64_SSESF_CLASS) - || (class2 == X86_64_INTEGERSI_CLASS && class1 == X86_64_SSESF_CLASS)) + if ((class1 == X86_64_INTEGERSI_CLASS + && (class2 == X86_64_SSESF_CLASS || class2 == X86_64_SSEHF_CLASS)) + || (class2 == X86_64_INTEGERSI_CLASS + && (class1 == X86_64_SSESF_CLASS || class1 == X86_64_SSEHF_CLASS))) return X86_64_INTEGERSI_CLASS; if (class1 == X86_64_INTEGER_CLASS || class1 == X86_64_INTEGERSI_CLASS || class2 == X86_64_INTEGER_CLASS || class2 == X86_64_INTEGERSI_CLASS) @@ -2178,6 +2181,8 @@ classify_argument (machine_mode mode, const_tree type, /* The partial classes are now full classes. */ if (subclasses[0] == X86_64_SSESF_CLASS && bytes != 4) subclasses[0] = X86_64_SSE_CLASS; + if (subclasses[0] == X86_64_SSEHF_CLASS && bytes != 2) + subclasses[0] = X86_64_SSE_CLASS; if (subclasses[0] == X86_64_INTEGERSI_CLASS && !((bit_offset % 64) == 0 && bytes == 4)) subclasses[0] = X86_64_INTEGER_CLASS; @@ -2350,6 +2355,12 @@ classify_argument (machine_mode mode, const_tree type, gcc_unreachable (); case E_CTImode: return 0; + case E_HFmode: + if (!(bit_offset % 64)) + classes[0] = X86_64_SSEHF_CLASS; + else + classes[0] = X86_64_SSE_CLASS; + return 1; case E_SFmode: if (!(bit_offset % 64)) classes[0] = X86_64_SSESF_CLASS; @@ -2367,6 +2378,15 @@ classify_argument (machine_mode mode, const_tree type, classes[0] = X86_64_SSE_CLASS; classes[1] = X86_64_SSEUP_CLASS; return 2; + case E_HCmode: + classes[0] = X86_64_SSE_CLASS; + if (!(bit_offset % 64)) + return 1; + else + { + classes[1] = X86_64_SSEHF_CLASS; + return 2; + } case E_SCmode: classes[0] = X86_64_SSE_CLASS; if (!(bit_offset % 64)) @@ -2481,6 +2501,7 @@ examine_argument (machine_mode mode, const_tree type, int in_return, (*int_nregs)++; break; case X86_64_SSE_CLASS: + case X86_64_SSEHF_CLASS: case X86_64_SSESF_CLASS: case X86_64_SSEDF_CLASS: (*sse_nregs)++; @@ -2580,13 +2601,14 @@ construct_container (machine_mode mode, machine_mode orig_mode, /* First construct simple cases. Avoid SCmode, since we want to use single register to pass this type. */ - if (n == 1 && mode != SCmode) + if (n == 1 && mode != SCmode && mode != HCmode) switch (regclass[0]) { case X86_64_INTEGER_CLASS: case X86_64_INTEGERSI_CLASS: return gen_rtx_REG (mode, intreg[0]); case X86_64_SSE_CLASS: + case X86_64_SSEHF_CLASS: case X86_64_SSESF_CLASS: case X86_64_SSEDF_CLASS: if (mode != BLKmode) @@ -2683,6 +2705,14 @@ construct_container (machine_mode mode, machine_mode orig_mode, GEN_INT (i*8)); intreg++; break; + case X86_64_SSEHF_CLASS: + exp [nexps++] + = gen_rtx_EXPR_LIST (VOIDmode, + gen_rtx_REG (HFmode, + GET_SSE_REGNO (sse_regno)), + GEN_INT (i*8)); + sse_regno++; + break; case X86_64_SSESF_CLASS: exp [nexps++] = gen_rtx_EXPR_LIST (VOIDmode, @@ -3903,6 +3933,19 @@ function_value_32 (machine_mode orig_mode, machine_mode mode, /* Most things go in %eax. */ regno = AX_REG; + /* Return _Float16/_Complex _Foat16 by sse register. */ + if (mode == HFmode) + regno = FIRST_SSE_REG; + if (mode == HCmode) + { + rtx ret = gen_rtx_PARALLEL (mode, rtvec_alloc(1)); + XVECEXP (ret, 0, 0) + = gen_rtx_EXPR_LIST (VOIDmode, + gen_rtx_REG (SImode, FIRST_SSE_REG), + GEN_INT (0)); + return ret; + } + /* Override FP return register with %xmm0 for local functions when SSE math is enabled or for functions with sseregparm attribute. */ if ((fn || fntype) && (mode == SFmode || mode == DFmode)) @@ -3939,6 +3982,8 @@ function_value_64 (machine_mode orig_mode, machine_mode mode, switch (mode) { + case E_HFmode: + case E_HCmode: case E_SFmode: case E_SCmode: case E_DFmode: @@ -13411,6 +13456,15 @@ ix86_print_operand (FILE *file, rtx x, int code) (file, addr, MEM_ADDR_SPACE (x), code == 'p' || code == 'P'); } + else if (CONST_DOUBLE_P (x) && GET_MODE (x) == HFmode) + { + long l = real_to_target (NULL, CONST_DOUBLE_REAL_VALUE (x), + REAL_MODE_FORMAT (HFmode)); + if (ASSEMBLER_DIALECT == ASM_ATT) + putc ('$', file); + fprintf (file, "0x%04x", (unsigned int) l); + } + else if (CONST_DOUBLE_P (x) && GET_MODE (x) == SFmode) { long l; @@ -18928,6 +18982,16 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t rclass, return NO_REGS; } + /* Require movement to gpr, and then store to memory. */ + if (mode == HFmode + && !TARGET_SSE4_1 + && SSE_CLASS_P (rclass) + && !in_p && MEM_P (x)) + { + sri->extra_cost = 1; + return GENERAL_REGS; + } + /* This condition handles corner case where an expression involving pointers gets vectorized. We're trying to use the address of a stack slot as a vector initializer. @@ -21555,10 +21619,27 @@ ix86_scalar_mode_supported_p (scalar_mode mode) return default_decimal_float_supported_p (); else if (mode == TFmode) return true; + else if (mode == HFmode && TARGET_SSE2) + return true; else return default_scalar_mode_supported_p (mode); } +/* Implement TARGET_LIBGCC_FLOATING_POINT_MODE_SUPPORTED_P - return TRUE + if MODE is HFmode, and punt to the generic implementation otherwise. */ + +static bool +ix86_libgcc_floating_mode_supported_p (scalar_float_mode mode) +{ + /* NB: Always return TRUE for HFmode so that the _Float16 type will + be defined by the C front-end for AVX512FP16 intrinsics. We will + issue an error in ix86_expand_move for HFmode if AVX512FP16 isn't + enabled. */ + return ((mode == HFmode && TARGET_SSE2) + ? true + : default_libgcc_floating_mode_supported_p (mode)); +} + /* Implements target hook vector_mode_supported_p. */ static bool ix86_vector_mode_supported_p (machine_mode mode) @@ -23820,6 +23901,10 @@ ix86_run_selftests (void) #undef TARGET_SCALAR_MODE_SUPPORTED_P #define TARGET_SCALAR_MODE_SUPPORTED_P ix86_scalar_mode_supported_p +#undef TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P +#define TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P \ +ix86_libgcc_floating_mode_supported_p + #undef TARGET_VECTOR_MODE_SUPPORTED_P #define TARGET_VECTOR_MODE_SUPPORTED_P ix86_vector_mode_supported_p diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 0c2c93daf32..b1e66ee192e 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -1018,7 +1018,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); #define VALID_SSE2_REG_MODE(MODE) \ ((MODE) == V16QImode || (MODE) == V8HImode || (MODE) == V2DFmode \ || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode \ - || (MODE) == V2DImode || (MODE) == DFmode) + || (MODE) == V2DImode || (MODE) == DFmode || (MODE) == HFmode) #define VALID_SSE_REG_MODE(MODE) \ ((MODE) == V1TImode || (MODE) == TImode \ @@ -1047,6 +1047,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); || (MODE) == CQImode || (MODE) == CHImode \ || (MODE) == CSImode || (MODE) == CDImode \ || (MODE) == SDmode || (MODE) == DDmode \ + || (MODE) == HFmode || (MODE) == HCmode \ || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode \ || (TARGET_64BIT \ && ((MODE) == TImode || (MODE) == CTImode \ diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 8b809c49fe0..d475347172d 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -1222,6 +1222,9 @@ (define_mode_iterator MODEF [SF DF]) ;; All x87 floating point modes (define_mode_iterator X87MODEF [SF DF XF]) +;; All x87 floating point modes plus HF +(define_mode_iterator X87MODEFH [SF DF XF HF]) + ;; All SSE floating point modes (define_mode_iterator SSEMODEF [SF DF TF]) (define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")]) @@ -3130,6 +3133,32 @@ (define_split operands[0] = replace_equiv_address (operands[0], stack_pointer_rtx); }) +(define_insn "*pushhf_rex64" + [(set (match_operand:HF 0 "push_operand" "=X,X") + (match_operand:HF 1 "nonmemory_no_elim_operand" "r,x"))] + "TARGET_64BIT" +{ + /* Anything else should be already split before reg-stack. */ + gcc_assert (which_alternative == 0); + return "push{q}\t%q1"; +} + [(set_attr "isa" "*,sse4") + (set_attr "type" "push,multi") + (set_attr "mode" "DI,TI")]) + +(define_insn "*pushhf" + [(set (match_operand:HF 0 "push_operand" "=X,X") + (match_operand:HF 1 "general_no_elim_operand" "rmF,x"))] + "!TARGET_64BIT" +{ + /* Anything else should be already split before reg-stack. */ + gcc_assert (which_alternative == 0); + return "push{l}\t%k1"; +} + [(set_attr "isa" "*,sse4") + (set_attr "type" "push,multi") + (set_attr "mode" "SI,TI")]) + (define_insn "*pushsf_rex64" [(set (match_operand:SF 0 "push_operand" "=X,X,X") (match_operand:SF 1 "nonmemory_no_elim_operand" "f,rF,v"))] @@ -3158,10 +3187,11 @@ (define_insn "*pushsf" (set_attr "unit" "i387,*,*") (set_attr "mode" "SF,SI,SF")]) +(define_mode_iterator MODESH [SF HF]) ;; %%% Kill this when call knows how to work this out. (define_split - [(set (match_operand:SF 0 "push_operand") - (match_operand:SF 1 "any_fp_register_operand"))] + [(set (match_operand:MODESH 0 "push_operand") + (match_operand:MODESH 1 "any_fp_register_operand"))] "reload_completed" [(set (reg:P SP_REG) (plus:P (reg:P SP_REG) (match_dup 2))) (set (match_dup 0) (match_dup 1))] @@ -3209,8 +3239,8 @@ (define_expand "movtf" "ix86_expand_move (TFmode, operands); DONE;") (define_expand "mov" - [(set (match_operand:X87MODEF 0 "nonimmediate_operand") - (match_operand:X87MODEF 1 "general_operand"))] + [(set (match_operand:X87MODEFH 0 "nonimmediate_operand") + (match_operand:X87MODEFH 1 "general_operand"))] "" "ix86_expand_move (mode, operands); DONE;") @@ -3646,6 +3676,86 @@ (define_insn "*movsf_internal" ] (const_string "*")))]) +(define_insn "*movhf_internal" + [(set (match_operand:HF 0 "nonimmediate_operand" + "=?r,?m,v,v,?r,m,?v,v") + (match_operand:HF 1 "general_operand" + "rmF,rF,C,v, v,v, r,m"))] + "!(MEM_P (operands[0]) && MEM_P (operands[1])) + && (lra_in_progress + || reload_completed + || !CONST_DOUBLE_P (operands[1]) + || (TARGET_SSE && TARGET_SSE_MATH + && standard_sse_constant_p (operands[1], HFmode) == 1) + || memory_operand (operands[0], HFmode))" +{ + switch (get_attr_type (insn)) + { + case TYPE_IMOV: + return "mov{w}\t{%1, %0|%0, %1}"; + + case TYPE_SSELOG1: + return standard_sse_constant_opcode (insn, operands); + + case TYPE_SSEMOV: + return ix86_output_ssemov (insn, operands); + + case TYPE_SSELOG: + if (SSE_REG_P (operands[0])) + return MEM_P (operands[1]) + ? "pinsrw\t{$0, %1, %0|%0, %1, 0}" + : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}"; + else + return MEM_P (operands[1]) + ? "pextrw\t{$0, %1, %0|%0, %1, 0}" + : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}"; + + default: + gcc_unreachable (); + } +} + [(set (attr "isa") + (cond [(eq_attr "alternative" "2,3,4,6,7") + (const_string "sse2") + (eq_attr "alternative" "5") + (const_string "sse4") + ] + (const_string "*"))) + (set (attr "type") + (cond [(eq_attr "alternative" "0,1") + (const_string "imov") + (eq_attr "alternative" "2") + (const_string "sselog1") + (eq_attr "alternative" "4,5,6,7") + (const_string "sselog") + ] + (const_string "ssemov"))) + (set (attr "memory") + (cond [(eq_attr "alternative" "4,6") + (const_string "none") + (eq_attr "alternative" "5") + (const_string "store") + (eq_attr "alternative" "7") + (const_string "load") + ] + (const_string "*"))) + (set (attr "prefix") + (cond [(eq_attr "alternative" "0,1") + (const_string "orig") + ] + (const_string "maybe_vex"))) + (set (attr "mode") + (cond [(eq_attr "alternative" "0,1") + (const_string "HI") + (eq_attr "alternative" "2") + (const_string "V4SF") + (eq_attr "alternative" "4,5,6,7") + (const_string "TI") + (eq_attr "alternative" "3") + (const_string "SF") + ] + (const_string "*")))]) + (define_split [(set (match_operand 0 "any_fp_register_operand") (match_operand 1 "memory_operand"))] diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index b83cd4919bb..f42fd633725 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -1102,6 +1102,7 @@ typedef _Complex float __attribute__((mode(IC))) _Complex_ibm128; @section Half-Precision Floating Point @cindex half-precision floating point @cindex @code{__fp16} data type +@cindex @code{__Float16} data type On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating point via the @code{__fp16} type defined in the ARM C Language Extensions. @@ -1150,6 +1151,18 @@ calls. It is recommended that portable code use the @code{_Float16} type defined by ISO/IEC TS 18661-3:2015. @xref{Floating Types}. +On x86 targets with @code{target("sse2")} and above, GCC supports half-precision +(16-bit) floating point via the @code{_Float16} type which is defined by +18661-3:2015. For C++, x86 provide a builtin type named @code{_Float16} +which contains same data format as C. + +Without @option{-mavx512fp16}, @code{_Float16} type is storage only, all +operations will be emulated by software emulation and the @code{float} +instructions. The default behavior for @code{FLT_EVAL_METHOD} is to keep +the intermediate result of the operation as 32-bit precision. This may lead +to inconsistent behavior between software emulation and AVX512-FP16 +instructions. + @node Decimal Float @section Decimal Floating Types @cindex decimal floating types diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c index ff3b4449b37..775ee397836 100644 --- a/gcc/emit-rtl.c +++ b/gcc/emit-rtl.c @@ -928,6 +928,11 @@ validate_subreg (machine_mode omode, machine_mode imode, fix them all. */ if (omode == word_mode) ; + /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI (reg:HF)) + here. Though extract_bit_field is the culprit here, not the backends. */ + else if (known_gt (regsize, osize) && known_gt (osize, isize) + && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode)) + ; /* ??? Similarly, e.g. with (subreg:DF (reg:TI)). Though store_bit_field is the culprit here, and not the backends. */ else if (known_ge (osize, regsize) && known_ge (isize, osize)) diff --git a/gcc/lto/lto-lang.c b/gcc/lto/lto-lang.c index c13c7e45ac1..92f499643b5 100644 --- a/gcc/lto/lto-lang.c +++ b/gcc/lto/lto-lang.c @@ -992,6 +992,9 @@ lto_type_for_mode (machine_mode mode, int unsigned_p) return unsigned_p ? unsigned_intTI_type_node : intTI_type_node; #endif + if (float16_type_node && mode == TYPE_MODE (float16_type_node)) + return float16_type_node; + if (mode == TYPE_MODE (float_type_node)) return float_type_node; diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c new file mode 100644 index 00000000000..ebc0af1490b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/float16-5.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-msse2 -O2" } */ +_Float16 +foo (int a) +{ + union { + int a; + _Float16 b; + }c; + c.a = a; + return c.b; +} diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-1.c b/gcc/testsuite/gcc.target/i386/sse2-float16-1.c new file mode 100644 index 00000000000..1b645eb499d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse2-float16-1.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-sse2" } */ + +_Float16/* { dg-error "is not supported on this target" } */ +foo (_Float16 x) /* { dg-error "is not supported on this target" } */ +{ + return x; +} diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-2.c b/gcc/testsuite/gcc.target/i386/sse2-float16-2.c new file mode 100644 index 00000000000..3da7683fc31 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse2-float16-2.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse2 -mno-avx512f" } */ + +union flt +{ + _Float16 flt; + short s; +}; + +_Float16 +foo (union flt x) +{ + return x.flt; +} + +/* { dg-final { scan-assembler {(?n)pinsrw[\t ].*%xmm0} } } */ diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-3.c b/gcc/testsuite/gcc.target/i386/sse2-float16-3.c new file mode 100644 index 00000000000..60ff9d4ab80 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse2-float16-3.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse2 -mno-avx512f" } */ + +#include + +_Complex _Float16 +foo (_Complex _Float16 x) +{ + return x; +} + +/* { dg-final { scan-assembler {(?n)movd[\t ].*%xmm0} } } */ From patchwork Mon Aug 2 06:31:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1512278 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=nlq0BtEe; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4GdSvB526bz9sRK for ; Mon, 2 Aug 2021 16:36:06 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 08F38383581A for ; Mon, 2 Aug 2021 06:36:04 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 08F38383581A DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1627886164; bh=yP2bYh9Dz25pMkooyId0pQpGk0ZppiaZy4FA1exf5Iw=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=nlq0BtEe5pxRRst2E+4ktZuayvVyGCgHHTdXseWglUUc1NOoZYOEv6THE5E/PwOv/ Vgnm/SfzQhkVbL4SkPqs5X5OD68ftDagwULbPGyfi3k8NR7mrtu3yJuiX1BkjiWWEK z6tWUfvCCYmSozjbzNbfcsB+AJUmfs36xKJHsjhk= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by sourceware.org (Postfix) with ESMTPS id 90BD43836028 for ; Mon, 2 Aug 2021 06:31:33 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 90BD43836028 X-IronPort-AV: E=McAfee;i="6200,9189,10063"; a="235322746" X-IronPort-AV: E=Sophos;i="5.84,288,1620716400"; d="scan'208";a="235322746" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Aug 2021 23:31:25 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.84,288,1620716400"; d="scan'208";a="466188169" Received: from scymds01.sc.intel.com ([10.148.94.138]) by orsmga008.jf.intel.com with ESMTP; 01 Aug 2021 23:31:24 -0700 Received: from shliclel219.sh.intel.com (shliclel219.sh.intel.com [10.239.236.219]) by scymds01.sc.intel.com with ESMTP id 1726VH1O022130; Sun, 1 Aug 2021 23:31:23 -0700 To: gcc-patches@gcc.gnu.org Subject: [PATCH 3/6] [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations. Date: Mon, 2 Aug 2021 14:31:13 +0800 Message-Id: <20210802063116.999830-4-hongtao.liu@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210802063116.999830-1-hongtao.liu@intel.com> References: <20210802063116.999830-1-hongtao.liu@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-11.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: liuhongt via Gcc-patches From: liuhongt Reply-To: liuhongt Cc: joseph@codesourcery.com Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" libgcc/ChangeLog: * config/i386/32/sfp-machine.h (_FP_NANFRAC_H): New macro. * config/i386/64/sfp-machine.h (_FP_NANFRAC_H): Ditto. * config/i386/sfp-machine.h (_FP_NANSIGN_H): Ditto. * config/i386/t-softfp: Add hf soft-fp. * config.host: Add i386/64/t-softfp. * config/i386/64/t-softfp: New file. --- libgcc/config.host | 5 +---- libgcc/config/i386/32/sfp-machine.h | 1 + libgcc/config/i386/32/t-softfp | 1 + libgcc/config/i386/64/sfp-machine.h | 1 + libgcc/config/i386/64/t-softfp | 1 + libgcc/config/i386/sfp-machine.h | 1 + libgcc/config/i386/t-softfp | 5 +++++ 7 files changed, 11 insertions(+), 4 deletions(-) create mode 100644 libgcc/config/i386/64/t-softfp diff --git a/libgcc/config.host b/libgcc/config.host index 50f00062232..96da9ef1cce 100644 --- a/libgcc/config.host +++ b/libgcc/config.host @@ -1540,10 +1540,7 @@ i[34567]86-*-elfiamcu | i[34567]86-*-rtems*) ;; i[34567]86-*-* | x86_64-*-*) tmake_file="${tmake_file} t-softfp-tf" - if test "${host_address}" = 32; then - tmake_file="${tmake_file} i386/${host_address}/t-softfp" - fi - tmake_file="${tmake_file} i386/t-softfp t-softfp" + tmake_file="${tmake_file} i386/${host_address}/t-softfp i386/t-softfp t-softfp" ;; esac diff --git a/libgcc/config/i386/32/sfp-machine.h b/libgcc/config/i386/32/sfp-machine.h index 1fa282d7afe..e24cbc8d180 100644 --- a/libgcc/config/i386/32/sfp-machine.h +++ b/libgcc/config/i386/32/sfp-machine.h @@ -86,6 +86,7 @@ #define _FP_DIV_MEAT_D(R,X,Y) _FP_DIV_MEAT_2_udiv(D,R,X,Y) #define _FP_DIV_MEAT_Q(R,X,Y) _FP_DIV_MEAT_4_udiv(Q,R,X,Y) +#define _FP_NANFRAC_H _FP_QNANBIT_H #define _FP_NANFRAC_S _FP_QNANBIT_S #define _FP_NANFRAC_D _FP_QNANBIT_D, 0 /* Even if XFmode is 12byte, we have to pad it to diff --git a/libgcc/config/i386/32/t-softfp b/libgcc/config/i386/32/t-softfp index a48a5b3b116..86478cf5f20 100644 --- a/libgcc/config/i386/32/t-softfp +++ b/libgcc/config/i386/32/t-softfp @@ -3,3 +3,4 @@ softfp_int_modes := si di # Provide fallbacks for __builtin_copysignq and __builtin_fabsq. LIB2ADD += $(srcdir)/config/i386/32/tf-signs.c + diff --git a/libgcc/config/i386/64/sfp-machine.h b/libgcc/config/i386/64/sfp-machine.h index 1ff94c23ea4..e1c616699bb 100644 --- a/libgcc/config/i386/64/sfp-machine.h +++ b/libgcc/config/i386/64/sfp-machine.h @@ -13,6 +13,7 @@ typedef unsigned int UTItype __attribute__ ((mode (TI))); #define _FP_DIV_MEAT_Q(R,X,Y) _FP_DIV_MEAT_2_udiv(Q,R,X,Y) +#define _FP_NANFRAC_H _FP_QNANBIT_H #define _FP_NANFRAC_S _FP_QNANBIT_S #define _FP_NANFRAC_D _FP_QNANBIT_D #define _FP_NANFRAC_E _FP_QNANBIT_E, 0 diff --git a/libgcc/config/i386/64/t-softfp b/libgcc/config/i386/64/t-softfp new file mode 100644 index 00000000000..f9d8b3a945c --- /dev/null +++ b/libgcc/config/i386/64/t-softfp @@ -0,0 +1 @@ +softfp_extras := fixhfti fixunshfti floattihf floatuntihf diff --git a/libgcc/config/i386/sfp-machine.h b/libgcc/config/i386/sfp-machine.h index 8319f0550bc..f15d29d3755 100644 --- a/libgcc/config/i386/sfp-machine.h +++ b/libgcc/config/i386/sfp-machine.h @@ -17,6 +17,7 @@ typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__))); #define _FP_KEEPNANFRACP 1 #define _FP_QNANNEGATEDP 0 +#define _FP_NANSIGN_H 1 #define _FP_NANSIGN_S 1 #define _FP_NANSIGN_D 1 #define _FP_NANSIGN_E 1 diff --git a/libgcc/config/i386/t-softfp b/libgcc/config/i386/t-softfp index 685d9cf8502..4ac214eb0ce 100644 --- a/libgcc/config/i386/t-softfp +++ b/libgcc/config/i386/t-softfp @@ -1 +1,6 @@ LIB2ADD += $(srcdir)/config/i386/sfp-exceptions.c + +softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf +softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf + +softfp_extras += eqhf2 \ No newline at end of file From patchwork Mon Aug 2 06:31:14 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1512276 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=siVG4+Z7; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4GdSs2104nz9sRK for ; Mon, 2 Aug 2021 16:34:14 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id CFAB53835C24 for ; Mon, 2 Aug 2021 06:34:11 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org CFAB53835C24 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1627886051; bh=lF3PKcYKaYAElmvLVj25nxWms5UucmN8X5Y2/0VJVt0=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=siVG4+Z7xCXwrGP9OXc3iHgMz8uGDSjaxbq1wQHnKLX9cK72NYdxiHrw1i+CuQom+ gnktcvJ5g3hhhAFed5fUftomaEQfDctowFW/57kmTA/8RuFi8U22JB+scOmo8hFB8Q E0n/U8Djgc7zyGfIw8FlT3W7QZoSVxMxrL/U9fFA= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by sourceware.org (Postfix) with ESMTPS id 6ED3E387549C for ; Mon, 2 Aug 2021 06:31:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6ED3E387549C X-IronPort-AV: E=McAfee;i="6200,9189,10063"; a="274461983" X-IronPort-AV: E=Sophos;i="5.84,288,1620716400"; d="scan'208";a="274461983" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Aug 2021 23:31:26 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.84,288,1620716400"; d="scan'208";a="666456994" Received: from scymds01.sc.intel.com ([10.148.94.138]) by fmsmga006.fm.intel.com with ESMTP; 01 Aug 2021 23:31:26 -0700 Received: from shliclel219.sh.intel.com (shliclel219.sh.intel.com [10.239.236.219]) by scymds01.sc.intel.com with ESMTP id 1726VH1P022130; Sun, 1 Aug 2021 23:31:25 -0700 To: gcc-patches@gcc.gnu.org Subject: [PATCH 4/6] Support -fexcess-precision=16 which will enable FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16. Date: Mon, 2 Aug 2021 14:31:14 +0800 Message-Id: <20210802063116.999830-5-hongtao.liu@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210802063116.999830-1-hongtao.liu@intel.com> References: <20210802063116.999830-1-hongtao.liu@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: liuhongt via Gcc-patches From: liuhongt Reply-To: liuhongt Cc: joseph@codesourcery.com Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" gcc/ada/ChangeLog: * gcc-interface/misc.c (gnat_post_options): Issue an error for -fexcess-precision=16. gcc/c-family/ChangeLog: * c-common.c (excess_precision_mode_join): Update below comments. (c_ts18661_flt_eval_method): Set excess_precision_type to EXCESS_PRECISION_TYPE_FLOAT16 when -fexcess-precision=16. * c-cppbuiltin.c (cpp_atomic_builtins): Update below comments. (c_cpp_flt_eval_method_iec_559): Set excess_precision_type to EXCESS_PRECISION_TYPE_FLOAT16 when -fexcess-precision=16. gcc/ChangeLog: * common.opt: Support -fexcess-precision=16. * config/aarch64/aarch64.c (aarch64_excess_precision): Return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when EXCESS_PRECISION_TYPE_FLOAT16. * config/arm/arm.c (arm_excess_precision): Ditto. * config/i386/i386.c (ix86_get_excess_precision): Ditto. * config/m68k/m68k.c (m68k_excess_precision): Issue an error when EXCESS_PRECISION_TYPE_FLOAT16. * config/s390/s390.c (s390_excess_precision): Ditto. * coretypes.h (enum excess_precision_type): Add EXCESS_PRECISION_TYPE_FLOAT16. * doc/tm.texi (TARGET_C_EXCESS_PRECISION): Update documents. * doc/tm.texi.in (TARGET_C_EXCESS_PRECISION): Ditto. * doc/extend.texi (Half-Precision): Document -fexcess-precision=16. * flag-types.h (enum excess_precision): Add EXCESS_PRECISION_FLOAT16. * target.def (excess_precision): Update document. * tree.c (excess_precision_type): Set excess_precision_type to EXCESS_PRECISION_FLOAT16 when -fexcess-precision=16. gcc/fortran/ChangeLog: * options.c (gfc_post_options): Issue an error for -fexcess-precision=16. gcc/testsuite/ChangeLog: * gcc.target/i386/float16-6.c: New test. --- gcc/ada/gcc-interface/misc.c | 3 +++ gcc/c-family/c-common.c | 6 ++++-- gcc/c-family/c-cppbuiltin.c | 6 ++++-- gcc/common.opt | 5 ++++- gcc/config/aarch64/aarch64.c | 1 + gcc/config/arm/arm.c | 1 + gcc/config/i386/i386.c | 2 ++ gcc/config/m68k/m68k.c | 2 ++ gcc/config/s390/s390.c | 2 ++ gcc/coretypes.h | 3 ++- gcc/doc/extend.texi | 3 ++- gcc/doc/tm.texi | 14 ++++++++++---- gcc/doc/tm.texi.in | 3 +++ gcc/flag-types.h | 3 ++- gcc/fortran/options.c | 3 +++ gcc/target.def | 11 +++++++---- gcc/testsuite/gcc.target/i386/float16-6.c | 8 ++++++++ gcc/tree.c | 3 ++- 18 files changed, 62 insertions(+), 17 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/float16-6.c diff --git a/gcc/ada/gcc-interface/misc.c b/gcc/ada/gcc-interface/misc.c index 186367ac6d1..96199bd4b63 100644 --- a/gcc/ada/gcc-interface/misc.c +++ b/gcc/ada/gcc-interface/misc.c @@ -256,6 +256,9 @@ gnat_post_options (const char **pfilename ATTRIBUTE_UNUSED) /* Excess precision other than "fast" requires front-end support. */ if (flag_excess_precision == EXCESS_PRECISION_STANDARD) sorry ("%<-fexcess-precision=standard%> for Ada"); + else if (flag_excess_precision == EXCESS_PRECISION_FLOAT16) + sorry ("%<-fexcess-precision=16%> for Ada"); + flag_excess_precision = EXCESS_PRECISION_FAST; /* No psABI change warnings for Ada. */ diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index aacdfb46a02..7e72062c77c 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -8772,7 +8772,7 @@ excess_precision_mode_join (enum flt_eval_method x, This relates to the effective excess precision seen by the user, which is the join point of the precision the target requests for - -fexcess-precision={standard,fast} and the implicit excess precision + -fexcess-precision={standard,fast,16} and the implicit excess precision the target uses. */ static enum flt_eval_method @@ -8784,7 +8784,9 @@ c_ts18661_flt_eval_method (void) enum excess_precision_type flag_type = (flag_excess_precision == EXCESS_PRECISION_STANDARD ? EXCESS_PRECISION_TYPE_STANDARD - : EXCESS_PRECISION_TYPE_FAST); + : (flag_excess_precision == EXCESS_PRECISION_FLOAT16 + ? EXCESS_PRECISION_TYPE_FLOAT16 + : EXCESS_PRECISION_TYPE_FAST)); enum flt_eval_method requested = targetm.c.excess_precision (flag_type); diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c index f79f939bd10..5f30354a33c 100644 --- a/gcc/c-family/c-cppbuiltin.c +++ b/gcc/c-family/c-cppbuiltin.c @@ -753,7 +753,7 @@ cpp_atomic_builtins (cpp_reader *pfile) /* Return TRUE if the implicit excess precision in which the back-end will compute floating-point calculations is not more than the explicit excess precision that the front-end will apply under - -fexcess-precision=[standard|fast]. + -fexcess-precision=[standard|fast|16]. More intuitively, return TRUE if the excess precision proposed by the front-end is the excess precision that will actually be used. */ @@ -764,7 +764,9 @@ c_cpp_flt_eval_method_iec_559 (void) enum excess_precision_type front_end_ept = (flag_excess_precision == EXCESS_PRECISION_STANDARD ? EXCESS_PRECISION_TYPE_STANDARD - : EXCESS_PRECISION_TYPE_FAST); + : (flag_excess_precision == EXCESS_PRECISION_FLOAT16 + ? EXCESS_PRECISION_TYPE_FLOAT16 + : EXCESS_PRECISION_TYPE_FAST)); enum flt_eval_method back_end = targetm.c.excess_precision (EXCESS_PRECISION_TYPE_IMPLICIT); diff --git a/gcc/common.opt b/gcc/common.opt index d9da1131eda..3dd74766400 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1518,7 +1518,7 @@ Perform a number of minor, expensive optimizations. fexcess-precision= Common Joined RejectNegative Enum(excess_precision) Var(flag_excess_precision) Init(EXCESS_PRECISION_DEFAULT) Optimization SetByCombined --fexcess-precision=[fast|standard] Specify handling of excess floating-point precision. +-fexcess-precision=[fast|standard|16] Specify handling of excess floating-point precision. Enum Name(excess_precision) Type(enum excess_precision) UnknownError(unknown excess precision style %qs) @@ -1529,6 +1529,9 @@ Enum(excess_precision) String(fast) Value(EXCESS_PRECISION_FAST) EnumValue Enum(excess_precision) String(standard) Value(EXCESS_PRECISION_STANDARD) +EnumValue +Enum(excess_precision) String(16) Value(EXCESS_PRECISION_FLOAT16) + ; Whether we permit the extended set of values for FLT_EVAL_METHOD ; introduced in ISO/IEC TS 18661-3, or limit ourselves to those in C99/C11. fpermitted-flt-eval-methods= diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 3bdf19d71b5..c986a93a243 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -24797,6 +24797,7 @@ aarch64_excess_precision (enum excess_precision_type type) ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT); case EXCESS_PRECISION_TYPE_IMPLICIT: + case EXCESS_PRECISION_TYPE_FLOAT16: return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16; default: gcc_unreachable (); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 6d781e23ee9..e2a18615860 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -25599,6 +25599,7 @@ arm_excess_precision (enum excess_precision_type type) ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT); case EXCESS_PRECISION_TYPE_IMPLICIT: + case EXCESS_PRECISION_TYPE_FLOAT16: return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16; default: gcc_unreachable (); diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 7979e240426..dc673c89bc8 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -23352,6 +23352,8 @@ ix86_get_excess_precision (enum excess_precision_type type) return (type == EXCESS_PRECISION_TYPE_STANDARD ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT : FLT_EVAL_METHOD_UNPREDICTABLE); + case EXCESS_PRECISION_TYPE_FLOAT16: + return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16; default: gcc_unreachable (); } diff --git a/gcc/config/m68k/m68k.c b/gcc/config/m68k/m68k.c index 3f63c60fa92..2fef457c09e 100644 --- a/gcc/config/m68k/m68k.c +++ b/gcc/config/m68k/m68k.c @@ -7115,6 +7115,8 @@ m68k_excess_precision (enum excess_precision_type type) return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT; return FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE; + case EXCESS_PRECISION_TYPE_FLOAT16: + error ("%<-fexcess-precision=16%> is not supported on this target"); default: gcc_unreachable (); } diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index b1d3b99784d..234ee4ac9c4 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -16515,6 +16515,8 @@ s390_excess_precision (enum excess_precision_type type) ensure consistency with the implementation in glibc, report that float is evaluated to the range and precision of double. */ return FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE; + case EXCESS_PRECISION_TYPE_FLOAT16: + error ("%<-fexcess-precision=16%> is not supported on this target"); default: gcc_unreachable (); } diff --git a/gcc/coretypes.h b/gcc/coretypes.h index 406572e947d..07b9aa656c5 100644 --- a/gcc/coretypes.h +++ b/gcc/coretypes.h @@ -424,7 +424,8 @@ enum excess_precision_type { EXCESS_PRECISION_TYPE_IMPLICIT, EXCESS_PRECISION_TYPE_STANDARD, - EXCESS_PRECISION_TYPE_FAST + EXCESS_PRECISION_TYPE_FAST, + EXCESS_PRECISION_TYPE_FLOAT16 }; /* Level of size optimization. */ diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index f42fd633725..3a1978efc97 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -1161,7 +1161,8 @@ operations will be emulated by software emulation and the @code{float} instructions. The default behavior for @code{FLT_EVAL_METHOD} is to keep the intermediate result of the operation as 32-bit precision. This may lead to inconsistent behavior between software emulation and AVX512-FP16 -instructions. +instructions. Using @option{-fexcess-precision=16} and will force round +back after each operation. @node Decimal Float @section Decimal Floating Types diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index c8f4abe3e41..9fac173a217 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -982,20 +982,26 @@ Do not define this macro if it would never modify @var{m}. Return a value, with the same meaning as the C99 macro @code{FLT_EVAL_METHOD} that describes which excess precision should be applied. @var{type} is either @code{EXCESS_PRECISION_TYPE_IMPLICIT}, -@code{EXCESS_PRECISION_TYPE_FAST}, or -@code{EXCESS_PRECISION_TYPE_STANDARD}. For +@code{EXCESS_PRECISION_TYPE_FAST}, +@code{EXCESS_PRECISION_TYPE_STANDARD}, or +@code{EXCESS_PRECISION_TYPE_FLOAT16}. For @code{EXCESS_PRECISION_TYPE_IMPLICIT}, the target should return which precision and range operations will be implictly evaluated in regardless of the excess precision explicitly added. For -@code{EXCESS_PRECISION_TYPE_STANDARD} and +@code{EXCESS_PRECISION_TYPE_STANDARD}, +@code{EXCESS_PRECISION_TYPE_FLOAT16}, and @code{EXCESS_PRECISION_TYPE_FAST}, the target should return the explicit excess precision that should be added depending on the value set for @option{-fexcess-precision=@r{[}standard@r{|}fast@r{]}}. Note that unpredictable explicit excess precision does not make sense, so a target should never return @code{FLT_EVAL_METHOD_UNPREDICTABLE} -when @var{type} is @code{EXCESS_PRECISION_TYPE_STANDARD} or +when @var{type} is @code{EXCESS_PRECISION_TYPE_STANDARD}, +@code{EXCESS_PRECISION_TYPE_FLOAT16} or @code{EXCESS_PRECISION_TYPE_FAST}. @end deftypefn +Return a value, with the same meaning as the C99 macro +@code{FLT_EVAL_METHOD} that describes which excess precision should be +applied. @deftypefn {Target Hook} machine_mode TARGET_PROMOTE_FUNCTION_MODE (const_tree @var{type}, machine_mode @var{mode}, int *@var{punsignedp}, const_tree @var{funtype}, int @var{for_return}) Like @code{PROMOTE_MODE}, but it is applied to outgoing function arguments or diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 9c4b5016053..90a8d790758 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -929,6 +929,9 @@ Do not define this macro if it would never modify @var{m}. @end defmac @hook TARGET_C_EXCESS_PRECISION +Return a value, with the same meaning as the C99 macro +@code{FLT_EVAL_METHOD} that describes which excess precision should be +applied. @hook TARGET_PROMOTE_FUNCTION_MODE diff --git a/gcc/flag-types.h b/gcc/flag-types.h index e43d1de490d..5eeb5046222 100644 --- a/gcc/flag-types.h +++ b/gcc/flag-types.h @@ -198,7 +198,8 @@ enum excess_precision { EXCESS_PRECISION_DEFAULT, EXCESS_PRECISION_FAST, - EXCESS_PRECISION_STANDARD + EXCESS_PRECISION_STANDARD, + EXCESS_PRECISION_FLOAT16 }; /* The options for which values of FLT_EVAL_METHOD are permissible. */ diff --git a/gcc/fortran/options.c b/gcc/fortran/options.c index 1723f689a57..847e20e8829 100644 --- a/gcc/fortran/options.c +++ b/gcc/fortran/options.c @@ -267,6 +267,9 @@ gfc_post_options (const char **pfilename) support. */ if (flag_excess_precision == EXCESS_PRECISION_STANDARD) sorry ("%<-fexcess-precision=standard%> for Fortran"); + else if (flag_excess_precision == EXCESS_PRECISION_FLOAT16) + sorry ("%<-fexcess-precision=16%> for Fortran"); + flag_excess_precision = EXCESS_PRECISION_FAST; /* Fortran allows associative math - but we cannot reassociate if diff --git a/gcc/target.def b/gcc/target.def index 2e40448e6c5..b0bd79a0671 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -6192,18 +6192,21 @@ DEFHOOK "Return a value, with the same meaning as the C99 macro\n\ @code{FLT_EVAL_METHOD} that describes which excess precision should be\n\ applied. @var{type} is either @code{EXCESS_PRECISION_TYPE_IMPLICIT},\n\ -@code{EXCESS_PRECISION_TYPE_FAST}, or\n\ -@code{EXCESS_PRECISION_TYPE_STANDARD}. For\n\ +@code{EXCESS_PRECISION_TYPE_FAST},\n\ +@code{EXCESS_PRECISION_TYPE_STANDARD}, or\n\ +@code{EXCESS_PRECISION_TYPE_FLOAT16}. For\n\ @code{EXCESS_PRECISION_TYPE_IMPLICIT}, the target should return which\n\ precision and range operations will be implictly evaluated in regardless\n\ of the excess precision explicitly added. For\n\ -@code{EXCESS_PRECISION_TYPE_STANDARD} and\n\ +@code{EXCESS_PRECISION_TYPE_STANDARD}, \n\ +@code{EXCESS_PRECISION_TYPE_FLOAT16}, and\n\ @code{EXCESS_PRECISION_TYPE_FAST}, the target should return the\n\ explicit excess precision that should be added depending on the\n\ value set for @option{-fexcess-precision=@r{[}standard@r{|}fast@r{]}}.\n\ Note that unpredictable explicit excess precision does not make sense,\n\ so a target should never return @code{FLT_EVAL_METHOD_UNPREDICTABLE}\n\ -when @var{type} is @code{EXCESS_PRECISION_TYPE_STANDARD} or\n\ +when @var{type} is @code{EXCESS_PRECISION_TYPE_STANDARD},\n\ +@code{EXCESS_PRECISION_TYPE_FLOAT16} or\n\ @code{EXCESS_PRECISION_TYPE_FAST}.", enum flt_eval_method, (enum excess_precision_type type), default_excess_precision) diff --git a/gcc/testsuite/gcc.target/i386/float16-6.c b/gcc/testsuite/gcc.target/i386/float16-6.c new file mode 100644 index 00000000000..599f4495086 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/float16-6.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-options "-msse2 -O2 -fdump-tree-gimple -fexcess-precision=16" } */ +/* { dg-final { scan-tree-dump-not "\\(float\\)" "gimple" } } */ +_Float16 +foo (_Float16 a, _Float16 b, _Float16 c) +{ + return a + b + c; +} diff --git a/gcc/tree.c b/gcc/tree.c index bead1ac134c..20dfbe00b88 100644 --- a/gcc/tree.c +++ b/gcc/tree.c @@ -7633,7 +7633,8 @@ excess_precision_type (tree type) enum excess_precision_type requested_type = (flag_excess_precision == EXCESS_PRECISION_FAST ? EXCESS_PRECISION_TYPE_FAST - : EXCESS_PRECISION_TYPE_STANDARD); + : (flag_excess_precision == EXCESS_PRECISION_FLOAT16 + ? EXCESS_PRECISION_TYPE_FLOAT16 :EXCESS_PRECISION_TYPE_STANDARD)); enum flt_eval_method target_flt_eval_method = targetm.c.excess_precision (requested_type); From patchwork Mon Aug 2 06:44:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1512285 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=KOXVlFzY; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4GdT5S52rRz9sT6 for ; Mon, 2 Aug 2021 16:44:59 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 4DD3F3835417 for ; Mon, 2 Aug 2021 06:44:57 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 4DD3F3835417 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1627886697; bh=72Z06iGOxpQhv3b0Tjruh5sJ+xg7E7BzNY5/w11UA3c=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=KOXVlFzYiG2sYNL5u8KjgFOu9pLxl0mD1KMpsIyt+Rm4IrLDsfETQng58nLpWvBNO 04uuSENZG5FImrkFTTjDgMRxVX7/HRRw2UnCyoxZGt7xsbjKSQ99y8UyEY6Xf4/W8V bJYEJ2qJN1CFrt/AupnjNPlUZvwNtZVCO3VQDSj8= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by sourceware.org (Postfix) with ESMTPS id 8892A383D024 for ; Mon, 2 Aug 2021 06:44:31 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8892A383D024 X-IronPort-AV: E=McAfee;i="6200,9189,10063"; a="200598486" X-IronPort-AV: E=Sophos;i="5.84,288,1620716400"; d="scan'208";a="200598486" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Aug 2021 23:44:30 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.84,288,1620716400"; d="scan'208";a="520295835" Received: from scymds01.sc.intel.com ([10.148.94.138]) by fmsmga002.fm.intel.com with ESMTP; 01 Aug 2021 23:44:29 -0700 Received: from shliclel219.sh.intel.com (shliclel219.sh.intel.com [10.239.236.219]) by scymds01.sc.intel.com with ESMTP id 1726iQ9j031246; Sun, 1 Aug 2021 23:44:26 -0700 To: gcc-patches@gcc.gnu.org Subject: [PATCH 5/6] AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions. Date: Mon, 2 Aug 2021 14:44:26 +0800 Message-Id: <20210802064426.1001702-1-hongtao.liu@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210802063116.999830-1-hongtao.liu@intel.com> References: <20210802063116.999830-1-hongtao.liu@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP, UNWANTED_LANGUAGE_BODY autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: liuhongt via Gcc-patches From: liuhongt Reply-To: liuhongt Cc: "Guo, Xuepeng" , Xu Dianhong , Wang Hongyu , "H . J . Lu" , Liu Hongtao , joseph@codesourcery.com Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" From: "Guo, Xuepeng" gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Detect FEATURE_AVX512FP16. * common/config/i386/i386-common.c (OPTION_MASK_ISA_AVX512FP16_SET, OPTION_MASK_ISA_AVX512FP16_UNSET, OPTION_MASK_ISA2_AVX512FP16_SET, OPTION_MASK_ISA2_AVX512FP16_UNSET): New. (OPTION_MASK_ISA2_AVX512BW_UNSET, OPTION_MASK_ISA2_AVX512BF16_UNSET): Add AVX512FP16. (ix86_handle_option): Handle -mavx512fp16. * common/config/i386/i386-cpuinfo.h (enum processor_features): Add FEATURE_AVX512FP16. * common/config/i386/i386-isas.h: Add entry for AVX512FP16. * config.gcc: Add avx512fp16intrin.h. * config/i386/avx512fp16intrin.h: New intrinsic header. * config/i386/cpuid.h: Add bit_AVX512FP16. * config/i386/i386-builtin-types.def: (FLOAT16): New primitive type. * config/i386/i386-builtins.c: Support _Float16 type for i386 backend. (ix86_init_float16_builtins): New function. (ix86_float16_type_node): New. * config/i386/i386-c.c (ix86_target_macros_internal): Define __AVX512FP16__. * config/i386/i386-expand.c (ix86_expand_branch): Support HFmode. (ix86_prepare_fp_compare_args): Adjust TARGET_SSE_MATH && SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P. (ix86_expand_fp_movcc): Ditto. * config/i386/i386-isa.def: Add PTA define for AVX512FP16. * config/i386/i386-options.c (isa2_opts): Add -mavx512fp16. (ix86_valid_target_attribute_inner_p): Add avx512fp16 attribute. * config/i386/i386.c (ix86_get_ssemov): Use vmovdqu16/vmovw/vmovsh for HFmode/HImode scalar or vector. (ix86_get_excess_precision): Use FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when TARGET_AVX512FP16 existed. (sse_store_index): Use SFmode cost for HFmode cost. (inline_memory_move_cost): Add HFmode, and perfer SSE cost over GPR cost for HFmode. (ix86_hard_regno_mode_ok): Allow HImode in sse register. (ix86_mangle_type): Add manlging for _Float16 type. (inline_secondary_memory_needed): No memory is needed for 16bit movement between gpr and sse reg under TARGET_AVX512FP16. (ix86_multiplication_cost): Adjust TARGET_SSE_MATH && SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P. (ix86_division_cost): Ditto. (ix86_rtx_costs): Ditto. (ix86_add_stmt_cost): Ditto. (ix86_optab_supported_p): Ditto. * config/i386/i386.h (VALID_AVX512F_SCALAR_MODE): Add HFmode. (SSE_FLOAT_MODE_SSEMATH_OR_HF_P): Add HFmode. (PTA_SAPPHIRERAPIDS): Add PTA_AVX512FP16. * config/i386/i386.md (mode): Add HFmode. (MODE_SIZE): Add HFmode. (isa): Add avx512fp16. (enabled): Handle avx512fp16. (ssemodesuffix): Add sh suffix for HFmode. (comm): Add mult, div. (plusminusmultdiv): New code iterator. (insn): Add mult, div. (*movhf_internal): Adjust for avx512fp16 instruction. (*movhi_internal): Ditto. (*cmpihf): New define_insn for HFmode. (*ieee_shf3): Likewise. (extendhf2): Likewise. (trunchf2): Likewise. (floathf2): Likewise. (*hf): Likewise. (cbranchhf4): New expander. (movhfcc): Likewise. (hf3): Likewise. (mulhf3): Likewise. (divhf3): Likewise. * config/i386/i386.opt: Add mavx512fp16. * config/i386/immintrin.h: Include avx512fp16intrin.h. * doc/invoke.texi: Add mavx512fp16. * doc/extend.texi: Add avx512fp16 Usage Notes. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add -mavx512fp16 in dg-options. * gcc.target/i386/avx-2.c: Ditto. * gcc.target/i386/avx512-check.h: Check cpuid for AVX512FP16. * gcc.target/i386/funcspec-56.inc: Add new target attribute check. * gcc.target/i386/sse-13.c: Add -mavx512fp16. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * lib/target-supports.exp: (check_effective_target_avx512fp16): New. * g++.target/i386/float16-1.C: New test. * g++.target/i386/float16-2.C: Ditto. * g++.target/i386/float16-3.C: Ditto. * gcc.target/i386/avx512fp16-12a.c: Ditto. * gcc.target/i386/avx512fp16-12b.c: Ditto. * gcc.target/i386/float16-3a.c: Ditto. * gcc.target/i386/float16-3b.c: Ditto. * gcc.target/i386/float16-4a.c: Ditto. * gcc.target/i386/float16-4b.c: Ditto. * gcc.target/i386/pr54855-12.c: Ditto. * g++.dg/other/i386-2.C: Ditto. * g++.dg/other/i386-3.C: Ditto. Co-Authored-By: H.J. Lu Co-Authored-By: Liu Hongtao Co-Authored-By: Wang Hongyu Co-Authored-By: Xu Dianhong --- gcc/common/config/i386/cpuinfo.h | 2 + gcc/common/config/i386/i386-common.c | 26 ++- gcc/common/config/i386/i386-cpuinfo.h | 1 + gcc/common/config/i386/i386-isas.h | 1 + gcc/config.gcc | 2 +- gcc/config/i386/avx512fp16intrin.h | 53 ++++++ gcc/config/i386/cpuid.h | 1 + gcc/config/i386/i386-builtin-types.def | 1 + gcc/config/i386/i386-builtins.c | 23 +++ gcc/config/i386/i386-c.c | 2 + gcc/config/i386/i386-expand.c | 5 +- gcc/config/i386/i386-isa.def | 1 + gcc/config/i386/i386-options.c | 4 +- gcc/config/i386/i386.c | 133 ++++++++++---- gcc/config/i386/i386.h | 11 +- gcc/config/i386/i386.md | 172 ++++++++++++++++-- gcc/config/i386/i386.opt | 4 + gcc/config/i386/immintrin.h | 4 + gcc/doc/extend.texi | 8 + gcc/doc/invoke.texi | 10 +- gcc/testsuite/g++.dg/other/i386-2.C | 2 +- gcc/testsuite/g++.dg/other/i386-3.C | 2 +- gcc/testsuite/g++.target/i386/float16-1.C | 8 + gcc/testsuite/g++.target/i386/float16-2.C | 14 ++ gcc/testsuite/g++.target/i386/float16-3.C | 10 + gcc/testsuite/gcc.target/i386/avx-1.c | 2 +- gcc/testsuite/gcc.target/i386/avx-2.c | 2 +- gcc/testsuite/gcc.target/i386/avx512-check.h | 3 + .../gcc.target/i386/avx512fp16-12a.c | 21 +++ .../gcc.target/i386/avx512fp16-12b.c | 27 +++ gcc/testsuite/gcc.target/i386/float16-3a.c | 10 + gcc/testsuite/gcc.target/i386/float16-3b.c | 10 + gcc/testsuite/gcc.target/i386/float16-4a.c | 10 + gcc/testsuite/gcc.target/i386/float16-4b.c | 10 + gcc/testsuite/gcc.target/i386/funcspec-56.inc | 2 + gcc/testsuite/gcc.target/i386/pr54855-12.c | 14 ++ gcc/testsuite/gcc.target/i386/sse-13.c | 2 +- gcc/testsuite/gcc.target/i386/sse-14.c | 2 +- gcc/testsuite/gcc.target/i386/sse-22.c | 4 +- gcc/testsuite/gcc.target/i386/sse-23.c | 2 +- gcc/testsuite/lib/target-supports.exp | 13 +- 41 files changed, 558 insertions(+), 76 deletions(-) create mode 100644 gcc/config/i386/avx512fp16intrin.h create mode 100644 gcc/testsuite/g++.target/i386/float16-1.C create mode 100644 gcc/testsuite/g++.target/i386/float16-2.C create mode 100644 gcc/testsuite/g++.target/i386/float16-3.C create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12a.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12b.c create mode 100644 gcc/testsuite/gcc.target/i386/float16-3a.c create mode 100644 gcc/testsuite/gcc.target/i386/float16-3b.c create mode 100644 gcc/testsuite/gcc.target/i386/float16-4a.c create mode 100644 gcc/testsuite/gcc.target/i386/float16-4b.c create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-12.c diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h index 458f41de776..1835ac64e67 100644 --- a/gcc/common/config/i386/cpuinfo.h +++ b/gcc/common/config/i386/cpuinfo.h @@ -731,6 +731,8 @@ get_available_features (struct __processor_model *cpu_model, set_feature (FEATURE_AVX5124FMAPS); if (edx & bit_AVX512VP2INTERSECT) set_feature (FEATURE_AVX512VP2INTERSECT); + if (edx & bit_AVX512FP16) + set_feature (FEATURE_AVX512FP16); } __cpuid_count (7, 1, eax, ebx, ecx, edx); diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c index 76ab1a14e54..00c65ba15ab 100644 --- a/gcc/common/config/i386/i386-common.c +++ b/gcc/common/config/i386/i386-common.c @@ -82,6 +82,8 @@ along with GCC; see the file COPYING3. If not see #define OPTION_MASK_ISA2_AVX5124VNNIW_SET OPTION_MASK_ISA2_AVX5124VNNIW #define OPTION_MASK_ISA_AVX512VBMI2_SET \ (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512F_SET) +#define OPTION_MASK_ISA_AVX512FP16_SET OPTION_MASK_ISA_AVX512BW_SET +#define OPTION_MASK_ISA2_AVX512FP16_SET OPTION_MASK_ISA2_AVX512FP16 #define OPTION_MASK_ISA_AVX512VNNI_SET \ (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512F_SET) #define OPTION_MASK_ISA2_AVXVNNI_SET OPTION_MASK_ISA2_AVXVNNI @@ -231,6 +233,8 @@ along with GCC; see the file COPYING3. If not see #define OPTION_MASK_ISA2_AVX5124FMAPS_UNSET OPTION_MASK_ISA2_AVX5124FMAPS #define OPTION_MASK_ISA2_AVX5124VNNIW_UNSET OPTION_MASK_ISA2_AVX5124VNNIW #define OPTION_MASK_ISA_AVX512VBMI2_UNSET OPTION_MASK_ISA_AVX512VBMI2 +#define OPTION_MASK_ISA_AVX512FP16_UNSET OPTION_MASK_ISA_AVX512BW_UNSET +#define OPTION_MASK_ISA2_AVX512FP16_UNSET OPTION_MASK_ISA2_AVX512FP16 #define OPTION_MASK_ISA_AVX512VNNI_UNSET OPTION_MASK_ISA_AVX512VNNI #define OPTION_MASK_ISA2_AVXVNNI_UNSET OPTION_MASK_ISA2_AVXVNNI #define OPTION_MASK_ISA_AVX512VPOPCNTDQ_UNSET OPTION_MASK_ISA_AVX512VPOPCNTDQ @@ -313,7 +317,8 @@ along with GCC; see the file COPYING3. If not see (OPTION_MASK_ISA2_AVX512BF16_UNSET \ | OPTION_MASK_ISA2_AVX5124FMAPS_UNSET \ | OPTION_MASK_ISA2_AVX5124VNNIW_UNSET \ - | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET) + | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET \ + | OPTION_MASK_ISA2_AVX512FP16_UNSET) #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \ (OPTION_MASK_ISA2_AVX512F_UNSET) #define OPTION_MASK_ISA2_AVX_UNSET OPTION_MASK_ISA2_AVX2_UNSET @@ -326,7 +331,9 @@ along with GCC; see the file COPYING3. If not see (OPTION_MASK_ISA2_SSE3_UNSET | OPTION_MASK_ISA2_KL_UNSET) #define OPTION_MASK_ISA2_SSE_UNSET OPTION_MASK_ISA2_SSE2_UNSET -#define OPTION_MASK_ISA2_AVX512BW_UNSET OPTION_MASK_ISA2_AVX512BF16_UNSET +#define OPTION_MASK_ISA2_AVX512BW_UNSET \ + (OPTION_MASK_ISA2_AVX512BF16_UNSET \ + | OPTION_MASK_ISA2_AVX512FP16_UNSET) /* Set 1 << value as value of -malign-FLAG option. */ @@ -853,6 +860,21 @@ ix86_handle_option (struct gcc_options *opts, } return true; + case OPT_mavx512fp16: + if (value) + { + opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX512FP16_SET; + opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX512FP16_SET; + opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512FP16_SET; + opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512FP16_SET; + } + else + { + opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX512FP16_UNSET; + opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX512FP16_UNSET; + } + return true; + case OPT_mavx512vnni: if (value) { diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h index e68dd656046..4e0659fc7b2 100644 --- a/gcc/common/config/i386/i386-cpuinfo.h +++ b/gcc/common/config/i386/i386-cpuinfo.h @@ -228,6 +228,7 @@ enum processor_features FEATURE_AESKLE, FEATURE_WIDEKL, FEATURE_AVXVNNI, + FEATURE_AVX512FP16, CPU_FEATURE_MAX }; diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h index 898c18f3dda..a6783660278 100644 --- a/gcc/common/config/i386/i386-isas.h +++ b/gcc/common/config/i386/i386-isas.h @@ -169,4 +169,5 @@ ISA_NAMES_TABLE_START ISA_NAMES_TABLE_ENTRY("aeskle", FEATURE_AESKLE, P_NONE, NULL) ISA_NAMES_TABLE_ENTRY("widekl", FEATURE_WIDEKL, P_NONE, "-mwidekl") ISA_NAMES_TABLE_ENTRY("avxvnni", FEATURE_AVXVNNI, P_NONE, "-mavxvnni") + ISA_NAMES_TABLE_ENTRY("avx512fp16", FEATURE_AVX512FP16, P_NONE, "-mavx512fp16") ISA_NAMES_TABLE_END diff --git a/gcc/config.gcc b/gcc/config.gcc index 3df9b52cf25..a354351408c 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -416,7 +416,7 @@ i[34567]86-*-* | x86_64-*-*) tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h amxbf16intrin.h x86gprintrin.h uintrintrin.h hresetintrin.h keylockerintrin.h avxvnniintrin.h - mwaitintrin.h" + mwaitintrin.h avx512fp16intrin.h" ;; ia64-*-*) extra_headers=ia64intrin.h diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h new file mode 100644 index 00000000000..38d63161ba6 --- /dev/null +++ b/gcc/config/i386/avx512fp16intrin.h @@ -0,0 +1,53 @@ +/* Copyright (C) 2019 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#ifndef _IMMINTRIN_H_INCLUDED +#error "Never use directly; include instead." +#endif + +#ifndef __AVX512FP16INTRIN_H_INCLUDED +#define __AVX512FP16INTRIN_H_INCLUDED + +#ifndef __AVX512FP16__ +#pragma GCC push_options +#pragma GCC target("avx512fp16") +#define __DISABLE_AVX512FP16__ +#endif /* __AVX512FP16__ */ + +/* Internal data types for implementing the intrinsics. */ +typedef _Float16 __v8hf __attribute__ ((__vector_size__ (16))); +typedef _Float16 __v16hf __attribute__ ((__vector_size__ (32))); +typedef _Float16 __v32hf __attribute__ ((__vector_size__ (64))); + +/* The Intel API is flexible enough that we must allow aliasing with other + vector types, and their scalar components. */ +typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__)); +typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__)); +typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__)); + +#ifdef __DISABLE_AVX512FP16__ +#undef __DISABLE_AVX512FP16__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX512FP16__ */ + +#endif /* __AVX512FP16INTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h index aebc17c6827..82b8050028b 100644 --- a/gcc/config/i386/cpuid.h +++ b/gcc/config/i386/cpuid.h @@ -126,6 +126,7 @@ #define bit_AVX5124VNNIW (1 << 2) #define bit_AVX5124FMAPS (1 << 3) #define bit_AVX512VP2INTERSECT (1 << 8) +#define bit_AVX512FP16 (1 << 23) #define bit_IBT (1 << 20) #define bit_UINTR (1 << 5) #define bit_PCONFIG (1 << 18) diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index 3ca313c19ec..1768b88d748 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -68,6 +68,7 @@ DEF_PRIMITIVE_TYPE (UINT8, unsigned_char_type_node) DEF_PRIMITIVE_TYPE (UINT16, short_unsigned_type_node) DEF_PRIMITIVE_TYPE (INT64, long_long_integer_type_node) DEF_PRIMITIVE_TYPE (UINT64, long_long_unsigned_type_node) +DEF_PRIMITIVE_TYPE (FLOAT16, ix86_float16_type_node) DEF_PRIMITIVE_TYPE (FLOAT, float_type_node) DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node) DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node) diff --git a/gcc/config/i386/i386-builtins.c b/gcc/config/i386/i386-builtins.c index 204e2903126..668f09f12a0 100644 --- a/gcc/config/i386/i386-builtins.c +++ b/gcc/config/i386/i386-builtins.c @@ -125,6 +125,7 @@ BDESC_VERIFYS (IX86_BUILTIN_MAX, /* Table for the ix86 builtin non-function types. */ static GTY(()) tree ix86_builtin_type_tab[(int) IX86_BT_LAST_CPTR + 1]; +tree ix86_float16_type_node = NULL_TREE; /* Retrieve an element from the above table, building some of the types lazily. */ @@ -1343,6 +1344,26 @@ ix86_init_builtins_va_builtins_abi (void) BUILT_IN_VA_COPY, BUILT_IN_NORMAL, NULL, fnattr_sysv); } +static void +ix86_init_float16_builtins (void) +{ + /* Provide the _Float16 type and float16_type_node if needed so that + it can be used in AVX512FP16 intrinsics and builtins. */ + if (!float16_type_node) + { + ix86_float16_type_node = make_node (REAL_TYPE); + TYPE_PRECISION (ix86_float16_type_node) = 16; + SET_TYPE_MODE (ix86_float16_type_node, HFmode); + layout_type (ix86_float16_type_node); + } + else + ix86_float16_type_node = float16_type_node; + + if (!maybe_get_identifier ("_Float16") && TARGET_SSE2) + lang_hooks.types.register_builtin_type (ix86_float16_type_node, + "_Float16"); +} + static void ix86_init_builtin_types (void) { @@ -1371,6 +1392,8 @@ ix86_init_builtin_types (void) it. */ lang_hooks.types.register_builtin_type (float128_type_node, "__float128"); + ix86_init_float16_builtins (); + const_string_type_node = build_pointer_type (build_qualified_type (char_type_node, TYPE_QUAL_CONST)); diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c index 5ed0de006fb..cc64f855ecc 100644 --- a/gcc/config/i386/i386-c.c +++ b/gcc/config/i386/i386-c.c @@ -598,6 +598,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag, def_or_undef (parse_in, "__PTWRITE__"); if (isa_flag2 & OPTION_MASK_ISA2_AVX512BF16) def_or_undef (parse_in, "__AVX512BF16__"); + if (isa_flag2 & OPTION_MASK_ISA2_AVX512FP16) + def_or_undef (parse_in, "__AVX512FP16__"); if (TARGET_MMX_WITH_SSE) def_or_undef (parse_in, "__MMX_WITH_SSE__"); if (isa_flag2 & OPTION_MASK_ISA2_ENQCMD) diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 69ea79e6123..b7d050a1e42 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -2314,6 +2314,7 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label) switch (mode) { + case E_HFmode: case E_SFmode: case E_DFmode: case E_XFmode: @@ -2627,7 +2628,7 @@ ix86_prepare_fp_compare_args (enum rtx_code code, rtx *pop0, rtx *pop1) bool unordered_compare = ix86_unordered_fp_compare (code); rtx op0 = *pop0, op1 = *pop1; machine_mode op_mode = GET_MODE (op0); - bool is_sse = TARGET_SSE_MATH && SSE_FLOAT_MODE_P (op_mode); + bool is_sse = SSE_FLOAT_MODE_SSEMATH_OR_HF_P (op_mode); /* All of the unordered compare instructions only work on registers. The same is true of the fcomi compare instructions. The XFmode @@ -4112,7 +4113,7 @@ ix86_expand_fp_movcc (rtx operands[]) rtx op0 = XEXP (operands[1], 0); rtx op1 = XEXP (operands[1], 1); - if (TARGET_SSE_MATH && SSE_FLOAT_MODE_P (mode)) + if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode)) { machine_mode cmode; diff --git a/gcc/config/i386/i386-isa.def b/gcc/config/i386/i386-isa.def index a0d46cbc892..83d9302ea3d 100644 --- a/gcc/config/i386/i386-isa.def +++ b/gcc/config/i386/i386-isa.def @@ -108,3 +108,4 @@ DEF_PTA(HRESET) DEF_PTA(KL) DEF_PTA(WIDEKL) DEF_PTA(AVXVNNI) +DEF_PTA(AVX512FP16) diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c index 3416a4f1752..df191763e4b 100644 --- a/gcc/config/i386/i386-options.c +++ b/gcc/config/i386/i386-options.c @@ -223,7 +223,8 @@ static struct ix86_target_opts isa2_opts[] = { "-mhreset", OPTION_MASK_ISA2_HRESET }, { "-mkl", OPTION_MASK_ISA2_KL }, { "-mwidekl", OPTION_MASK_ISA2_WIDEKL }, - { "-mavxvnni", OPTION_MASK_ISA2_AVXVNNI } + { "-mavxvnni", OPTION_MASK_ISA2_AVXVNNI }, + { "-mavx512fp16", OPTION_MASK_ISA2_AVX512FP16 } }; static struct ix86_target_opts isa_opts[] = { @@ -1045,6 +1046,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[], IX86_ATTR_ISA ("amx-bf16", OPT_mamx_bf16), IX86_ATTR_ISA ("hreset", OPT_mhreset), IX86_ATTR_ISA ("avxvnni", OPT_mavxvnni), + IX86_ATTR_ISA ("avx512fp16", OPT_mavx512fp16), /* enum options */ IX86_ATTR_ENUM ("fpmath=", OPT_mfpmath_), diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index dc673c89bc8..71bbcf968c5 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -5497,6 +5497,14 @@ ix86_output_ssemov (rtx_insn *insn, rtx *operands) case MODE_SI: return "%vmovd\t{%1, %0|%0, %1}"; + case MODE_HI: + if (GENERAL_REG_P (operands[0])) + return "vmovw\t{%1, %k0|%k0, %1}"; + else if (GENERAL_REG_P (operands[1])) + return "vmovw\t{%k1, %0|%0, %k1}"; + else + return "vmovw\t{%1, %0|%0, %1}"; + case MODE_DF: if (TARGET_AVX && REG_P (operands[0]) && REG_P (operands[1])) return "vmovsd\t{%d1, %0|%0, %d1}"; @@ -5509,6 +5517,12 @@ ix86_output_ssemov (rtx_insn *insn, rtx *operands) else return "%vmovss\t{%1, %0|%0, %1}"; + case MODE_HF: + if (REG_P (operands[0]) && REG_P (operands[1])) + return "vmovsh\t{%d1, %0|%0, %d1}"; + else + return "vmovsh\t{%1, %0|%0, %1}"; + case MODE_V1DF: gcc_assert (!TARGET_AVX); return "movlpd\t{%1, %0|%0, %1}"; @@ -13955,7 +13969,7 @@ output_387_binary_op (rtx_insn *insn, rtx *operands) if (is_sse) { - p = (GET_MODE (operands[0]) == SFmode) ? "ss" : "sd"; + p = (GET_MODE (operands[0]) == SFmode ? "ss" : "sd"); strcat (buf, p); if (TARGET_AVX) @@ -19132,10 +19146,19 @@ inline_secondary_memory_needed (machine_mode mode, reg_class_t class1, if (!TARGET_SSE2) return true; + if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2))) + return true; + + int msize = GET_MODE_SIZE (mode); + /* Between SSE and general, we have moves no larger than word size. */ - if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2)) - || GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode) - || GET_MODE_SIZE (mode) > UNITS_PER_WORD) + if (msize > UNITS_PER_WORD) + return true; + + /* In addition to SImode moves, AVX512FP16 also enables HImode moves. */ + int minsize = GET_MODE_SIZE (TARGET_AVX512FP16 ? HImode : SImode); + + if (msize < minsize) return true; /* If the target says that inter-unit moves are more expensive @@ -19229,21 +19252,26 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to, static inline int sse_store_index (machine_mode mode) { - switch (GET_MODE_SIZE (mode)) - { - case 4: - return 0; - case 8: - return 1; - case 16: - return 2; - case 32: - return 3; - case 64: - return 4; - default: - return -1; - } + /* NB: Use SFmode cost for HFmode instead of adding HFmode load/store + costs to processor_costs, which requires changes to all entries in + processor cost table. */ + if (mode == E_HFmode) + mode = E_SFmode; + switch (GET_MODE_SIZE (mode)) + { + case 4: + return 0; + case 8: + return 1; + case 16: + return 2; + case 32: + return 3; + case 64: + return 4; + default: + return -1; + } } /* Return the cost of moving data of mode M between a @@ -19270,6 +19298,7 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in) int index; switch (mode) { + case E_HFmode: case E_SFmode: index = 0; break; @@ -19370,11 +19399,31 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in) } break; case 2: - if (in == 2) - return MAX (ix86_cost->hard_register.int_load[1], - ix86_cost->hard_register.int_store[1]); - return in ? ix86_cost->hard_register.int_load[1] - : ix86_cost->hard_register.int_store[1]; + { + int cost; + if (in == 2) + cost = MAX (ix86_cost->hard_register.int_load[1], + ix86_cost->hard_register.int_store[1]); + else + cost = in ? ix86_cost->hard_register.int_load[1] + : ix86_cost->hard_register.int_store[1]; + if (mode == E_HFmode) + { + /* Prefer SSE over GPR for HFmode. */ + int sse_cost; + int index = sse_store_index (mode); + if (in == 2) + sse_cost = MAX (ix86_cost->hard_register.sse_load[index], + ix86_cost->hard_register.sse_store[index]); + else + sse_cost = (in + ? ix86_cost->hard_register.sse_load [index] + : ix86_cost->hard_register.sse_store [index]); + if (sse_cost >= cost) + cost = sse_cost + 1; + } + return cost; + } default: if (in == 2) cost = MAX (ix86_cost->hard_register.int_load[2], @@ -19548,6 +19597,8 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode) - XI mode - any of 512-bit wide vector mode - any scalar mode. */ + /* For AVX512FP16, vmovw supports movement of HImode + between gpr and sse registser. */ if (TARGET_AVX512F && (mode == XImode || VALID_AVX512F_REG_MODE (mode) @@ -19831,7 +19882,7 @@ ix86_multiplication_cost (const struct processor_costs *cost, if (VECTOR_MODE_P (mode)) inner_mode = GET_MODE_INNER (mode); - if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH) + if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode)) return inner_mode == DFmode ? cost->mulsd : cost->mulss; else if (X87_FLOAT_MODE_P (mode)) return cost->fmul; @@ -19883,7 +19934,7 @@ ix86_division_cost (const struct processor_costs *cost, if (VECTOR_MODE_P (mode)) inner_mode = GET_MODE_INNER (mode); - if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH) + if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode)) return inner_mode == DFmode ? cost->divsd : cost->divss; else if (X87_FLOAT_MODE_P (mode)) return cost->fdiv; @@ -20303,7 +20354,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno, return true; } - if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH) + if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode)) { *total = cost->addss; return false; @@ -20336,7 +20387,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno, /* FALLTHRU */ case NEG: - if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH) + if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode)) { *total = cost->sse_op; return false; @@ -20418,14 +20469,14 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno, return false; case FLOAT_EXTEND: - if (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)) + if (!SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode)) *total = 0; else *total = ix86_vec_cost (mode, cost->addss); return false; case FLOAT_TRUNCATE: - if (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)) + if (!SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode)) *total = cost->fadd; else *total = ix86_vec_cost (mode, cost->addss); @@ -20435,7 +20486,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno, /* SSE requires memory load for the constant operand. It may make sense to account for this. Of course the constant operand may or may not be reused. */ - if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH) + if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode)) *total = cost->sse_op; else if (X87_FLOAT_MODE_P (mode)) *total = cost->fabs; @@ -20444,7 +20495,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno, return false; case SQRT: - if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH) + if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode)) *total = mode == SFmode ? cost->sqrtss : cost->sqrtsd; else if (X87_FLOAT_MODE_P (mode)) *total = cost->fsqrt; @@ -21928,6 +21979,10 @@ ix86_mangle_type (const_tree type) switch (TYPE_MODE (type)) { + case E_HFmode: + /* _Float16 is "DF16_". + Align with clang's decision in https://reviews.llvm.org/D33719. */ + return "DF16_"; case E_TFmode: /* __float128 is "g". */ return "g"; @@ -22551,7 +22606,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count, case MINUS_EXPR: if (kind == scalar_stmt) { - if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH) + if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode)) stmt_cost = ix86_cost->addss; else if (X87_FLOAT_MODE_P (mode)) stmt_cost = ix86_cost->fadd; @@ -22569,7 +22624,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count, stmt_cost = ix86_multiplication_cost (ix86_cost, mode); break; case NEGATE_EXPR: - if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH) + if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode)) stmt_cost = ix86_cost->sse_op; else if (X87_FLOAT_MODE_P (mode)) stmt_cost = ix86_cost->fchs; @@ -22625,7 +22680,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count, case BIT_XOR_EXPR: case BIT_AND_EXPR: case BIT_NOT_EXPR: - if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH) + if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode)) stmt_cost = ix86_cost->sse_op; else if (VECTOR_MODE_P (mode)) stmt_cost = ix86_vec_cost (mode, ix86_cost->sse_op); @@ -23327,14 +23382,18 @@ ix86_get_excess_precision (enum excess_precision_type type) /* The fastest type to promote to will always be the native type, whether that occurs with implicit excess precision or otherwise. */ - return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT; + return TARGET_AVX512FP16 + ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 + : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT; case EXCESS_PRECISION_TYPE_STANDARD: case EXCESS_PRECISION_TYPE_IMPLICIT: /* Otherwise, the excess precision we want when we are in a standards compliant mode, and the implicit precision we provide would be identical were it not for the unpredictable cases. */ - if (!TARGET_80387) + if (TARGET_AVX512FP16 && TARGET_SSE_MATH) + return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16; + else if (!TARGET_80387) return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT; else if (!TARGET_MIX_SSE_I387) { diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index b1e66ee192e..8fcd5693624 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -1000,7 +1000,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); #define VALID_AVX512F_SCALAR_MODE(MODE) \ ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode \ - || (MODE) == SFmode) + || (MODE) == SFmode \ + || (TARGET_AVX512FP16 && ((MODE) == HImode || (MODE) == HFmode))) #define VALID_AVX512F_REG_MODE(MODE) \ ((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode \ @@ -1039,7 +1040,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); #define VALID_FP_MODE_P(MODE) \ ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode \ - || (MODE) == SCmode || (MODE) == DCmode || (MODE) == XCmode) \ + || (MODE) == SCmode || (MODE) == DCmode || (MODE) == XCmode) #define VALID_INT_MODE_P(MODE) \ ((MODE) == QImode || (MODE) == HImode \ @@ -1072,6 +1073,10 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); #define SSE_FLOAT_MODE_P(MODE) \ ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode)) +#define SSE_FLOAT_MODE_SSEMATH_OR_HF_P(MODE) \ + ((SSE_FLOAT_MODE_P (MODE) && TARGET_SSE_MATH) \ + || (TARGET_AVX512FP16 && (MODE) == HFmode)) + #define FMA4_VEC_FLOAT_MODE_P(MODE) \ (TARGET_FMA4 && ((MODE) == V4SFmode || (MODE) == V2DFmode \ || (MODE) == V8SFmode || (MODE) == V4DFmode)) @@ -2265,7 +2270,7 @@ constexpr wide_int_bitmask PTA_TIGERLAKE = PTA_ICELAKE_CLIENT | PTA_MOVDIRI constexpr wide_int_bitmask PTA_SAPPHIRERAPIDS = PTA_COOPERLAKE | PTA_MOVDIRI | PTA_MOVDIR64B | PTA_AVX512VP2INTERSECT | PTA_ENQCMD | PTA_CLDEMOTE | PTA_PTWRITE | PTA_WAITPKG | PTA_SERIALIZE | PTA_TSXLDTRK | PTA_AMX_TILE - | PTA_AMX_INT8 | PTA_AMX_BF16 | PTA_UINTR | PTA_AVXVNNI; + | PTA_AMX_INT8 | PTA_AMX_BF16 | PTA_UINTR | PTA_AVXVNNI | PTA_AVX512FP16; constexpr wide_int_bitmask PTA_KNL = PTA_BROADWELL | PTA_AVX512PF | PTA_AVX512ER | PTA_AVX512F | PTA_AVX512CD | PTA_PREFETCHWT1; constexpr wide_int_bitmask PTA_BONNELL = PTA_CORE2 | PTA_MOVBE; diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index d475347172d..777d11261ac 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -496,7 +496,7 @@ (define_attr "type" ;; Main data type used by the insn (define_attr "mode" - "unknown,none,QI,HI,SI,DI,TI,OI,XI,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF, + "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF, V2DF,V2SF,V1DF,V8DF" (const_string "unknown")) @@ -832,8 +832,7 @@ (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx, sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx, avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f, avx512bw,noavx512bw,avx512dq,noavx512dq, - avx512vl,noavx512vl, - avxvnni,avx512vnnivl" + avx512vl,noavx512vl,avxvnni,avx512vnnivl,avx512fp16" (const_string "base")) ;; Define instruction set of MMX instructions @@ -885,6 +884,8 @@ (define_attr "enabled" "" (eq_attr "isa" "avxvnni") (symbol_ref "TARGET_AVXVNNI") (eq_attr "isa" "avx512vnnivl") (symbol_ref "TARGET_AVX512VNNI && TARGET_AVX512VL") + (eq_attr "isa" "avx512fp16") + (symbol_ref "TARGET_AVX512FP16") (eq_attr "mmx_isa" "native") (symbol_ref "!TARGET_MMX_WITH_SSE") @@ -906,6 +907,7 @@ (define_asm_attributes (set_attr "type" "multi")]) (define_code_iterator plusminus [plus minus]) +(define_code_iterator plusminusmultdiv [plus minus mult div]) (define_code_iterator sat_plusminus [ss_plus us_plus ss_minus us_minus]) @@ -921,7 +923,8 @@ (define_code_attr multdiv_mnemonic ;; Mark commutative operators as such in constraints. (define_code_attr comm [(plus "%") (ss_plus "%") (us_plus "%") - (minus "") (ss_minus "") (us_minus "")]) + (minus "") (ss_minus "") (us_minus "") + (mult "%") (div "")]) ;; Mapping of max and min (define_code_iterator maxmin [smax smin umax umin]) @@ -1021,7 +1024,8 @@ (define_code_attr insn (minus "sub") (ss_minus "sssub") (us_minus "ussub") (sign_extend "extend") (zero_extend "zero_extend") (ashift "ashl") (lshiftrt "lshr") (ashiftrt "ashr") - (rotate "rotl") (rotatert "rotr")]) + (rotate "rotl") (rotatert "rotr") + (mult "mul") (div "div")]) ;; All integer modes. (define_mode_iterator SWI1248x [QI HI SI DI]) @@ -1089,8 +1093,9 @@ (define_mode_iterator SWI48DWI [SI DI (TI "TARGET_64BIT")]) ;; compile time constant, it is faster to use than ;; GET_MODE_SIZE (mode). For XFmode which depends on ;; command line options just use GET_MODE_SIZE macro. -(define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8") (TI "16") - (SF "4") (DF "8") (XF "GET_MODE_SIZE (XFmode)") +(define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8") + (TI "16") (HF "2") (SF "4") (DF "8") + (XF "GET_MODE_SIZE (XFmode)") (V16QI "16") (V32QI "32") (V64QI "64") (V8HI "16") (V16HI "32") (V32HI "64") (V4SI "16") (V8SI "32") (V16SI "64") @@ -1222,8 +1227,8 @@ (define_mode_iterator MODEF [SF DF]) ;; All x87 floating point modes (define_mode_iterator X87MODEF [SF DF XF]) -;; All x87 floating point modes plus HF -(define_mode_iterator X87MODEFH [SF DF XF HF]) +;; All x87 floating point modes plus HFmode +(define_mode_iterator X87MODEFH [HF SF DF XF]) ;; All SSE floating point modes (define_mode_iterator SSEMODEF [SF DF TF]) @@ -1231,7 +1236,7 @@ (define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")]) ;; SSE instruction suffix for various modes (define_mode_attr ssemodesuffix - [(SF "ss") (DF "sd") + [(HF "sh") (SF "ss") (DF "sd") (V16SF "ps") (V8DF "pd") (V8SF "ps") (V4DF "pd") (V4SF "ps") (V2DF "pd") @@ -1496,6 +1501,23 @@ (define_expand "cstorexf4" DONE; }) +(define_expand "cbranchhf4" + [(set (reg:CC FLAGS_REG) + (compare:CC (match_operand:HF 1 "cmp_fp_expander_operand") + (match_operand:HF 2 "cmp_fp_expander_operand"))) + (set (pc) (if_then_else + (match_operator 0 "ix86_fp_comparison_operator" + [(reg:CC FLAGS_REG) + (const_int 0)]) + (label_ref (match_operand 3)) + (pc)))] + "TARGET_AVX512FP16" +{ + ix86_expand_branch (GET_CODE (operands[0]), + operands[1], operands[2], operands[3]); + DONE; +}) + (define_expand "cbranch4" [(set (reg:CC FLAGS_REG) (compare:CC (match_operand:MODEF 1 "cmp_fp_expander_operand") @@ -1705,6 +1727,17 @@ (define_insn "*cmpi" (eq_attr "alternative" "0") (symbol_ref "true") (symbol_ref "false"))))]) + +(define_insn "*cmpihf" + [(set (reg:CCFP FLAGS_REG) + (compare:CCFP + (match_operand:HF 0 "register_operand" "v") + (match_operand:HF 1 "nonimmediate_operand" "vm")))] + "TARGET_AVX512FP16" + "vcomish\t{%1, %0|%0, %1}" + [(set_attr "type" "ssecomi") + (set_attr "prefix" "evex") + (set_attr "mode" "HF")]) ;; Push/pop instructions. @@ -2436,8 +2469,8 @@ (define_insn "*movsi_internal" (symbol_ref "true")))]) (define_insn "*movhi_internal" - [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,*k,*k ,*r,*m,*k") - (match_operand:HI 1 "general_operand" "r ,rn,rm,rn,*r,*km,*k,*k,CBC"))] + [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,*k,*k ,*r,*m,*k,?r,?v,*v,*v,*m") + (match_operand:HI 1 "general_operand" "r ,rn,rm,rn,*r,*km,*k,*k,CBC,v, r, v, m, v"))] "!(MEM_P (operands[0]) && MEM_P (operands[1])) && ix86_hardreg_mov_ok (operands[0], operands[1])" @@ -2463,6 +2496,9 @@ (define_insn "*movhi_internal" gcc_unreachable (); } + case TYPE_SSEMOV: + return ix86_output_ssemov (insn, operands); + case TYPE_MSKLOG: if (operands[1] == const0_rtx) return "kxorw\t%0, %0, %0"; @@ -2477,8 +2513,15 @@ (define_insn "*movhi_internal" return "mov{w}\t{%1, %0|%0, %1}"; } } - [(set (attr "type") - (cond [(eq_attr "alternative" "4,5,6,7") + [(set (attr "isa") + (cond [(eq_attr "alternative" "9,10,11,12,13") + (const_string "avx512fp16") + ] + (const_string "*"))) + (set (attr "type") + (cond [(eq_attr "alternative" "9,10,11,12,13") + (const_string "ssemov") + (eq_attr "alternative" "4,5,6,7") (const_string "mskmov") (eq_attr "alternative" "8") (const_string "msklog") @@ -2503,6 +2546,8 @@ (define_insn "*movhi_internal" (set (attr "mode") (cond [(eq_attr "type" "imovx") (const_string "SI") + (eq_attr "alternative" "11") + (const_string "HF") (and (eq_attr "alternative" "1,2") (match_operand:HI 1 "aligned_operand")) (const_string "SI") @@ -3727,7 +3772,10 @@ (define_insn "*movhf_internal" (eq_attr "alternative" "2") (const_string "sselog1") (eq_attr "alternative" "4,5,6,7") - (const_string "sselog") + (if_then_else + (match_test ("TARGET_AVX512FP16")) + (const_string "ssemov") + (const_string "sselog")) ] (const_string "ssemov"))) (set (attr "memory") @@ -3750,9 +3798,15 @@ (define_insn "*movhf_internal" (eq_attr "alternative" "2") (const_string "V4SF") (eq_attr "alternative" "4,5,6,7") - (const_string "TI") + (if_then_else + (match_test "TARGET_AVX512FP16") + (const_string "HI") + (const_string "TI")) (eq_attr "alternative" "3") - (const_string "SF") + (if_then_else + (match_test "TARGET_AVX512FP16") + (const_string "HF") + (const_string "SF")) ] (const_string "*")))]) @@ -4493,6 +4547,17 @@ (define_split emit_move_insn (operands[0], CONST0_RTX (V2DFmode)); }) +(define_insn "extendhf2" + [(set (match_operand:MODEF 0 "nonimm_ssenomem_operand" "=v") + (float_extend:MODEF + (match_operand:HF 1 "nonimmediate_operand" "vm")))] + "TARGET_AVX512FP16" + "vcvtsh2\t{%1, %0, %0|%0, %0, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + + (define_expand "extendxf2" [(set (match_operand:XF 0 "nonimmediate_operand") (float_extend:XF (match_operand:MODEF 1 "general_operand")))] @@ -4670,6 +4735,18 @@ (define_insn "truncxf2" (symbol_ref "flag_unsafe_math_optimizations") ] (symbol_ref "true")))]) + +;; Conversion from {SF,DF}mode to HFmode. + +(define_insn "trunchf2" + [(set (match_operand:HF 0 "register_operand" "=v") + (float_truncate:HF + (match_operand:MODEF 1 "nonimmediate_operand" "vm")))] + "TARGET_AVX512FP16" + "vcvt2sh\t{%1, %d0|%d0, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "HF")]) ;; Signed conversion to DImode. @@ -5046,6 +5123,16 @@ (define_insn "*float2" (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS")] (symbol_ref "true")))]) +(define_insn "floathf2" + [(set (match_operand:HF 0 "register_operand" "=v") + (any_float:HF + (match_operand:SWI48 1 "nonimmediate_operand" "rm")))] + "TARGET_AVX512FP16" + "vcvtsi2sh\t{%1, %d0|%d0, %1}" + [(set_attr "type" "sseicvt") + (set_attr "prefix" "evex") + (set_attr "mode" "HF")]) + (define_insn "*floatdi2_i387" [(set (match_operand:MODEF 0 "register_operand" "=f") (float:MODEF (match_operand:DI 1 "nonimmediate_operand" "m")))] @@ -7626,6 +7713,13 @@ (define_expand "xf3" (match_operand:XF 2 "register_operand")))] "TARGET_80387") +(define_expand "hf3" + [(set (match_operand:HF 0 "register_operand") + (plusminus:HF + (match_operand:HF 1 "register_operand") + (match_operand:HF 2 "nonimmediate_operand")))] + "TARGET_AVX512FP16") + (define_expand "3" [(set (match_operand:MODEF 0 "register_operand") (plusminus:MODEF @@ -8203,6 +8297,12 @@ (define_expand "mulxf3" (match_operand:XF 2 "register_operand")))] "TARGET_80387") +(define_expand "mulhf3" + [(set (match_operand:HF 0 "register_operand") + (mult:HF (match_operand:HF 1 "register_operand") + (match_operand:HF 2 "nonimmediate_operand")))] + "TARGET_AVX512FP16") + (define_expand "mul3" [(set (match_operand:MODEF 0 "register_operand") (mult:MODEF (match_operand:MODEF 1 "register_operand") @@ -8220,6 +8320,12 @@ (define_expand "divxf3" (match_operand:XF 2 "register_operand")))] "TARGET_80387") +(define_expand "divhf3" + [(set (match_operand:HF 0 "register_operand") + (div:HF (match_operand:HF 1 "register_operand") + (match_operand:HF 2 "nonimmediate_operand")))] + "TARGET_AVX512FP16") + (define_expand "div3" [(set (match_operand:MODEF 0 "register_operand") (div:MODEF (match_operand:MODEF 1 "register_operand") @@ -16312,6 +16418,17 @@ (define_insn "*fop__comm" (symbol_ref "true") (symbol_ref "false"))))]) +(define_insn "*hf" + [(set (match_operand:HF 0 "register_operand" "=v") + (plusminusmultdiv:HF + (match_operand:HF 1 "nonimmediate_operand" "v") + (match_operand:HF 2 "nonimmediate_operand" "vm")))] + "TARGET_AVX512FP16 + && !(MEM_P (operands[1]) && MEM_P (operands[2]))" + "vsh\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "prefix" "evex") + (set_attr "mode" "HF")]) + (define_insn "*rcpsf2_sse" [(set (match_operand:SF 0 "register_operand" "=x,x,x") (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "0,x,m")] @@ -19178,6 +19295,15 @@ (define_peephole2 gcc_unreachable (); }) +(define_expand "movhfcc" + [(set (match_operand:HF 0 "register_operand") + (if_then_else:HF + (match_operand 1 "comparison_operator") + (match_operand:HF 2 "register_operand") + (match_operand:HF 3 "register_operand")))] + "TARGET_AVX512FP16" + "if (ix86_expand_fp_movcc (operands)) DONE; else FAIL;") + (define_expand "movcc" [(set (match_operand:X87MODEF 0 "register_operand") (if_then_else:X87MODEF @@ -19346,6 +19472,18 @@ (define_insn "3" ;; Their operands are not commutative, and thus they may be used in the ;; presence of -0.0 and NaN. +(define_insn "*ieee_shf3" + [(set (match_operand:HF 0 "register_operand" "=v") + (unspec:HF + [(match_operand:HF 1 "register_operand" "v") + (match_operand:HF 2 "nonimmediate_operand" "vm")] + IEEE_MAXMIN))] + "TARGET_AVX512FP16" + "vsh\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "prefix" "evex") + (set_attr "type" "sseadd") + (set_attr "mode" "HF")]) + (define_insn "*ieee_s3" [(set (match_operand:MODEF 0 "register_operand" "=x,v") (unspec:MODEF diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index 7b8547bb1c3..ad366974b5b 100644 --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -1166,3 +1166,7 @@ Emit GNU_PROPERTY_X86_ISA_1_NEEDED GNU property. mmwait Target Mask(ISA2_MWAIT) Var(ix86_isa_flags2) Save Support MWAIT and MONITOR built-in functions and code generation. + +mavx512fp16 +Target Mask(ISA2_AVX512FP16) Var(ix86_isa_flags2) Save +Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX512F and AVX512FP16 built-in functions and code generation. diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h index f129de4bbe5..2421a78637b 100644 --- a/gcc/config/i386/immintrin.h +++ b/gcc/config/i386/immintrin.h @@ -94,6 +94,10 @@ #include +#ifdef __SSE2__ +#include +#endif + #include #include diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 3a1978efc97..09040bfca33 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -1164,6 +1164,14 @@ to inconsistent behavior between software emulation and AVX512-FP16 instructions. Using @option{-fexcess-precision=16} and will force round back after each operation. +Using @option{-mavx512fp16} will generate AVX512-FP16 instructions instead of +software emulation. The default behavior of @code{FLT_EVAL_METHOD} is to round +after each operation. The same is true with @option{-fexcess-precision=standard} +and @option{-mfpmath=sse}. If there is no @option{-mfpmath=sse}, +@option{-fexcess-precision=standard} alone does the same thing as before, +It is useful for code that does not have @code{_Float16} and runs on the x87 +FPU. + @node Decimal Float @section Decimal Floating Types @cindex decimal floating types diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 32697e6117c..bb9f7ca956e 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -1393,6 +1393,7 @@ See RS/6000 and PowerPC Options. -mavx5124fmaps -mavx512vnni -mavx5124vnniw -mprfchw -mrdpid @gol -mrdseed -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol -mamx-tile -mamx-int8 -mamx-bf16 -muintr -mhreset -mavxvnni@gol +-mavx512fp16 @gol -mcldemote -mms-bitfields -mno-align-stringops -minline-all-stringops @gol -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol -mkl -mwidekl @gol @@ -31154,6 +31155,9 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}. @itemx -mavx512bf16 @opindex mavx512bf16 @need 200 +@itemx -mavx512fp16 +@opindex mavx512fp16 +@need 200 @itemx -mgfni @opindex mgfni @need 200 @@ -31232,9 +31236,9 @@ WBNOINVD, FMA4, PREFETCHW, RDPID, PREFETCHWT1, RDSEED, SGX, XOP, LWP, XSAVEOPT, XSAVEC, XSAVES, RTM, HLE, TBM, MWAITX, CLZERO, PKU, AVX512VBMI2, GFNI, VAES, WAITPKG, VPCLMULQDQ, AVX512BITALG, MOVDIRI, MOVDIR64B, AVX512BF16, ENQCMD, AVX512VPOPCNTDQ, AVX5124FMAPS, AVX512VNNI, AVX5124VNNIW, SERIALIZE, -UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI or CLDEMOTE -extended instruction sets. Each has a corresponding @option{-mno-} option to -disable use of these instructions. +UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512FP16 +or CLDEMOTE extended instruction sets. Each has a corresponding +@option{-mno-} option to disable use of these instructions. These extensions are also available as built-in functions: see @ref{x86 Built-in Functions}, for details of the functions enabled and diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C index 62b2132957a..fba3d1ac684 100644 --- a/gcc/testsuite/g++.dg/other/i386-2.C +++ b/gcc/testsuite/g++.dg/other/i386-2.C @@ -1,5 +1,5 @@ /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ -/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */ +/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */ /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h, diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C index 843aa2bdb2f..5cc0fa83457 100644 --- a/gcc/testsuite/g++.dg/other/i386-3.C +++ b/gcc/testsuite/g++.dg/other/i386-3.C @@ -1,5 +1,5 @@ /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ -/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */ +/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */ /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h, diff --git a/gcc/testsuite/g++.target/i386/float16-1.C b/gcc/testsuite/g++.target/i386/float16-1.C new file mode 100644 index 00000000000..95d1ac27c4f --- /dev/null +++ b/gcc/testsuite/g++.target/i386/float16-1.C @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-sse2" } */ + +_Float16/* { dg-error "does not name a type" } */ +foo (_Float16 x) +{ + return x; +} diff --git a/gcc/testsuite/g++.target/i386/float16-2.C b/gcc/testsuite/g++.target/i386/float16-2.C new file mode 100644 index 00000000000..99eb797eff1 --- /dev/null +++ b/gcc/testsuite/g++.target/i386/float16-2.C @@ -0,0 +1,14 @@ +/* { dg-do assemble { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +union flt +{ + _Float16 flt; + short s; +}; + +_Float16 +foo (union flt x) +{ + return x.flt; +} diff --git a/gcc/testsuite/g++.target/i386/float16-3.C b/gcc/testsuite/g++.target/i386/float16-3.C new file mode 100644 index 00000000000..940878503f1 --- /dev/null +++ b/gcc/testsuite/g++.target/i386/float16-3.C @@ -0,0 +1,10 @@ +/* { dg-do assemble { target avx512fp16 } } */ +/* { dg-options "-O0 -mavx512fp16" } */ + +template void a(char *) {} +char b, d; +void c() +{ + a(&d); + a<_Float16>(&b); +} diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index 6178e38ce02..f3676077743 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw" } */ +/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw -mavx512fp16" } */ /* { dg-add-options bind_pic_locally } */ #include diff --git a/gcc/testsuite/gcc.target/i386/avx-2.c b/gcc/testsuite/gcc.target/i386/avx-2.c index 986fbd819e4..1751c52565c 100644 --- a/gcc/testsuite/gcc.target/i386/avx-2.c +++ b/gcc/testsuite/gcc.target/i386/avx-2.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw" } */ +/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw -mavx512fp16" } */ /* { dg-add-options bind_pic_locally } */ #include diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h index 0a377dba1d5..0ad9064f637 100644 --- a/gcc/testsuite/gcc.target/i386/avx512-check.h +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h @@ -87,6 +87,9 @@ main () #ifdef AVX512VNNI && (ecx & bit_AVX512VNNI) #endif +#ifdef AVX512FP16 + && (edx & bit_AVX512FP16) +#endif #ifdef VAES && (ecx & bit_VAES) #endif diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c new file mode 100644 index 00000000000..88887556d68 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +_Float16 +__attribute__ ((noinline, noclone)) +do_max (_Float16 __A, _Float16 __B) +{ + return __A > __B ? __A : __B; +} + +_Float16 +__attribute__ ((noinline, noclone)) +do_min (_Float16 __A, _Float16 __B) +{ + return __A < __B ? __A : __B; +} + +/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */ +/* { dg-final { scan-assembler-times "vminsh\[ \\t\]" 1 } } */ +/* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c new file mode 100644 index 00000000000..c9e23bf95c2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c @@ -0,0 +1,27 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +#include + +static void do_test (void); + +#define DO_TEST do_test +#define AVX512FP16 +#include "avx512-check.h" +#include "avx512fp16-12a.c" + +static void +do_test (void) +{ + _Float16 x = 0.1f; + _Float16 y = -3.2f; + _Float16 z; + + z = do_max (x, y); + if (z != x) + abort (); + + z = do_min (x, y); + if (z != y) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/float16-3a.c b/gcc/testsuite/gcc.target/i386/float16-3a.c new file mode 100644 index 00000000000..3846c8e9b6e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/float16-3a.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +_Float16 +foo (int x) +{ + return x; +} + +/* { dg-final { scan-assembler-times "vcvtsi2shl\[ \t\]+\[^\n\r]*%xmm0" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/float16-3b.c b/gcc/testsuite/gcc.target/i386/float16-3b.c new file mode 100644 index 00000000000..247dd6e7e33 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/float16-3b.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +_Float16 +foo (unsigned int x) +{ + return x; +} + +/* { dg-final { scan-assembler-times "vcvtusi2shl\[ \t\]+\[^\n\r]*%xmm0" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/float16-4a.c b/gcc/testsuite/gcc.target/i386/float16-4a.c new file mode 100644 index 00000000000..631082581f3 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/float16-4a.c @@ -0,0 +1,10 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +_Float16 +foo (long long x) +{ + return x; +} + +/* { dg-final { scan-assembler-times "vcvtsi2shq\[ \t\]+\[^\n\r]*%xmm0" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/float16-4b.c b/gcc/testsuite/gcc.target/i386/float16-4b.c new file mode 100644 index 00000000000..828d8530769 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/float16-4b.c @@ -0,0 +1,10 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +_Float16 +foo (unsigned long long x) +{ + return x; +} + +/* { dg-final { scan-assembler-times "vcvtusi2shq\[ \t\]+\[^\n\r]*%xmm0" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc index 79265c7c94f..8499fdf2db9 100644 --- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc +++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc @@ -79,6 +79,7 @@ extern void test_hreset (void) __attribute__((__target__("hreset"))); extern void test_keylocker (void) __attribute__((__target__("kl"))); extern void test_widekl (void) __attribute__((__target__("widekl"))); extern void test_avxvnni (void) __attribute__((__target__("avxvnni"))); +extern void test_avx512fp16 (void) __attribute__((__target__("avx512fp16"))); extern void test_no_sgx (void) __attribute__((__target__("no-sgx"))); extern void test_no_avx5124fmaps(void) __attribute__((__target__("no-avx5124fmaps"))); @@ -159,6 +160,7 @@ extern void test_no_hreset (void) __attribute__((__target__("no-hreset"))); extern void test_no_keylocker (void) __attribute__((__target__("no-kl"))); extern void test_no_widekl (void) __attribute__((__target__("no-widekl"))); extern void test_no_avxvnni (void) __attribute__((__target__("no-avxvnni"))); +extern void test_no_avx512fp16 (void) __attribute__((__target__("no-avx512fp16"))); extern void test_arch_nocona (void) __attribute__((__target__("arch=nocona"))); extern void test_arch_core2 (void) __attribute__((__target__("arch=core2"))); diff --git a/gcc/testsuite/gcc.target/i386/pr54855-12.c b/gcc/testsuite/gcc.target/i386/pr54855-12.c new file mode 100644 index 00000000000..2f8af392c83 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr54855-12.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512fp16" } */ +/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */ +/* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */ +/* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */ + +#include + +_Float16 +foo (_Float16 x, _Float16 y) +{ + x = x > y ? x : y; + return x; +} diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index 7029771334b..f5f5c113612 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */ +/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */ /* { dg-add-options bind_pic_locally } */ #include diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index 4ce0ffffaf3..747d504cedb 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */ +/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */ /* { dg-add-options bind_pic_locally } */ #include diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 6e8b6f3fa1b..33411969901 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -103,7 +103,7 @@ #ifndef DIFFERENT_PRAGMAS -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni") +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16") #endif /* Following intrinsics require immediate arguments. They @@ -220,7 +220,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1) /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */ #ifdef DIFFERENT_PRAGMAS -#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni") +#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16") #endif #include test_1 (_cvtss_sh, unsigned short, float, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index 7faa053ace8..86590ca5ffb 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -708,6 +708,6 @@ #define __builtin_ia32_vpclmulqdq_v2di(A, B, C) __builtin_ia32_vpclmulqdq_v2di(A, B, 1) #define __builtin_ia32_vpclmulqdq_v8di(A, B, C) __builtin_ia32_vpclmulqdq_v8di(A, B, 1) -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni") +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16") #include diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 42ac9d0ac1a..10765365d7b 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -3020,7 +3020,7 @@ proc check_effective_target_has_q_floating_suffix { } { proc check_effective_target_float16 {} { return [check_no_compiler_messages_nocache float16 object { - _Float16 x; + _Float16 foo (_Float16 x) { return x; } } [add_options_for_float16 ""]] } @@ -8714,6 +8714,17 @@ proc check_prefer_avx128 { } { } +# Return 1 if avx512fp16 instructions can be compiled. + +proc check_effective_target_avx512fp16 { } { + return [check_no_compiler_messages avx512fp16 object { + void foo (void) + { + asm volatile ("vmovw %edi, %xmm0"); + } + } "-O2 -mavx512fp16" ] +} + # Return 1 if avx512f instructions can be compiled. proc check_effective_target_avx512f { } { From patchwork Mon Aug 2 06:39:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1512284 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=xwIfK6H4; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4GdSzm0qmZz9sRK for ; Mon, 2 Aug 2021 16:40:03 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6116D3835C23 for ; Mon, 2 Aug 2021 06:40:01 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 6116D3835C23 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1627886401; bh=m20Z1+vUsBiZKIpbJyfL+5w4oQhTKVl+I4Fa6WgtIUs=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=xwIfK6H4ggGQ2p2LAdk1/5gM2vVCMgHjNlFPIR9+XmVY/r5SbkZ4v1n9sMkN1SkNZ NyZlWhb3PMJf71j/c7sfQ1ODX5i9IAzLzoIE/VKbBVs9Ar4HA0Mvva7meProXL+PHL fU2EsvQSitxw3ebvwhV+C4XDVrlHtP9eJWcLcabs= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by sourceware.org (Postfix) with ESMTPS id 804B93846077 for ; Mon, 2 Aug 2021 06:39:36 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 804B93846077 X-IronPort-AV: E=McAfee;i="6200,9189,10063"; a="200569034" X-IronPort-AV: E=Sophos;i="5.84,288,1620716400"; d="scan'208";a="200569034" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Aug 2021 23:39:34 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.84,288,1620716400"; d="scan'208";a="568235553" Received: from scymds01.sc.intel.com ([10.148.94.138]) by orsmga004.jf.intel.com with ESMTP; 01 Aug 2021 23:39:34 -0700 Received: from shliclel219.sh.intel.com (shliclel219.sh.intel.com [10.239.236.219]) by scymds01.sc.intel.com with ESMTP id 1726dVZh029354; Sun, 1 Aug 2021 23:39:32 -0700 To: gcc-patches@gcc.gnu.org Subject: [PATCH 6/6] AVX512FP16: Support vector init/broadcast/set/extract for FP16. Date: Mon, 2 Aug 2021 14:39:31 +0800 Message-Id: <20210802063931.999956-1-hongtao.liu@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210802063116.999830-1-hongtao.liu@intel.com> References: <20210802063116.999830-1-hongtao.liu@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: liuhongt via Gcc-patches From: liuhongt Reply-To: liuhongt Cc: joseph@codesourcery.com Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm_set_ph): New intrinsic. (_mm256_set_ph): Likewise. (_mm512_set_ph): Likewise. (_mm_setr_ph): Likewise. (_mm256_setr_ph): Likewise. (_mm512_setr_ph): Likewise. (_mm_set1_ph): Likewise. (_mm256_set1_ph): Likewise. (_mm512_set1_ph): Likewise. (_mm_setzero_ph): Likewise. (_mm256_setzero_ph): Likewise. (_mm512_setzero_ph): Likewise. (_mm_set_sh): Likewise. (_mm_load_sh): Likewise. (_mm_store_sh): Likewise. * config/i386/i386-builtin-types.def (V8HF): New type. (DEF_FUNCTION_TYPE (V8HF, V8HI)): New builtin function type * config/i386/i386-expand.c (ix86_expand_vector_init_duplicate): Support vector HFmodes. (ix86_expand_vector_init_one_nonzero): Likewise. (ix86_expand_vector_init_one_var): Likewise. (ix86_expand_vector_init_interleave): Likewise. (ix86_expand_vector_init_general): Likewise. (ix86_expand_vector_set): Likewise. (ix86_expand_vector_extract): Likewise. (ix86_expand_vector_init_concat): Likewise. (ix86_expand_sse_movcc): Handle vector HFmodes. (ix86_expand_vector_set_var): Ditto. * config/i386/i386-modes.def: Add HF vector modes in comment. * config/i386/i386.c (classify_argument): Add HF vector modes. (ix86_hard_regno_mode_ok): Allow HF vector modes for AVX512FP16. (ix86_vector_mode_supported_p): Likewise. (ix86_set_reg_reg_cost): Handle vector HFmode. (ix86_get_ssemov): Handle vector HFmode. (function_arg_advance_64): Pass unamed V16HFmode and V32HFmode by stack. (function_arg_32): Pass V8HF/V16HF/V32HF by sse reg for 32bit mode. (function_arg_advance_32): Ditto. * config/i386/i386.h (VALID_AVX512FP16_REG_MODE): New. (VALID_AVX256_REG_OR_OI_MODE): Rename to .. (VALID_AVX256_REG_OR_OI_VHF_MODE): .. this, and add V16HF. (VALID_SSE2_REG_VHF_MODE): New. (VALID_AVX512VL_128_REG_MODE): Add V8HF and TImode. (SSE_REG_MODE_P): Add vector HFmode. * config/i386/i386.md (mode): Add HF vector modes. (MODE_SIZE): Likewise. (ssemodesuffix): Add ph suffix for HF vector modes. * config/i386/sse.md (VFH_128): New mode iterator. (VMOVE): Adjust for HF vector modes. (V): Likewise. (V_256_512): Likewise. (avx512): Likewise. (avx512fmaskmode): Likewise. (shuffletype): Likewise. (sseinsnmode): Likewise. (ssedoublevecmode): Likewise. (ssehalfvecmode): Likewise. (ssehalfvecmodelower): Likewise. (ssePScmode): Likewise. (ssescalarmode): Likewise. (ssescalarmodelower): Likewise. (sseintprefix): Likewise. (i128): Likewise. (bcstscalarsuff): Likewise. (xtg_mode): Likewise. (VI12HF_AVX512VL): New mode_iterator. (VF_AVX512FP16): Likewise. (VIHF): Likewise. (VIHF_256): Likewise. (VIHF_AVX512BW): Likewise. (V16_256): Likewise. (V32_512): Likewise. (sseintmodesuffix): New mode_attr. (sse): Add scalar and vector HFmodes. (ssescalarmode): Add vector HFmode mapping. (ssescalarmodesuffix): Add sh suffix for HFmode. (*_vm3): Use VFH_128. (*_vm3): Likewise. (*ieee_3): Likewise. (_blendm): New define_insn. (vec_setv8hf): New define_expand. (vec_set_0): New define_insn for HF vector set. (*avx512fp16_movsh): Likewise. (avx512fp16_movsh): Likewise. (vec_extract_lo_v32hi): Rename to ... (vec_extract_lo_): ... this, and adjust to allow HF vector modes. (vec_extract_hi_v32hi): Likewise. (vec_extract_hi_): Likewise. (vec_extract_lo_v16hi): Likewise. (vec_extract_lo_): Likewise. (vec_extract_hi_v16hi): Likewise. (vec_extract_hi_): Likewise. (vec_set_hi_v16hi): Likewise. (vec_set_hi_): Likewise. (vec_set_lo_v16hi): Likewise. (vec_set_lo_: Likewise. (*vec_extract_0): New define_insn_and_split for HF vector extract. (*vec_extracthf): New define_insn. (VEC_EXTRACT_MODE): Add HF vector modes. (PINSR_MODE): Add V8HF. (sse2p4_1): Likewise. (pinsr_evex_isa): Likewise. (_pinsr): Adjust to support insert for V8HFmode. (pbroadcast_evex_isa): Add HF vector modes. (AVX2_VEC_DUP_MODE): Likewise. (VEC_INIT_MODE): Likewise. (VEC_INIT_HALF_MODE): Likewise. (avx2_pbroadcast): Adjust to support HF vector mode broadcast. (avx2_pbroadcast_1): Likewise. (_vec_dup_1): Likewise. (_vec_dup): Likewise. (_vec_dup_gpr): Likewise. --- gcc/config/i386/avx512fp16intrin.h | 172 +++++++++++ gcc/config/i386/i386-builtin-types.def | 6 +- gcc/config/i386/i386-expand.c | 124 +++++++- gcc/config/i386/i386-modes.def | 12 +- gcc/config/i386/i386.c | 75 ++--- gcc/config/i386/i386.h | 15 +- gcc/config/i386/i386.md | 13 +- gcc/config/i386/sse.md | 397 +++++++++++++++++++------ 8 files changed, 660 insertions(+), 154 deletions(-) diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index 38d63161ba6..3fc0770986e 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -45,6 +45,178 @@ typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__)); typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__)); typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__)); +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set_ph (_Float16 __A7, _Float16 __A6, _Float16 __A5, + _Float16 __A4, _Float16 __A3, _Float16 __A2, + _Float16 __A1, _Float16 __A0) +{ + return __extension__ (__m128h)(__v8hf){ __A0, __A1, __A2, __A3, + __A4, __A5, __A6, __A7 }; +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_set_ph (_Float16 __A15, _Float16 __A14, _Float16 __A13, + _Float16 __A12, _Float16 __A11, _Float16 __A10, + _Float16 __A9, _Float16 __A8, _Float16 __A7, + _Float16 __A6, _Float16 __A5, _Float16 __A4, + _Float16 __A3, _Float16 __A2, _Float16 __A1, + _Float16 __A0) +{ + return __extension__ (__m256h)(__v16hf){ __A0, __A1, __A2, __A3, + __A4, __A5, __A6, __A7, + __A8, __A9, __A10, __A11, + __A12, __A13, __A14, __A15 }; +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_set_ph (_Float16 __A31, _Float16 __A30, _Float16 __A29, + _Float16 __A28, _Float16 __A27, _Float16 __A26, + _Float16 __A25, _Float16 __A24, _Float16 __A23, + _Float16 __A22, _Float16 __A21, _Float16 __A20, + _Float16 __A19, _Float16 __A18, _Float16 __A17, + _Float16 __A16, _Float16 __A15, _Float16 __A14, + _Float16 __A13, _Float16 __A12, _Float16 __A11, + _Float16 __A10, _Float16 __A9, _Float16 __A8, + _Float16 __A7, _Float16 __A6, _Float16 __A5, + _Float16 __A4, _Float16 __A3, _Float16 __A2, + _Float16 __A1, _Float16 __A0) +{ + return __extension__ (__m512h)(__v32hf){ __A0, __A1, __A2, __A3, + __A4, __A5, __A6, __A7, + __A8, __A9, __A10, __A11, + __A12, __A13, __A14, __A15, + __A16, __A17, __A18, __A19, + __A20, __A21, __A22, __A23, + __A24, __A25, __A26, __A27, + __A28, __A29, __A30, __A31 }; +} + +/* Create vectors of elements in the reversed order from _mm_set_ph, + _mm256_set_ph and _mm512_set_ph functions. */ + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2, + _Float16 __A3, _Float16 __A4, _Float16 __A5, + _Float16 __A6, _Float16 __A7) +{ + return _mm_set_ph (__A7, __A6, __A5, __A4, __A3, __A2, __A1, __A0); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2, + _Float16 __A3, _Float16 __A4, _Float16 __A5, + _Float16 __A6, _Float16 __A7, _Float16 __A8, + _Float16 __A9, _Float16 __A10, _Float16 __A11, + _Float16 __A12, _Float16 __A13, _Float16 __A14, + _Float16 __A15) +{ + return _mm256_set_ph (__A15, __A14, __A13, __A12, __A11, __A10, __A9, + __A8, __A7, __A6, __A5, __A4, __A3, __A2, __A1, + __A0); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2, + _Float16 __A3, _Float16 __A4, _Float16 __A5, + _Float16 __A6, _Float16 __A7, _Float16 __A8, + _Float16 __A9, _Float16 __A10, _Float16 __A11, + _Float16 __A12, _Float16 __A13, _Float16 __A14, + _Float16 __A15, _Float16 __A16, _Float16 __A17, + _Float16 __A18, _Float16 __A19, _Float16 __A20, + _Float16 __A21, _Float16 __A22, _Float16 __A23, + _Float16 __A24, _Float16 __A25, _Float16 __A26, + _Float16 __A27, _Float16 __A28, _Float16 __A29, + _Float16 __A30, _Float16 __A31) + +{ + return _mm512_set_ph (__A31, __A30, __A29, __A28, __A27, __A26, __A25, + __A24, __A23, __A22, __A21, __A20, __A19, __A18, + __A17, __A16, __A15, __A14, __A13, __A12, __A11, + __A10, __A9, __A8, __A7, __A6, __A5, __A4, __A3, + __A2, __A1, __A0); +} + +/* Broadcast _Float16 to vector. */ + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set1_ph (_Float16 __A) +{ + return _mm_set_ph (__A, __A, __A, __A, __A, __A, __A, __A); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_set1_ph (_Float16 __A) +{ + return _mm256_set_ph (__A, __A, __A, __A, __A, __A, __A, __A, + __A, __A, __A, __A, __A, __A, __A, __A); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_set1_ph (_Float16 __A) +{ + return _mm512_set_ph (__A, __A, __A, __A, __A, __A, __A, __A, + __A, __A, __A, __A, __A, __A, __A, __A, + __A, __A, __A, __A, __A, __A, __A, __A, + __A, __A, __A, __A, __A, __A, __A, __A); +} + +/* Create a vector with all zeros. */ + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_setzero_ph (void) +{ + return _mm_set1_ph (0.0f); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_setzero_ph (void) +{ + return _mm256_set1_ph (0.0f); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_setzero_ph (void) +{ + return _mm512_set1_ph (0.0f); +} + +/* Create a vector with element 0 as F and the rest zero. */ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set_sh (_Float16 __F) +{ + return _mm_set_ph (0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, __F); +} + +/* Create a vector with element 0 as *P and the rest zero. */ +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_load_sh (void const *__P) +{ + return _mm_set_ph (0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, + *(_Float16 const *) __P); +} + +/* Stores the lower _Float16 value. */ +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_store_sh (void *__P, __m128h __A) +{ + *(_Float16 *) __P = ((__v8hf)__A)[0]; +} + #ifdef __DISABLE_AVX512FP16__ #undef __DISABLE_AVX512FP16__ #pragma GCC pop_options diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index 1768b88d748..4df6ee1009d 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -85,6 +85,7 @@ DEF_VECTOR_TYPE (V8QI, QI) # SSE vectors DEF_VECTOR_TYPE (V2DF, DOUBLE) DEF_VECTOR_TYPE (V4SF, FLOAT) +DEF_VECTOR_TYPE (V8HF, FLOAT16) DEF_VECTOR_TYPE (V2DI, DI) DEF_VECTOR_TYPE (V4SI, SI) DEF_VECTOR_TYPE (V8HI, HI) @@ -1297,4 +1298,7 @@ DEF_FUNCTION_TYPE (UINT, UINT, V2DI, V2DI, PVOID) DEF_FUNCTION_TYPE (UINT, UINT, V2DI, PVOID) DEF_FUNCTION_TYPE (VOID, V2DI, V2DI, V2DI, UINT) DEF_FUNCTION_TYPE (UINT8, PV2DI, V2DI, PCVOID) -DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID) \ No newline at end of file +DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID) + +# FP16 builtins +DEF_FUNCTION_TYPE (V8HF, V8HI) diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index b7d050a1e42..bb965ca0e9b 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -3952,6 +3952,7 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false) break; case E_V16QImode: case E_V8HImode: + case E_V8HFmode: case E_V4SImode: case E_V2DImode: if (TARGET_SSE4_1) @@ -3974,6 +3975,7 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false) break; case E_V32QImode: case E_V16HImode: + case E_V16HFmode: case E_V8SImode: case E_V4DImode: if (TARGET_AVX2) @@ -3993,6 +3995,9 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false) case E_V32HImode: gen = gen_avx512bw_blendmv32hi; break; + case E_V32HFmode: + gen = gen_avx512bw_blendmv32hf; + break; case E_V16SImode: gen = gen_avx512f_blendmv16si; break; @@ -14144,6 +14149,11 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode, } return true; + case E_V8HFmode: + case E_V16HFmode: + case E_V32HFmode: + return ix86_vector_duplicate_value (mode, target, val); + default: return false; } @@ -14228,6 +14238,18 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode, use_vector_set = TARGET_AVX512F && TARGET_64BIT && one_var == 0; gen_vec_set_0 = gen_vec_setv8di_0; break; + case E_V8HFmode: + use_vector_set = TARGET_AVX512FP16 && one_var == 0; + gen_vec_set_0 = gen_vec_setv8hf_0; + break; + case E_V16HFmode: + use_vector_set = TARGET_AVX512FP16 && one_var == 0; + gen_vec_set_0 = gen_vec_setv16hf_0; + break; + case E_V32HFmode: + use_vector_set = TARGET_AVX512FP16 && one_var == 0; + gen_vec_set_0 = gen_vec_setv32hf_0; + break; default: break; } @@ -14377,6 +14399,8 @@ ix86_expand_vector_init_one_var (bool mmx_ok, machine_mode mode, if (!TARGET_64BIT) return false; /* FALLTHRU */ + case E_V8HFmode: + case E_V16HFmode: case E_V4DFmode: case E_V8SFmode: case E_V8SImode: @@ -14457,6 +14481,9 @@ ix86_expand_vector_init_concat (machine_mode mode, case 2: switch (mode) { + case E_V32HFmode: + half_mode = V16HFmode; + break; case E_V16SImode: half_mode = V8SImode; break; @@ -14469,6 +14496,9 @@ ix86_expand_vector_init_concat (machine_mode mode, case E_V8DFmode: half_mode = V4DFmode; break; + case E_V16HFmode: + half_mode = V8HFmode; + break; case E_V8SImode: half_mode = V4SImode; break; @@ -14611,13 +14641,22 @@ ix86_expand_vector_init_interleave (machine_mode mode, { machine_mode first_imode, second_imode, third_imode, inner_mode; int i, j; - rtx op0, op1; + rtx op, op0, op1; rtx (*gen_load_even) (rtx, rtx, rtx); rtx (*gen_interleave_first_low) (rtx, rtx, rtx); rtx (*gen_interleave_second_low) (rtx, rtx, rtx); switch (mode) { + case E_V8HFmode: + gen_load_even = gen_vec_setv8hf; + gen_interleave_first_low = gen_vec_interleave_lowv4si; + gen_interleave_second_low = gen_vec_interleave_lowv2di; + inner_mode = HFmode; + first_imode = V4SImode; + second_imode = V2DImode; + third_imode = VOIDmode; + break; case E_V8HImode: gen_load_even = gen_vec_setv8hi; gen_interleave_first_low = gen_vec_interleave_lowv4si; @@ -14642,9 +14681,19 @@ ix86_expand_vector_init_interleave (machine_mode mode, for (i = 0; i < n; i++) { + op = ops [i + i]; + if (inner_mode == HFmode) + { + /* Convert HFmode to HImode. */ + op1 = gen_reg_rtx (HImode); + op1 = gen_rtx_SUBREG (HImode, force_reg (HFmode, op), 0); + op = gen_reg_rtx (HImode); + emit_move_insn (op, op1); + } + /* Extend the odd elment to SImode using a paradoxical SUBREG. */ op0 = gen_reg_rtx (SImode); - emit_move_insn (op0, gen_lowpart (SImode, ops [i + i])); + emit_move_insn (op0, gen_lowpart (SImode, op)); /* Insert the SImode value as low element of V4SImode vector. */ op1 = gen_reg_rtx (V4SImode); @@ -14781,6 +14830,10 @@ ix86_expand_vector_init_general (bool mmx_ok, machine_mode mode, half_mode = V8HImode; goto half; + case E_V16HFmode: + half_mode = V8HFmode; + goto half; + half: n = GET_MODE_NUNITS (mode); for (i = 0; i < n; i++) @@ -14804,6 +14857,11 @@ half: half_mode = V16HImode; goto quarter; + case E_V32HFmode: + quarter_mode = V8HFmode; + half_mode = V16HFmode; + goto quarter; + quarter: n = GET_MODE_NUNITS (mode); for (i = 0; i < n; i++) @@ -14840,6 +14898,9 @@ quarter: move from GPR to SSE register directly. */ if (!TARGET_INTER_UNIT_MOVES_TO_VEC) break; + /* FALLTHRU */ + + case E_V8HFmode: n = GET_MODE_NUNITS (mode); for (i = 0; i < n; i++) @@ -15087,6 +15148,16 @@ ix86_expand_vector_set_var (rtx target, rtx val, rtx idx) case E_V16SFmode: cmp_mode = V16SImode; break; + /* TARGET_AVX512FP16 implies TARGET_AVX512BW. */ + case E_V8HFmode: + cmp_mode = V8HImode; + break; + case E_V16HFmode: + cmp_mode = V16HImode; + break; + case E_V32HFmode: + cmp_mode = V32HImode; + break; default: gcc_unreachable (); } @@ -15123,23 +15194,25 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt) machine_mode half_mode; bool use_vec_merge = false; rtx tmp; - static rtx (*gen_extract[6][2]) (rtx, rtx) + static rtx (*gen_extract[7][2]) (rtx, rtx) = { { gen_vec_extract_lo_v32qi, gen_vec_extract_hi_v32qi }, { gen_vec_extract_lo_v16hi, gen_vec_extract_hi_v16hi }, { gen_vec_extract_lo_v8si, gen_vec_extract_hi_v8si }, { gen_vec_extract_lo_v4di, gen_vec_extract_hi_v4di }, { gen_vec_extract_lo_v8sf, gen_vec_extract_hi_v8sf }, - { gen_vec_extract_lo_v4df, gen_vec_extract_hi_v4df } + { gen_vec_extract_lo_v4df, gen_vec_extract_hi_v4df }, + { gen_vec_extract_lo_v16hf, gen_vec_extract_hi_v16hf } }; - static rtx (*gen_insert[6][2]) (rtx, rtx, rtx) + static rtx (*gen_insert[7][2]) (rtx, rtx, rtx) = { { gen_vec_set_lo_v32qi, gen_vec_set_hi_v32qi }, { gen_vec_set_lo_v16hi, gen_vec_set_hi_v16hi }, { gen_vec_set_lo_v8si, gen_vec_set_hi_v8si }, { gen_vec_set_lo_v4di, gen_vec_set_hi_v4di }, { gen_vec_set_lo_v8sf, gen_vec_set_hi_v8sf }, - { gen_vec_set_lo_v4df, gen_vec_set_hi_v4df } + { gen_vec_set_lo_v4df, gen_vec_set_hi_v4df }, + { gen_vec_set_lo_v16hf, gen_vec_set_hi_v16hf }, }; int i, j, n; machine_mode mmode = VOIDmode; @@ -15306,6 +15379,10 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt) } return; + case E_V8HFmode: + use_vec_merge = true; + break; + case E_V8HImode: case E_V2HImode: use_vec_merge = TARGET_SSE2; @@ -15329,6 +15406,12 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt) n = 16; goto half; + case E_V16HFmode: + half_mode = V8HFmode; + j = 6; + n = 8; + goto half; + case E_V16HImode: half_mode = V8HImode; j = 1; @@ -15409,6 +15492,13 @@ half: } break; + case E_V32HFmode: + if (TARGET_AVX512BW) + { + mmode = SImode; + gen_blendm = gen_avx512bw_blendmv32hf; + } + break; case E_V32HImode: if (TARGET_AVX512BW) { @@ -15780,6 +15870,28 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt) ix86_expand_vector_extract (false, target, tmp, elt & 3); return; + case E_V32HFmode: + tmp = gen_reg_rtx (V16HFmode); + if (elt < 16) + emit_insn (gen_vec_extract_lo_v32hf (tmp, vec)); + else + emit_insn (gen_vec_extract_hi_v32hf (tmp, vec)); + ix86_expand_vector_extract (false, target, tmp, elt & 15); + return; + + case E_V16HFmode: + tmp = gen_reg_rtx (V8HFmode); + if (elt < 8) + emit_insn (gen_vec_extract_lo_v16hf (tmp, vec)); + else + emit_insn (gen_vec_extract_hi_v16hf (tmp, vec)); + ix86_expand_vector_extract (false, target, tmp, elt & 7); + return; + + case E_V8HFmode: + use_vec_extr = true; + break; + case E_V8QImode: use_vec_extr = TARGET_MMX_WITH_SSE && TARGET_SSE4_1; /* ??? Could extract the appropriate HImode element and shift. */ diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def index 9232f59a925..fcadfcd4c94 100644 --- a/gcc/config/i386/i386-modes.def +++ b/gcc/config/i386/i386-modes.def @@ -84,12 +84,12 @@ VECTOR_MODES (INT, 16); /* V16QI V8HI V4SI V2DI */ VECTOR_MODES (INT, 32); /* V32QI V16HI V8SI V4DI */ VECTOR_MODES (INT, 64); /* V64QI V32HI V16SI V8DI */ VECTOR_MODES (INT, 128); /* V128QI V64HI V32SI V16DI */ -VECTOR_MODES (FLOAT, 8); /* V2SF */ -VECTOR_MODES (FLOAT, 16); /* V4SF V2DF */ -VECTOR_MODES (FLOAT, 32); /* V8SF V4DF V2TF */ -VECTOR_MODES (FLOAT, 64); /* V16SF V8DF V4TF */ -VECTOR_MODES (FLOAT, 128); /* V32SF V16DF V8TF */ -VECTOR_MODES (FLOAT, 256); /* V64SF V32DF V16TF */ +VECTOR_MODES (FLOAT, 8); /* V4HF V2SF */ +VECTOR_MODES (FLOAT, 16); /* V8HF V4SF V2DF */ +VECTOR_MODES (FLOAT, 32); /* V16HF V8SF V4DF V2TF */ +VECTOR_MODES (FLOAT, 64); /* V32HF V16SF V8DF V4TF */ +VECTOR_MODES (FLOAT, 128); /* V64HF V32SF V16DF V8TF */ +VECTOR_MODES (FLOAT, 256); /* V128HF V64SF V32DF V16TF */ VECTOR_MODE (INT, TI, 1); /* V1TI */ VECTOR_MODE (INT, DI, 1); /* V1DI */ VECTOR_MODE (INT, SI, 1); /* V1SI */ diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 71bbcf968c5..889256e0298 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2418,6 +2418,7 @@ classify_argument (machine_mode mode, const_tree type, case E_V8SFmode: case E_V8SImode: case E_V32QImode: + case E_V16HFmode: case E_V16HImode: case E_V4DFmode: case E_V4DImode: @@ -2428,6 +2429,7 @@ classify_argument (machine_mode mode, const_tree type, return 4; case E_V8DFmode: case E_V16SFmode: + case E_V32HFmode: case E_V8DImode: case E_V16SImode: case E_V32HImode: @@ -2445,6 +2447,7 @@ classify_argument (machine_mode mode, const_tree type, case E_V4SImode: case E_V16QImode: case E_V8HImode: + case E_V8HFmode: case E_V2DFmode: case E_V2DImode: classes[0] = X86_64_SSE_CLASS; @@ -2858,12 +2861,14 @@ pass_in_reg: break; /* FALLTHRU */ + case E_V16HFmode: case E_V8SFmode: case E_V8SImode: case E_V64QImode: case E_V32HImode: case E_V16SImode: case E_V8DImode: + case E_V32HFmode: case E_V16SFmode: case E_V8DFmode: case E_V32QImode: @@ -2875,6 +2880,7 @@ pass_in_reg: case E_V8HImode: case E_V4SImode: case E_V2DImode: + case E_V8HFmode: case E_V4SFmode: case E_V2DFmode: if (!type || !AGGREGATE_TYPE_P (type)) @@ -2929,7 +2935,9 @@ function_arg_advance_64 (CUMULATIVE_ARGS *cum, machine_mode mode, /* Unnamed 512 and 256bit vector mode parameters are passed on stack. */ if (!named && (VALID_AVX512F_REG_MODE (mode) - || VALID_AVX256_REG_MODE (mode))) + || VALID_AVX256_REG_MODE (mode) + || mode == V16HFmode + || mode == V32HFmode)) return 0; if (!examine_argument (mode, type, 0, &int_nregs, &sse_nregs) @@ -3097,6 +3105,7 @@ pass_in_reg: case E_V8HImode: case E_V4SImode: case E_V2DImode: + case E_V8HFmode: case E_V4SFmode: case E_V2DFmode: if (!type || !AGGREGATE_TYPE_P (type)) @@ -3116,8 +3125,10 @@ pass_in_reg: case E_V32HImode: case E_V16SImode: case E_V8DImode: + case E_V32HFmode: case E_V16SFmode: case E_V8DFmode: + case E_V16HFmode: case E_V8SFmode: case E_V8SImode: case E_V32QImode: @@ -3176,12 +3187,14 @@ function_arg_64 (const CUMULATIVE_ARGS *cum, machine_mode mode, default: break; + case E_V16HFmode: case E_V8SFmode: case E_V8SImode: case E_V32QImode: case E_V16HImode: case E_V4DFmode: case E_V4DImode: + case E_V32HFmode: case E_V16SFmode: case E_V16SImode: case E_V64QImode: @@ -4676,12 +4689,14 @@ ix86_gimplify_va_arg (tree valist, tree type, gimple_seq *pre_p, nat_mode = type_natural_mode (type, NULL, false); switch (nat_mode) { + case E_V16HFmode: case E_V8SFmode: case E_V8SImode: case E_V32QImode: case E_V16HImode: case E_V4DFmode: case E_V4DImode: + case E_V32HFmode: case E_V16SFmode: case E_V16SImode: case E_V64QImode: @@ -5348,7 +5363,12 @@ ix86_get_ssemov (rtx *operands, unsigned size, switch (type) { case opcode_int: - opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32"; + if (scalar_mode == E_HFmode) + opcode = (misaligned_p + ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64") + : "vmovdqa64"); + else + opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32"; break; case opcode_float: opcode = misaligned_p ? "vmovups" : "vmovaps"; @@ -5362,6 +5382,11 @@ ix86_get_ssemov (rtx *operands, unsigned size, { switch (scalar_mode) { + case E_HFmode: + opcode = (misaligned_p + ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64") + : "vmovdqa64"); + break; case E_SFmode: opcode = misaligned_p ? "%vmovups" : "%vmovaps"; break; @@ -19298,7 +19323,6 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in) int index; switch (mode) { - case E_HFmode: case E_SFmode: index = 0; break; @@ -19399,31 +19423,12 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in) } break; case 2: - { - int cost; - if (in == 2) - cost = MAX (ix86_cost->hard_register.int_load[1], - ix86_cost->hard_register.int_store[1]); - else - cost = in ? ix86_cost->hard_register.int_load[1] - : ix86_cost->hard_register.int_store[1]; - if (mode == E_HFmode) - { - /* Prefer SSE over GPR for HFmode. */ - int sse_cost; - int index = sse_store_index (mode); - if (in == 2) - sse_cost = MAX (ix86_cost->hard_register.sse_load[index], - ix86_cost->hard_register.sse_store[index]); - else - sse_cost = (in - ? ix86_cost->hard_register.sse_load [index] - : ix86_cost->hard_register.sse_store [index]); - if (sse_cost >= cost) - cost = sse_cost + 1; - } - return cost; - } + if (in == 2) + return MAX (ix86_cost->hard_register.int_load[1], + ix86_cost->hard_register.int_store[1]); + else + return in ? ix86_cost->hard_register.int_load[1] + : ix86_cost->hard_register.int_store[1]; default: if (in == 2) cost = MAX (ix86_cost->hard_register.int_load[2], @@ -19601,6 +19606,7 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode) between gpr and sse registser. */ if (TARGET_AVX512F && (mode == XImode + || mode == V32HFmode || VALID_AVX512F_REG_MODE (mode) || VALID_AVX512F_SCALAR_MODE (mode))) return true; @@ -19615,9 +19621,7 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode) /* TODO check for QI/HI scalars. */ /* AVX512VL allows sse regs16+ for 128/256 bit modes. */ if (TARGET_AVX512VL - && (mode == OImode - || mode == TImode - || VALID_AVX256_REG_MODE (mode) + && (VALID_AVX256_REG_OR_OI_VHF_MODE (mode) || VALID_AVX512VL_128_REG_MODE (mode))) return true; @@ -19627,9 +19631,9 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode) /* OImode and AVX modes are available only when AVX is enabled. */ return ((TARGET_AVX - && VALID_AVX256_REG_OR_OI_MODE (mode)) + && VALID_AVX256_REG_OR_OI_VHF_MODE (mode)) || VALID_SSE_REG_MODE (mode) - || VALID_SSE2_REG_MODE (mode) + || VALID_SSE2_REG_VHF_MODE (mode) || VALID_MMX_REG_MODE (mode) || VALID_MMX_REG_MODE_3DNOW (mode)); } @@ -19840,7 +19844,8 @@ ix86_set_reg_reg_cost (machine_mode mode) case MODE_VECTOR_INT: case MODE_VECTOR_FLOAT: - if ((TARGET_AVX512F && VALID_AVX512F_REG_MODE (mode)) + if ((TARGET_AVX512FP16 && VALID_AVX512FP16_REG_MODE (mode)) + || (TARGET_AVX512F && VALID_AVX512F_REG_MODE (mode)) || (TARGET_AVX && VALID_AVX256_REG_MODE (mode)) || (TARGET_SSE2 && VALID_SSE2_REG_MODE (mode)) || (TARGET_SSE && VALID_SSE_REG_MODE (mode)) @@ -21706,6 +21711,8 @@ ix86_vector_mode_supported_p (machine_mode mode) if ((TARGET_MMX || TARGET_MMX_WITH_SSE) && VALID_MMX_REG_MODE (mode)) return true; + if (TARGET_AVX512FP16 && VALID_AVX512FP16_REG_MODE (mode)) + return true; if ((TARGET_3DNOW || TARGET_MMX_WITH_SSE) && VALID_MMX_REG_MODE_3DNOW (mode)) return true; diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 8fcd5693624..64327dc90df 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -995,8 +995,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); || (MODE) == V4DImode || (MODE) == V2TImode || (MODE) == V8SFmode \ || (MODE) == V4DFmode) -#define VALID_AVX256_REG_OR_OI_MODE(MODE) \ - (VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode) +#define VALID_AVX256_REG_OR_OI_VHF_MODE(MODE) \ + (VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode || (MODE) == V16HFmode) #define VALID_AVX512F_SCALAR_MODE(MODE) \ ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode \ @@ -1014,13 +1014,20 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); #define VALID_AVX512VL_128_REG_MODE(MODE) \ ((MODE) == V2DImode || (MODE) == V2DFmode || (MODE) == V16QImode \ || (MODE) == V4SImode || (MODE) == V4SFmode || (MODE) == V8HImode \ - || (MODE) == TFmode || (MODE) == V1TImode) + || (MODE) == TFmode || (MODE) == V1TImode || (MODE) == V8HFmode \ + || (MODE) == TImode) + +#define VALID_AVX512FP16_REG_MODE(MODE) \ + ((MODE) == V8HFmode || (MODE) == V16HFmode || (MODE) == V32HFmode) #define VALID_SSE2_REG_MODE(MODE) \ ((MODE) == V16QImode || (MODE) == V8HImode || (MODE) == V2DFmode \ || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode \ || (MODE) == V2DImode || (MODE) == DFmode || (MODE) == HFmode) +#define VALID_SSE2_REG_VHF_MODE(MODE) \ + (VALID_SSE2_REG_MODE (MODE) || (MODE) == V8HFmode) + #define VALID_SSE_REG_MODE(MODE) \ ((MODE) == V1TImode || (MODE) == TImode \ || (MODE) == V4SFmode || (MODE) == V4SImode \ @@ -1065,7 +1072,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); || (MODE) == V4DImode || (MODE) == V8SFmode || (MODE) == V4DFmode \ || (MODE) == V2TImode || (MODE) == V8DImode || (MODE) == V64QImode \ || (MODE) == V16SImode || (MODE) == V32HImode || (MODE) == V8DFmode \ - || (MODE) == V16SFmode) + || (MODE) == V16SFmode || VALID_AVX512FP16_REG_MODE (MODE)) #define X87_FLOAT_MODE_P(MODE) \ (TARGET_80387 && ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode)) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 777d11261ac..f25166695f1 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -496,8 +496,8 @@ (define_attr "type" ;; Main data type used by the insn (define_attr "mode" - "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF, - V2DF,V2SF,V1DF,V8DF" + "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,SF,DF,XF,TF,V32HF,V16HF,V8HF, + V16SF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,V8DF" (const_string "unknown")) ;; The CPU unit operations uses. @@ -1102,7 +1102,8 @@ (define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8") (V2DI "16") (V4DI "32") (V8DI "64") (V1TI "16") (V2TI "32") (V4TI "64") (V2DF "16") (V4DF "32") (V8DF "64") - (V4SF "16") (V8SF "32") (V16SF "64")]) + (V4SF "16") (V8SF "32") (V16SF "64") + (V8HF "16") (V16HF "32") (V32HF "64")]) ;; Double word integer modes as mode attribute. (define_mode_attr DWI [(QI "HI") (HI "SI") (SI "DI") (DI "TI") (TI "OI")]) @@ -1237,9 +1238,9 @@ (define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")]) ;; SSE instruction suffix for various modes (define_mode_attr ssemodesuffix [(HF "sh") (SF "ss") (DF "sd") - (V16SF "ps") (V8DF "pd") - (V8SF "ps") (V4DF "pd") - (V4SF "ps") (V2DF "pd") + (V32HF "ph") (V16SF "ps") (V8DF "pd") + (V16HF "ph") (V8SF "ps") (V4DF "pd") + (V8HF "ph") (V4SF "ps") (V2DF "pd") (V16QI "b") (V8HI "w") (V4SI "d") (V2DI "q") (V32QI "b") (V16HI "w") (V8SI "d") (V4DI "q") (V64QI "b") (V32HI "w") (V16SI "d") (V8DI "q")]) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index ab29999023d..e331ef477d3 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -225,6 +225,7 @@ (define_mode_iterator VMOVE (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX") V1TI + (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF]) @@ -240,6 +241,13 @@ (define_mode_iterator VI12_AVX512VL [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL") V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")]) +(define_mode_iterator VI12HF_AVX512VL + [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL") + V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL") + (V32HF "TARGET_AVX512FP16") + (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL") + (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")]) + ;; Same iterator, but without supposed TARGET_AVX512BW (define_mode_iterator VI12_AVX512VLBW [(V64QI "TARGET_AVX512BW") (V16QI "TARGET_AVX512VL") @@ -255,6 +263,8 @@ (define_mode_iterator V (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI + (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16") + (V8HF "TARGET_AVX512FP16") (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")]) @@ -277,7 +287,8 @@ (define_mode_iterator V_512 [V64QI V32HI V16SI V8DI V16SF V8DF]) (define_mode_iterator V_256_512 [V32QI V16HI V8SI V4DI V8SF V4DF (V64QI "TARGET_AVX512F") (V32HI "TARGET_AVX512F") (V16SI "TARGET_AVX512F") - (V8DI "TARGET_AVX512F") (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")]) + (V8DI "TARGET_AVX512F") (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F") + (V16HF "TARGET_AVX512FP16") (V32HF "TARGET_AVX512FP16")]) ;; All vector float modes (define_mode_iterator VF @@ -321,6 +332,11 @@ (define_mode_iterator VF2_512_256VL (define_mode_iterator VF_128 [V4SF (V2DF "TARGET_SSE2")]) +;; All 128bit vector HF/SF/DF modes +(define_mode_iterator VFH_128 + [(V8HF "TARGET_AVX512FP16") + V4SF (V2DF "TARGET_SSE2")]) + ;; All 256bit vector float modes (define_mode_iterator VF_256 [V8SF V4DF]) @@ -347,6 +363,9 @@ (define_mode_iterator VF2_AVX512VL (define_mode_iterator VF1_AVX512VL [V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")]) +(define_mode_iterator VF_AVX512FP16 + [V32HF V16HF V8HF]) + ;; All vector integer modes (define_mode_iterator VI [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F") @@ -355,6 +374,16 @@ (define_mode_iterator VI (V8SI "TARGET_AVX") V4SI (V4DI "TARGET_AVX") V2DI]) +;; All vector integer and HF modes +(define_mode_iterator VIHF + [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F") + (V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI + (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI + (V8SI "TARGET_AVX") V4SI + (V4DI "TARGET_AVX") V2DI + (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16") + (V8HF "TARGET_AVX512FP16")]) + (define_mode_iterator VI_AVX2 [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI @@ -557,6 +586,7 @@ (define_mode_attr avx512 (V8HI "avx512vl") (V16HI "avx512vl") (V32HI "avx512bw") (V4SI "avx512vl") (V8SI "avx512vl") (V16SI "avx512f") (V2DI "avx512vl") (V4DI "avx512vl") (V8DI "avx512f") + (V8HF "avx512fp16") (V16HF "avx512vl") (V32HF "avx512bw") (V4SF "avx512vl") (V8SF "avx512vl") (V16SF "avx512f") (V2DF "avx512vl") (V4DF "avx512vl") (V8DF "avx512f")]) @@ -617,12 +647,13 @@ (define_mode_attr avx2_avx512 (V8HI "avx512vl") (V16HI "avx512vl") (V32HI "avx512bw")]) (define_mode_attr shuffletype - [(V16SF "f") (V16SI "i") (V8DF "f") (V8DI "i") - (V8SF "f") (V8SI "i") (V4DF "f") (V4DI "i") - (V4SF "f") (V4SI "i") (V2DF "f") (V2DI "i") - (V32HI "i") (V16HI "i") (V8HI "i") - (V64QI "i") (V32QI "i") (V16QI "i") - (V4TI "i") (V2TI "i") (V1TI "i")]) + [(V32HF "f") (V16HF "f") (V8HF "f") + (V16SF "f") (V16SI "i") (V8DF "f") (V8DI "i") + (V8SF "f") (V8SI "i") (V4DF "f") (V4DI "i") + (V4SF "f") (V4SI "i") (V2DF "f") (V2DI "i") + (V32HI "i") (V16HI "i") (V8HI "i") + (V64QI "i") (V32QI "i") (V16QI "i") + (V4TI "i") (V2TI "i") (V1TI "i")]) (define_mode_attr ssequartermode [(V16SF "V4SF") (V8DF "V2DF") (V16SI "V4SI") (V8DI "V2DI")]) @@ -659,6 +690,8 @@ (define_mode_iterator VI_256 [V32QI V16HI V8SI V4DI]) ;; All 128 and 256bit vector integer modes (define_mode_iterator VI_128_256 [V16QI V8HI V4SI V2DI V32QI V16HI V8SI V4DI]) +;; All 256bit vector integer and HF modes +(define_mode_iterator VIHF_256 [V32QI V16HI V8SI V4DI V16HF]) ;; Various 128bit vector integer mode combinations (define_mode_iterator VI12_128 [V16QI V8HI]) @@ -680,6 +713,9 @@ (define_mode_iterator VI48_512 [V16SI V8DI]) (define_mode_iterator VI4_256_8_512 [V8SI V8DI]) (define_mode_iterator VI_AVX512BW [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")]) +(define_mode_iterator VIHF_AVX512BW + [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW") + (V32HF "TARGET_AVX512FP16")]) ;; Int-float size matches (define_mode_iterator VI4F_128 [V4SI V4SF]) @@ -720,6 +756,9 @@ (define_mode_iterator VF_AVX512 (V8SF "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL") V16SF V8DF]) +(define_mode_iterator V16_256 [V16HI V16HF]) +(define_mode_iterator V32_512 [V32HI V32HF]) + (define_mode_attr avx512bcst [(V4SI "%{1to4%}") (V2DI "%{1to2%}") (V8SI "%{1to8%}") (V4DI "%{1to4%}") @@ -730,8 +769,10 @@ (define_mode_attr avx512bcst ;; Mapping from float mode to required SSE level (define_mode_attr sse - [(SF "sse") (DF "sse2") + [(SF "sse") (DF "sse2") (HF "avx512fp16") (V4SF "sse") (V2DF "sse2") + (V32HF "avx512fp16") (V16HF "avx512fp16") + (V8HF "avx512fp16") (V16SF "avx512f") (V8SF "avx") (V8DF "avx512f") (V4DF "avx")]) @@ -767,14 +808,23 @@ (define_mode_attr sseinsnmode (V16SF "V16SF") (V8DF "V8DF") (V8SF "V8SF") (V4DF "V4DF") (V4SF "V4SF") (V2DF "V2DF") + (V8HF "TI") (V16HF "OI") (V32HF "XI") (TI "TI")]) +;; SSE integer instruction suffix for various modes +(define_mode_attr sseintmodesuffix + [(V16QI "b") (V8HI "w") (V4SI "d") (V2DI "q") + (V32QI "b") (V16HI "w") (V8SI "d") (V4DI "q") + (V64QI "b") (V32HI "w") (V16SI "d") (V8DI "q") + (V8HF "w") (V16HF "w") (V32HF "w")]) + ;; Mapping of vector modes to corresponding mask size (define_mode_attr avx512fmaskmode [(V64QI "DI") (V32QI "SI") (V16QI "HI") (V32HI "SI") (V16HI "HI") (V8HI "QI") (V4HI "QI") (V16SI "HI") (V8SI "QI") (V4SI "QI") (V8DI "QI") (V4DI "QI") (V2DI "QI") + (V32HF "SI") (V16HF "HI") (V8HF "QI") (V16SF "HI") (V8SF "QI") (V4SF "QI") (V8DF "QI") (V4DF "QI") (V2DF "QI")]) @@ -784,6 +834,7 @@ (define_mode_attr avx512fmaskmodelower (V32HI "si") (V16HI "hi") (V8HI "qi") (V4HI "qi") (V16SI "hi") (V8SI "qi") (V4SI "qi") (V8DI "qi") (V4DI "qi") (V2DI "qi") + (V32HF "si") (V16HF "hi") (V8HF "qi") (V16SF "hi") (V8SF "qi") (V4SF "qi") (V8DF "qi") (V4DF "qi") (V2DF "qi")]) @@ -828,7 +879,8 @@ (define_mode_attr ssedoublevecmode (V16QI "V32QI") (V8HI "V16HI") (V4SI "V8SI") (V2DI "V4DI") (V16SF "V32SF") (V8DF "V16DF") (V8SF "V16SF") (V4DF "V8DF") - (V4SF "V8SF") (V2DF "V4DF")]) + (V4SF "V8SF") (V2DF "V4DF") + (V32HF "V64HF") (V16HF "V32HF") (V8HF "V16HF")]) ;; Mapping of vector modes to a vector mode of half size ;; instead of V1DI/V1DF, DI/DF are used for V2DI/V2DF although they are scalar. @@ -838,7 +890,8 @@ (define_mode_attr ssehalfvecmode (V16QI "V8QI") (V8HI "V4HI") (V4SI "V2SI") (V2DI "DI") (V16SF "V8SF") (V8DF "V4DF") (V8SF "V4SF") (V4DF "V2DF") - (V4SF "V2SF") (V2DF "DF")]) + (V4SF "V2SF") (V2DF "DF") + (V32HF "V16HF") (V16HF "V8HF") (V8HF "V4HF")]) (define_mode_attr ssehalfvecmodelower [(V64QI "v32qi") (V32HI "v16hi") (V16SI "v8si") (V8DI "v4di") (V4TI "v2ti") @@ -846,9 +899,10 @@ (define_mode_attr ssehalfvecmodelower (V16QI "v8qi") (V8HI "v4hi") (V4SI "v2si") (V16SF "v8sf") (V8DF "v4df") (V8SF "v4sf") (V4DF "v2df") - (V4SF "v2sf")]) + (V4SF "v2sf") + (V32HF "v16hf") (V16HF "v8hf") (V8HF "v4hf")]) -;; Mapping of vector modes ti packed single mode of the same size +;; Mapping of vector modes to packed single mode of the same size (define_mode_attr ssePSmode [(V16SI "V16SF") (V8DF "V16SF") (V16SF "V16SF") (V8DI "V16SF") @@ -858,7 +912,8 @@ (define_mode_attr ssePSmode (V4DI "V8SF") (V2DI "V4SF") (V4TI "V16SF") (V2TI "V8SF") (V1TI "V4SF") (V8SF "V8SF") (V4SF "V4SF") - (V4DF "V8SF") (V2DF "V4SF")]) + (V4DF "V8SF") (V2DF "V4SF") + (V32HF "V16SF") (V16HF "V8SF") (V8HF "V4SF")]) (define_mode_attr ssePSmode2 [(V8DI "V8SF") (V4DI "V4SF")]) @@ -869,6 +924,7 @@ (define_mode_attr ssescalarmode (V32HI "HI") (V16HI "HI") (V8HI "HI") (V16SI "SI") (V8SI "SI") (V4SI "SI") (V8DI "DI") (V4DI "DI") (V2DI "DI") + (V32HF "HF") (V16HF "HF") (V8HF "HF") (V16SF "SF") (V8SF "SF") (V4SF "SF") (V8DF "DF") (V4DF "DF") (V2DF "DF") (V4TI "TI") (V2TI "TI")]) @@ -879,6 +935,7 @@ (define_mode_attr ssescalarmodelower (V32HI "hi") (V16HI "hi") (V8HI "hi") (V16SI "si") (V8SI "si") (V4SI "si") (V8DI "di") (V4DI "di") (V2DI "di") + (V32HF "hf") (V16HF "hf") (V8HF "hf") (V16SF "sf") (V8SF "sf") (V4SF "sf") (V8DF "df") (V4DF "df") (V2DF "df") (V4TI "ti") (V2TI "ti")]) @@ -889,6 +946,7 @@ (define_mode_attr ssexmmmode (V32HI "V8HI") (V16HI "V8HI") (V8HI "V8HI") (V16SI "V4SI") (V8SI "V4SI") (V4SI "V4SI") (V8DI "V2DI") (V4DI "V2DI") (V2DI "V2DI") + (V32HF "V8HF") (V16HF "V8HF") (V8HF "V8HF") (V16SF "V4SF") (V8SF "V4SF") (V4SF "V4SF") (V8DF "V2DF") (V4DF "V2DF") (V2DF "V2DF")]) @@ -931,10 +989,11 @@ (define_mode_attr ssescalarsize (V64QI "8") (V32QI "8") (V16QI "8") (V32HI "16") (V16HI "16") (V8HI "16") (V16SI "32") (V8SI "32") (V4SI "32") + (V32HF "16") (V16HF "16") (V8HF "16") (V16SF "32") (V8SF "32") (V4SF "32") (V8DF "64") (V4DF "64") (V2DF "64")]) -;; SSE prefix for integer vector modes +;; SSE prefix for integer and HF vector modes (define_mode_attr sseintprefix [(V2DI "p") (V2DF "") (V4DI "p") (V4DF "") @@ -942,16 +1001,16 @@ (define_mode_attr sseintprefix (V4SI "p") (V4SF "") (V8SI "p") (V8SF "") (V16SI "p") (V16SF "") - (V16QI "p") (V8HI "p") - (V32QI "p") (V16HI "p") - (V64QI "p") (V32HI "p")]) + (V16QI "p") (V8HI "p") (V8HF "p") + (V32QI "p") (V16HI "p") (V16HF "p") + (V64QI "p") (V32HI "p") (V32HF "p")]) ;; SSE scalar suffix for vector modes (define_mode_attr ssescalarmodesuffix - [(SF "ss") (DF "sd") - (V16SF "ss") (V8DF "sd") - (V8SF "ss") (V4DF "sd") - (V4SF "ss") (V2DF "sd") + [(HF "sh") (SF "ss") (DF "sd") + (V32HF "sh") (V16SF "ss") (V8DF "sd") + (V16HF "sh") (V8SF "ss") (V4DF "sd") + (V8HF "sh") (V4SF "ss") (V2DF "sd") (V16SI "d") (V8DI "q") (V8SI "d") (V4DI "q") (V4SI "d") (V2DI "q")]) @@ -979,7 +1038,8 @@ (define_mode_attr castmode ;; i128 for integer vectors and TARGET_AVX2, f128 otherwise. ;; i64x4 or f64x4 for 512bit modes. (define_mode_attr i128 - [(V16SF "f64x4") (V8SF "f128") (V8DF "f64x4") (V4DF "f128") + [(V16HF "%~128") (V32HF "i64x4") (V16SF "f64x4") (V8SF "f128") + (V8DF "f64x4") (V4DF "f128") (V64QI "i64x4") (V32QI "%~128") (V32HI "i64x4") (V16HI "%~128") (V16SI "i64x4") (V8SI "%~128") (V8DI "i64x4") (V4DI "%~128")]) @@ -1003,14 +1063,18 @@ (define_mode_attr bcstscalarsuff (V32HI "w") (V16HI "w") (V8HI "w") (V16SI "d") (V8SI "d") (V4SI "d") (V8DI "q") (V4DI "q") (V2DI "q") + (V32HF "w") (V16HF "w") (V8HF "w") (V16SF "ss") (V8SF "ss") (V4SF "ss") (V8DF "sd") (V4DF "sd") (V2DF "sd")]) ;; Tie mode of assembler operand to mode iterator (define_mode_attr xtg_mode - [(V16QI "x") (V8HI "x") (V4SI "x") (V2DI "x") (V4SF "x") (V2DF "x") - (V32QI "t") (V16HI "t") (V8SI "t") (V4DI "t") (V8SF "t") (V4DF "t") - (V64QI "g") (V32HI "g") (V16SI "g") (V8DI "g") (V16SF "g") (V8DF "g")]) + [(V16QI "x") (V8HI "x") (V4SI "x") (V2DI "x") + (V8HF "x") (V4SF "x") (V2DF "x") + (V32QI "t") (V16HI "t") (V8SI "t") (V4DI "t") + (V16HF "t") (V8SF "t") (V4DF "t") + (V64QI "g") (V32HI "g") (V16SI "g") (V8DI "g") + (V32HF "g") (V16SF "g") (V8DF "g")]) ;; Half mask mode for unpacks (define_mode_attr HALFMASKMODE @@ -1306,6 +1370,20 @@ (define_insn "_blendm" (set_attr "prefix" "evex") (set_attr "mode" "")]) +(define_insn "_blendm" + [(set (match_operand:VF_AVX512FP16 0 "register_operand" "=v,v") + (vec_merge:VF_AVX512FP16 + (match_operand:VF_AVX512FP16 2 "nonimmediate_operand" "vm,vm") + (match_operand:VF_AVX512FP16 1 "nonimm_or_0_operand" "0C,v") + (match_operand: 3 "register_operand" "Yk,Yk")))] + "TARGET_AVX512BW" + "@ + vmovdqu\t{%2, %0%{%3%}%N1|%0%{%3%}%N1, %2} + vpblendmw\t{%2, %1, %0%{%3%}|%0%{%3%}, %1, %2}" + [(set_attr "type" "ssemov") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + (define_insn "_store_mask" [(set (match_operand:V48_AVX512VL 0 "memory_operand" "=m") (vec_merge:V48_AVX512VL @@ -1903,12 +1981,12 @@ (define_insn "*3" ;; Standard scalar operation patterns which preserve the rest of the ;; vector for combiner. (define_insn "*_vm3" - [(set (match_operand:VF_128 0 "register_operand" "=x,v") - (vec_merge:VF_128 - (vec_duplicate:VF_128 + [(set (match_operand:VFH_128 0 "register_operand" "=x,v") + (vec_merge:VFH_128 + (vec_duplicate:VFH_128 (plusminus: (vec_select: - (match_operand:VF_128 1 "register_operand" "0,v") + (match_operand:VFH_128 1 "register_operand" "0,v") (parallel [(const_int 0)])) (match_operand: 2 "nonimmediate_operand" "xm,vm"))) (match_dup 1) @@ -1919,7 +1997,16 @@ (define_insn "*_vm3" v\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sseadd") - (set_attr "prefix" "orig,vex") + (set (attr "prefix") + (cond [(eq_attr "alternative" "0") + (const_string "orig") + (eq_attr "alternative" "1") + (if_then_else + (match_test "mode == V8HFmode") + (const_string "evex") + (const_string "vex")) + ] + (const_string "*"))) (set_attr "mode" "")]) (define_insn "_vm3" @@ -1966,12 +2053,12 @@ (define_insn "*mul3" ;; Standard scalar operation patterns which preserve the rest of the ;; vector for combiner. (define_insn "*_vm3" - [(set (match_operand:VF_128 0 "register_operand" "=x,v") - (vec_merge:VF_128 - (vec_duplicate:VF_128 + [(set (match_operand:VFH_128 0 "register_operand" "=x,v") + (vec_merge:VFH_128 + (vec_duplicate:VFH_128 (multdiv: (vec_select: - (match_operand:VF_128 1 "register_operand" "0,v") + (match_operand:VFH_128 1 "register_operand" "0,v") (parallel [(const_int 0)])) (match_operand: 2 "nonimmediate_operand" "xm,vm"))) (match_dup 1) @@ -1982,7 +2069,16 @@ (define_insn "*_vm3" v\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sse") - (set_attr "prefix" "orig,vex") + (set (attr "prefix") + (cond [(eq_attr "alternative" "0") + (const_string "orig") + (eq_attr "alternative" "1") + (if_then_else + (match_test "mode == V8HFmode") + (const_string "evex") + (const_string "vex")) + ] + (const_string "*"))) (set_attr "btver2_decode" "direct,double") (set_attr "mode" "")]) @@ -2368,12 +2464,12 @@ (define_insn "ieee_3" ;; Standard scalar operation patterns which preserve the rest of the ;; vector for combiner. (define_insn "*ieee_3" - [(set (match_operand:VF_128 0 "register_operand" "=x,v") - (vec_merge:VF_128 - (vec_duplicate:VF_128 + [(set (match_operand:VFH_128 0 "register_operand" "=x,v") + (vec_merge:VFH_128 + (vec_duplicate:VFH_128 (unspec: [(vec_select: - (match_operand:VF_128 1 "register_operand" "0,v") + (match_operand:VFH_128 1 "register_operand" "0,v") (parallel [(const_int 0)])) (match_operand: 2 "nonimmediate_operand" "xm,vm")] IEEE_MAXMIN)) @@ -2386,7 +2482,16 @@ (define_insn "*ieee_3" [(set_attr "isa" "noavx,avx") (set_attr "type" "sseadd") (set_attr "btver2_sse_attr" "maxmin") - (set_attr "prefix" "orig,vex") + (set (attr "prefix") + (cond [(eq_attr "alternative" "0") + (const_string "orig") + (eq_attr "alternative" "1") + (if_then_else + (match_test "mode == V8HFmode") + (const_string "evex") + (const_string "vex")) + ] + (const_string "*"))) (set_attr "mode" "")]) (define_insn "_vm3" @@ -8364,6 +8469,47 @@ (define_insn "vec_set_0" ] (symbol_ref "true")))]) +;; vmovw clears also the higer bits +(define_insn "vec_set_0" + [(set (match_operand:VF_AVX512FP16 0 "register_operand" "=v,v") + (vec_merge:VF_AVX512FP16 + (vec_duplicate:VF_AVX512FP16 + (match_operand:HF 2 "nonimmediate_operand" "r,m")) + (match_operand:VF_AVX512FP16 1 "const0_operand" "C,C") + (const_int 1)))] + "TARGET_AVX512FP16" + "@ + vmovw\t{%k2, %x0|%x0, %k2} + vmovw\t{%2, %x0|%x0, %2}" + [(set_attr "type" "ssemov") + (set_attr "prefix" "evex") + (set_attr "mode" "HF")]) + +(define_insn "*avx512fp16_movsh" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_merge:V8HF + (vec_duplicate:V8HF + (match_operand:HF 2 "register_operand" "v")) + (match_operand:V8HF 1 "register_operand" "v") + (const_int 1)))] + "TARGET_AVX512FP16" + "vmovsh\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "ssemov") + (set_attr "prefix" "evex") + (set_attr "mode" "HF")]) + +(define_insn "avx512fp16_movsh" + [(set (match_operand:V8HF 0 "register_operand" "=v") + (vec_merge:V8HF + (match_operand:V8HF 2 "register_operand" "v") + (match_operand:V8HF 1 "register_operand" "v") + (const_int 1)))] + "TARGET_AVX512FP16" + "vmovsh\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "ssemov") + (set_attr "prefix" "evex") + (set_attr "mode" "HF")]) + ;; A subset is vec_setv4sf. (define_insn "*vec_setv4sf_sse4_1" [(set (match_operand:V4SF 0 "register_operand" "=Yr,*x,v") @@ -8499,6 +8645,20 @@ (define_expand "vec_set" DONE; }) +(define_expand "vec_setv8hf" + [(match_operand:V8HF 0 "register_operand") + (match_operand:HF 1 "register_operand") + (match_operand 2 "vec_setm_sse41_operand")] + "TARGET_SSE" +{ + if (CONST_INT_P (operands[2])) + ix86_expand_vector_set (false, operands[0], operands[1], + INTVAL (operands[2])); + else + ix86_expand_vector_set_var (operands[0], operands[1], operands[2]); + DONE; +}) + (define_expand "vec_set" [(match_operand:V_256_512 0 "register_operand") (match_operand: 1 "register_operand") @@ -9214,10 +9374,10 @@ (define_insn "vec_extract_hi_" (set_attr "length_immediate" "1") (set_attr "mode" "")]) -(define_insn_and_split "vec_extract_lo_v32hi" - [(set (match_operand:V16HI 0 "nonimmediate_operand" "=v,v,m") - (vec_select:V16HI - (match_operand:V32HI 1 "nonimmediate_operand" "v,m,v") +(define_insn_and_split "vec_extract_lo_" + [(set (match_operand: 0 "nonimmediate_operand" "=v,v,m") + (vec_select: + (match_operand:V32_512 1 "nonimmediate_operand" "v,m,v") (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3) (const_int 4) (const_int 5) @@ -9244,9 +9404,10 @@ (define_insn_and_split "vec_extract_lo_v32hi" if (!TARGET_AVX512VL && REG_P (operands[0]) && EXT_REX_SSE_REG_P (operands[1])) - operands[0] = lowpart_subreg (V32HImode, operands[0], V16HImode); + operands[0] = lowpart_subreg (mode, operands[0], + mode); else - operands[1] = gen_lowpart (V16HImode, operands[1]); + operands[1] = gen_lowpart (mode, operands[1]); } [(set_attr "type" "sselog1") (set_attr "prefix_extra" "1") @@ -9255,10 +9416,10 @@ (define_insn_and_split "vec_extract_lo_v32hi" (set_attr "prefix" "evex") (set_attr "mode" "XI")]) -(define_insn "vec_extract_hi_v32hi" - [(set (match_operand:V16HI 0 "nonimmediate_operand" "=vm") - (vec_select:V16HI - (match_operand:V32HI 1 "register_operand" "v") +(define_insn "vec_extract_hi_" + [(set (match_operand: 0 "nonimmediate_operand" "=vm") + (vec_select: + (match_operand:V32_512 1 "register_operand" "v") (parallel [(const_int 16) (const_int 17) (const_int 18) (const_int 19) (const_int 20) (const_int 21) @@ -9275,10 +9436,10 @@ (define_insn "vec_extract_hi_v32hi" (set_attr "prefix" "evex") (set_attr "mode" "XI")]) -(define_insn_and_split "vec_extract_lo_v16hi" - [(set (match_operand:V8HI 0 "nonimmediate_operand" "=v,m") - (vec_select:V8HI - (match_operand:V16HI 1 "nonimmediate_operand" "vm,v") +(define_insn_and_split "vec_extract_lo_" + [(set (match_operand: 0 "nonimmediate_operand" "=v,m") + (vec_select: + (match_operand:V16_256 1 "nonimmediate_operand" "vm,v") (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3) (const_int 4) (const_int 5) @@ -9287,12 +9448,12 @@ (define_insn_and_split "vec_extract_lo_v16hi" "#" "&& reload_completed" [(set (match_dup 0) (match_dup 1))] - "operands[1] = gen_lowpart (V8HImode, operands[1]);") + "operands[1] = gen_lowpart (mode, operands[1]);") -(define_insn "vec_extract_hi_v16hi" - [(set (match_operand:V8HI 0 "nonimmediate_operand" "=xm,vm,vm") - (vec_select:V8HI - (match_operand:V16HI 1 "register_operand" "x,v,v") +(define_insn "vec_extract_hi_" + [(set (match_operand: 0 "nonimmediate_operand" "=xm,vm,vm") + (vec_select: + (match_operand:V16_256 1 "register_operand" "x,v,v") (parallel [(const_int 8) (const_int 9) (const_int 10) (const_int 11) (const_int 12) (const_int 13) @@ -9428,12 +9589,41 @@ (define_insn "vec_extract_hi_v32qi" (set_attr "prefix" "vex,evex,evex") (set_attr "mode" "OI")]) +;; NB: *vec_extract_0 must be placed before *vec_extracthf. +;; Otherwise, it will be ignored. +(define_insn_and_split "*vec_extract_0" + [(set (match_operand:HF 0 "nonimmediate_operand" "=v,m,r") + (vec_select:HF + (match_operand:VF_AVX512FP16 1 "nonimmediate_operand" "vm,v,m") + (parallel [(const_int 0)])))] + "TARGET_AVX512FP16 && !(MEM_P (operands[0]) && MEM_P (operands[1]))" + "#" + "&& reload_completed" + [(set (match_dup 0) (match_dup 1))] + "operands[1] = gen_lowpart (HFmode, operands[1]);") + +(define_insn "*vec_extracthf" + [(set (match_operand:HF 0 "register_sse4nonimm_operand" "=r,m") + (vec_select:HF + (match_operand:V8HF 1 "register_operand" "v,v") + (parallel + [(match_operand:SI 2 "const_0_to_7_operand")])))] + "TARGET_AVX512FP16" + "@ + vpextrw\t{%2, %1, %k0|%k0, %1, %2} + vpextrw\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "sselog1") + (set_attr "prefix" "maybe_evex") + (set_attr "mode" "TI")]) + ;; Modes handled by vec_extract patterns. (define_mode_iterator VEC_EXTRACT_MODE [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI + (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16") + (V8HF "TARGET_AVX512FP16") (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")]) @@ -14666,16 +14856,16 @@ (define_expand "vec_interleave_low" ;; Modes handled by pinsr patterns. (define_mode_iterator PINSR_MODE - [(V16QI "TARGET_SSE4_1") V8HI + [(V16QI "TARGET_SSE4_1") V8HI (V8HF "TARGET_AVX512FP16") (V4SI "TARGET_SSE4_1") (V2DI "TARGET_SSE4_1 && TARGET_64BIT")]) (define_mode_attr sse2p4_1 - [(V16QI "sse4_1") (V8HI "sse2") + [(V16QI "sse4_1") (V8HI "sse2") (V8HF "sse4_1") (V4SI "sse4_1") (V2DI "sse4_1")]) (define_mode_attr pinsr_evex_isa - [(V16QI "avx512bw") (V8HI "avx512bw") + [(V16QI "avx512bw") (V8HI "avx512bw") (V8HF "avx512bw") (V4SI "avx512dq") (V2DI "avx512dq")]) ;; sse4_1_pinsrd must come before sse2_loadld since it is preferred. @@ -14703,11 +14893,19 @@ (define_insn "_pinsr" case 2: case 4: if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode)) - return "vpinsr\t{%3, %k2, %1, %0|%0, %1, %k2, %3}"; + { + if (mode == V8HFmode) + return "vpinsrw\t{%3, %k2, %1, %0|%0, %1, %k2, %3}"; + else + return "vpinsr\t{%3, %k2, %1, %0|%0, %1, %k2, %3}"; + } /* FALLTHRU */ case 3: case 5: - return "vpinsr\t{%3, %2, %1, %0|%0, %1, %2, %3}"; + if (mode == V8HFmode) + return "vpinsrw\t{%3, %2, %1, %0|%0, %1, %2, %3}"; + else + return "vpinsr\t{%3, %2, %1, %0|%0, %1, %2, %3}"; default: gcc_unreachable (); } @@ -21122,16 +21320,17 @@ (define_mode_attr pbroadcast_evex_isa [(V64QI "avx512bw") (V32QI "avx512bw") (V16QI "avx512bw") (V32HI "avx512bw") (V16HI "avx512bw") (V8HI "avx512bw") (V16SI "avx512f") (V8SI "avx512f") (V4SI "avx512f") - (V8DI "avx512f") (V4DI "avx512f") (V2DI "avx512f")]) + (V8DI "avx512f") (V4DI "avx512f") (V2DI "avx512f") + (V32HF "avx512bw") (V16HF "avx512bw") (V8HF "avx512bw")]) (define_insn "avx2_pbroadcast" - [(set (match_operand:VI 0 "register_operand" "=x,v") - (vec_duplicate:VI + [(set (match_operand:VIHF 0 "register_operand" "=x,v") + (vec_duplicate:VIHF (vec_select: (match_operand: 1 "nonimmediate_operand" "xm,vm") (parallel [(const_int 0)]))))] "TARGET_AVX2" - "vpbroadcast\t{%1, %0|%0, %1}" + "vpbroadcast\t{%1, %0|%0, %1}" [(set_attr "isa" "*,") (set_attr "type" "ssemov") (set_attr "prefix_extra" "1") @@ -21139,17 +21338,17 @@ (define_insn "avx2_pbroadcast" (set_attr "mode" "")]) (define_insn "avx2_pbroadcast_1" - [(set (match_operand:VI_256 0 "register_operand" "=x,x,v,v") - (vec_duplicate:VI_256 + [(set (match_operand:VIHF_256 0 "register_operand" "=x,x,v,v") + (vec_duplicate:VIHF_256 (vec_select: - (match_operand:VI_256 1 "nonimmediate_operand" "m,x,m,v") + (match_operand:VIHF_256 1 "nonimmediate_operand" "m,x,m,v") (parallel [(const_int 0)]))))] "TARGET_AVX2" "@ - vpbroadcast\t{%1, %0|%0, %1} - vpbroadcast\t{%x1, %0|%0, %x1} - vpbroadcast\t{%1, %0|%0, %1} - vpbroadcast\t{%x1, %0|%0, %x1}" + vpbroadcast\t{%1, %0|%0, %1} + vpbroadcast\t{%x1, %0|%0, %x1} + vpbroadcast\t{%1, %0|%0, %1} + vpbroadcast\t{%x1, %0|%0, %x1}" [(set_attr "isa" "*,*,,") (set_attr "type" "ssemov") (set_attr "prefix_extra" "1") @@ -21503,15 +21702,15 @@ (define_insn "avx2_vec_dupv4df" (set_attr "mode" "V4DF")]) (define_insn "_vec_dup_1" - [(set (match_operand:VI_AVX512BW 0 "register_operand" "=v,v") - (vec_duplicate:VI_AVX512BW + [(set (match_operand:VIHF_AVX512BW 0 "register_operand" "=v,v") + (vec_duplicate:VIHF_AVX512BW (vec_select: - (match_operand:VI_AVX512BW 1 "nonimmediate_operand" "v,m") + (match_operand:VIHF_AVX512BW 1 "nonimmediate_operand" "v,m") (parallel [(const_int 0)]))))] "TARGET_AVX512F" "@ - vpbroadcast\t{%x1, %0|%0, %x1} - vpbroadcast\t{%x1, %0|%0, %1}" + vpbroadcast\t{%x1, %0|%0, %x1} + vpbroadcast\t{%x1, %0|%0, %1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") (set_attr "mode" "")]) @@ -21536,8 +21735,8 @@ (define_insn "_vec_dup" (set_attr "mode" "")]) (define_insn "_vec_dup" - [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v") - (vec_duplicate:VI12_AVX512VL + [(set (match_operand:VI12HF_AVX512VL 0 "register_operand" "=v") + (vec_duplicate:VI12HF_AVX512VL (vec_select: (match_operand: 1 "nonimmediate_operand" "vm") (parallel [(const_int 0)]))))] @@ -21572,8 +21771,8 @@ (define_insn "avx512f_broadcast" (set_attr "mode" "")]) (define_insn "_vec_dup_gpr" - [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v,v") - (vec_duplicate:VI12_AVX512VL + [(set (match_operand:VI12HF_AVX512VL 0 "register_operand" "=v,v") + (vec_duplicate:VI12HF_AVX512VL (match_operand: 1 "nonimmediate_operand" "vm,r")))] "TARGET_AVX512BW" "@ @@ -21668,7 +21867,7 @@ (define_mode_attr vecdupssescalarmodesuffix [(V8SF "ss") (V4DF "sd") (V8SI "ss") (V4DI "sd")]) ;; Modes handled by AVX2 vec_dup patterns. (define_mode_iterator AVX2_VEC_DUP_MODE - [V32QI V16QI V16HI V8HI V8SI V4SI]) + [V32QI V16QI V16HI V8HI V8SI V4SI V16HF V8HF]) (define_insn "*vec_dup" [(set (match_operand:AVX2_VEC_DUP_MODE 0 "register_operand" "=x,x,v") @@ -22224,12 +22423,12 @@ (define_insn "vec_set_hi_" (set_attr "prefix" "vex") (set_attr "mode" "")]) -(define_insn "vec_set_lo_v16hi" - [(set (match_operand:V16HI 0 "register_operand" "=x,v") - (vec_concat:V16HI - (match_operand:V8HI 2 "nonimmediate_operand" "xm,vm") - (vec_select:V8HI - (match_operand:V16HI 1 "register_operand" "x,v") +(define_insn "vec_set_lo_" + [(set (match_operand:V16_256 0 "register_operand" "=x,v") + (vec_concat:V16_256 + (match_operand: 2 "nonimmediate_operand" "xm,vm") + (vec_select: + (match_operand:V16_256 1 "register_operand" "x,v") (parallel [(const_int 8) (const_int 9) (const_int 10) (const_int 11) (const_int 12) (const_int 13) @@ -22244,16 +22443,16 @@ (define_insn "vec_set_lo_v16hi" (set_attr "prefix" "vex,evex") (set_attr "mode" "OI")]) -(define_insn "vec_set_hi_v16hi" - [(set (match_operand:V16HI 0 "register_operand" "=x,v") - (vec_concat:V16HI - (vec_select:V8HI - (match_operand:V16HI 1 "register_operand" "x,v") +(define_insn "vec_set_hi_" + [(set (match_operand:V16_256 0 "register_operand" "=x,v") + (vec_concat:V16_256 + (vec_select: + (match_operand:V16_256 1 "register_operand" "x,v") (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3) (const_int 4) (const_int 5) (const_int 6) (const_int 7)])) - (match_operand:V8HI 2 "nonimmediate_operand" "xm,vm")))] + (match_operand: 2 "nonimmediate_operand" "xm,vm")))] "TARGET_AVX" "@ vinsert%~128\t{$0x1, %2, %1, %0|%0, %1, %2, 0x1} @@ -22430,6 +22629,8 @@ (define_mode_iterator VEC_INIT_MODE (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI + (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16") + (V8HF "TARGET_AVX512FP16") (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2") (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")]) @@ -22441,6 +22642,8 @@ (define_mode_iterator VEC_INIT_HALF_MODE (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") + (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16") + (V8HF "TARGET_AVX512FP16") (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V4TI "TARGET_AVX512F")])