From patchwork Sun Feb 25 21:11:12 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 1903978
From: "H.J. Lu"
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com
Subject: [PATCH] x86: Properly implement AMX-TILE load/store intrinsics
Date: Sun, 25 Feb 2024 13:11:12 -0800
Message-ID: <20240225211112.3552642-1-hjl.tools@gmail.com>
X-Mailer: git-send-email 2.43.2
List-Id: Gcc-patches mailing list

ldtilecfg and sttilecfg take a 64-byte (512-bit) memory block.  With
_tile_loadconfig implemented as

extern __inline void
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
_tile_loadconfig (const void *__config)
{
  __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void **)__config)));
}

GCC sees:

(parallel [
  (asm_operands/v ("ldtilecfg %X0") ("") 0
   [(mem/f/c:DI (plus:DI (reg/f:DI 77 virtual-stack-vars)
       (const_int -64 [0xffffffffffffffc0])) [1 MEM[(const void * *)&tile_data]+0 S8 A128])]
   [(asm_input:DI ("m"))]
  (clobber (reg:CC 17 flags))])

and the memory operand size is only 8 bytes (DImode, S8).  As a result,
GCC ignores the remaining 56 bytes of the block.  Implement the
ldtilecfg and sttilecfg intrinsics with a BLKmode memory operand so that
GCC honors the whole 64-byte memory block.

gcc/ChangeLog:

	PR target/114098
	* config/i386/amxtileintrin.h (_tile_loadconfig): Use
	__builtin_ia32_ldtilecfg.
	(_tile_storeconfig): Use __builtin_ia32_sttilecfg.
	* config/i386/i386-builtin.def (BDESC): Add
	__builtin_ia32_ldtilecfg and __builtin_ia32_sttilecfg.
	* config/i386/i386-expand.cc (ix86_expand_builtin): Handle
	IX86_BUILTIN_LDTILECFG and IX86_BUILTIN_STTILECFG.
	* config/i386/i386.md (ldtilecfg): New pattern.
	(sttilecfg): Likewise.

gcc/testsuite/ChangeLog:

	PR target/114098
	* gcc.target/i386/amxtile-4.c: New test.
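For illustration only (not part of the patch): a minimal C sketch of why
the old asm operand is too narrow.  Dereferencing a `const void **`
names an object of just sizeof (void *) bytes, whereas the configuration
struct used in the new test occupies 64 bytes.  The type `tilecfg_block`
and both function names are hypothetical and exist only in this sketch;
the patch itself does not widen the C type but switches to a builtin
whose memory operand is BLKmode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-byte view of a tile configuration block.  */
typedef struct { uint8_t bytes[64]; } tilecfg_block;

static_assert (sizeof (tilecfg_block) == 64, "block view must be 64 bytes");

/* Size of the object named by the old inline-asm operand,
   *(const void **) config.  sizeof does not evaluate its operand,
   so nothing is dereferenced at run time.  */
size_t
old_operand_size (const void *config)
{
  return sizeof (*((const void **) config));
}

/* Size of the object when the operand is typed as the whole block.  */
size_t
full_block_size (const void *config)
{
  return sizeof (*((const tilecfg_block *) config));
}
```

On an LP64 x86-64 target the first function returns 8 and the second
returns 64, which matches the S8 memory operand in the RTL dump above.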
---
 gcc/config/i386/amxtileintrin.h           |  4 +-
 gcc/config/i386/i386-builtin.def          |  4 ++
 gcc/config/i386/i386-expand.cc            | 19 ++++++++
 gcc/config/i386/i386.md                   | 24 ++++++++++
 gcc/testsuite/gcc.target/i386/amxtile-4.c | 55 +++++++++++++++++++++++
 5 files changed, 104 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-4.c

diff --git a/gcc/config/i386/amxtileintrin.h b/gcc/config/i386/amxtileintrin.h
index d1a26e0fea5..5081b326498 100644
--- a/gcc/config/i386/amxtileintrin.h
+++ b/gcc/config/i386/amxtileintrin.h
@@ -39,14 +39,14 @@ extern __inline void
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _tile_loadconfig (const void *__config)
 {
-  __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void **)__config)));
+  __builtin_ia32_ldtilecfg (__config);
 }
 
 extern __inline void
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _tile_storeconfig (void *__config)
 {
-  __asm__ volatile ("sttilecfg\t%X0" : "=m" (*((void **)__config)));
+  __builtin_ia32_sttilecfg (__config);
 }
 
 extern __inline void
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 729355230b8..88dd7f8857f 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -126,6 +126,10 @@ BDESC (OPTION_MASK_ISA_XSAVES | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_nothing, "__b
 BDESC (OPTION_MASK_ISA_XSAVES | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_nothing, "__builtin_ia32_xrstors64", IX86_BUILTIN_XRSTORS64, UNKNOWN, (int) VOID_FTYPE_PVOID_INT64)
 BDESC (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_nothing, "__builtin_ia32_xsavec64", IX86_BUILTIN_XSAVEC64, UNKNOWN, (int) VOID_FTYPE_PVOID_INT64)
 
+/* LDTILECFG and STTILECFG.  */
+BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AMX_TILE, CODE_FOR_ldtilecfg, "__builtin_ia32_ldtilecfg", IX86_BUILTIN_LDTILECFG, UNKNOWN, (int) VOID_FTYPE_PCVOID)
+BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AMX_TILE, CODE_FOR_sttilecfg, "__builtin_ia32_sttilecfg", IX86_BUILTIN_STTILECFG, UNKNOWN, (int) VOID_FTYPE_PVOID)
+
 /* SSE */
 BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_movv4sf_internal, "__builtin_ia32_storeups", IX86_BUILTIN_STOREUPS, UNKNOWN, (int) VOID_FTYPE_PFLOAT_V4SF)
 BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_movntv4sf, "__builtin_ia32_movntps", IX86_BUILTIN_MOVNTPS, UNKNOWN, (int) VOID_FTYPE_PFLOAT_V4SF)
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index a4d3369f01b..17993eb837f 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -14152,6 +14152,25 @@ ix86_expand_builtin (tree exp, rtx target, rtx subtarget,
       emit_insn (pat);
       return 0;
 
+    case IX86_BUILTIN_LDTILECFG:
+    case IX86_BUILTIN_STTILECFG:
+      arg0 = CALL_EXPR_ARG (exp, 0);
+      op0 = expand_normal (arg0);
+
+      if (!address_operand (op0, VOIDmode))
+        {
+          op0 = convert_memory_address (Pmode, op0);
+          op0 = copy_addr_to_reg (op0);
+        }
+      op0 = gen_rtx_MEM (BLKmode, op0);
+      if (fcode == IX86_BUILTIN_LDTILECFG)
+        icode = CODE_FOR_ldtilecfg;
+      else
+        icode = CODE_FOR_sttilecfg;
+      pat = GEN_FCN (icode) (op0);
+      emit_insn (pat);
+      return 0;
+
     case IX86_BUILTIN_LLWPCB:
       arg0 = CALL_EXPR_ARG (exp, 0);
       op0 = expand_normal (arg0);
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6a26d966a0e..0ede6adac2f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -353,6 +353,10 @@ (define_c_enum "unspecv" [
   ;; For USER_MSR support
   UNSPECV_URDMSR
   UNSPECV_UWRMSR
+
+  ;; For AMX-TILE
+  UNSPECV_LDTILECFG
+  UNSPECV_STTILECFG
 ])
 
 ;; Constants to represent rounding modes in the ROUND instruction
@@ -28152,6 +28156,26 @@ (define_insn "uwrmsr"
   [(set_attr "prefix" "vex")
    (set_attr "type" "other")])
 
+
+(define_insn "ldtilecfg"
+  [(unspec_volatile [(match_operand:BLK 0 "memory_operand" "jm")]
+            UNSPECV_LDTILECFG)]
+  "TARGET_AMX_TILE"
+  "ldtilecfg\t%0"
+  [(set_attr "type" "other")
+   (set_attr "addr" "gpr16")
+   (set_attr "prefix" "vex")
+   (set_attr "memory" "load")])
+
+(define_insn "sttilecfg"
+  [(set (match_operand:BLK 0 "memory_operand" "=jm")
+        (unspec_volatile:BLK [(const_int 0)] UNSPECV_STTILECFG))]
+  "TARGET_AMX_TILE"
+  "sttilecfg\t%0"
+  [(set_attr "type" "other")
+   (set_attr "addr" "gpr16")
+   (set_attr "prefix" "vex")
+   (set_attr "memory" "store")])
 (include "mmx.md")
 (include "sse.md")
 (include "sync.md")
diff --git a/gcc/testsuite/gcc.target/i386/amxtile-4.c b/gcc/testsuite/gcc.target/i386/amxtile-4.c
new file mode 100644
index 00000000000..1255af2594e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxtile-4.c
@@ -0,0 +1,55 @@
+/* PR target/114098 */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile" } */
+
+#include <stdint.h>
+#include <immintrin.h>
+
+#define MAX_ROWS 16
+#define MAX_COLS 64
+#define MAX 1024
+#define STRIDE 64
+
+typedef struct __tile_config
+{
+  uint8_t palette_id;
+  uint8_t start_row;
+  uint8_t reserved_0[14];
+  uint16_t colsb[16];
+  uint8_t rows[16];
+} __tilecfg;
+
+
+extern void bar (__tilecfg *tileinfo);
+
+/* Initialize tile config */
+static void
+init_tile_config (__tilecfg *tileinfo)
+{
+  int i;
+  tileinfo->palette_id = 1;
+  tileinfo->start_row = 0;
+
+  for (i = 0; i < 1; ++i)
+    {
+      tileinfo->colsb[i] = MAX_ROWS;
+      tileinfo->rows[i] = MAX_ROWS;
+    }
+
+  for (i = 1; i < 4; ++i)
+    {
+      tileinfo->colsb[i] = MAX_COLS;
+      tileinfo->rows[i] = MAX_ROWS;
+    }
+
+  _tile_loadconfig (tileinfo);
+}
+
+void
+enable_amx (void)
+{
+  __tilecfg tile_data = {0};
+  init_tile_config (&tile_data);
+}
+
+/* { dg-final { scan-assembler-times "pxor\[^\n\]*%xmm" 1 } } */
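
As a side note to the test above, here is a small sketch that checks the
layout of its configuration struct on any host, without AMX hardware or
-mamx-tile.  The names `tile_config` and `fill_tile_config` are
hypothetical stand-ins for the test's `__tilecfg` and
`init_tile_config`, renamed only to avoid reserved identifiers.  It
shows that the fields written by the test (colsb at byte 16, rows at
byte 48) lie beyond the first 8 bytes, so a memory operand covering only
8 bytes would not tell GCC that ldtilecfg reads them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the test's __tilecfg.  */
typedef struct tile_config
{
  uint8_t palette_id;
  uint8_t start_row;
  uint8_t reserved_0[14];
  uint16_t colsb[16];
  uint8_t rows[16];
} tile_config;

/* Compile-time layout checks: 16 + 32 + 16 = 64 bytes in total.  */
static_assert (sizeof (tile_config) == 64, "config must be 64 bytes");
static_assert (offsetof (tile_config, colsb) == 16, "colsb at byte 16");
static_assert (offsetof (tile_config, rows) == 48, "rows at byte 48");

/* Fill the same fields, with the same values, as init_tile_config in
   the test (MAX_ROWS is 16 and MAX_COLS is 64 there).  */
void
fill_tile_config (tile_config *cfg)
{
  cfg->palette_id = 1;
  cfg->start_row = 0;

  /* Tile 0 uses MAX_ROWS for both fields, as in the test.  */
  cfg->colsb[0] = 16;
  cfg->rows[0] = 16;

  /* Tiles 1..3 use MAX_COLS bytes per row and MAX_ROWS rows.  */
  for (int i = 1; i < 4; ++i)
    {
      cfg->colsb[i] = 64;
      cfg->rows[i] = 16;
    }
}
```

A caller would zero-initialize a `tile_config`, pass it to
`fill_tile_config`, and only then hand it to `_tile_loadconfig` on a
target that supports AMX-TILE.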