From patchwork Fri Mar 22 15:54:49 2024
X-Patchwork-Submitter: Andrew Stubbs
X-Patchwork-Id: 1914987
From: Andrew Stubbs
To: gcc-patches@gcc.gnu.org
Subject: [committed] amdgcn: Adjust GFX10/GFX11 cache coherency
Date: Fri, 22 Mar 2024 15:54:49 +0000
Message-ID: <20240322155449.747518-2-ams@baylibre.com>

The RDNA devices have different cache architectures to the CDNA devices,
and the differences go deeper than just the assembler mnemonics, so we
probably need to generate different code to maintain coherency across the
whole device.

I believe this patch is correct according to the documentation in the LLVM
AMDGPU user guide (the ISA manual is less instructive), but I hadn't
observed any real problems before (or after).

Committed to mainline.

Andrew

gcc/ChangeLog:

	* config/gcn/gcn.md (*memory_barrier): Split into RDNA and !RDNA.
	(atomic_load): Adjust RDNA cache settings.
	(atomic_store): Likewise.
	(atomic_exchange): Likewise.
---
 gcc/config/gcn/gcn.md | 86 +++++++++++++++++++++++++++----------------
 1 file changed, 55 insertions(+), 31 deletions(-)

diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index 3b51453aaca..574c2f87e8c 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -1960,11 +1960,19 @@
 (define_insn "*memory_barrier"
   [(set (match_operand:BLK 0)
         (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))]
-  ""
-  "{buffer_wbinvl1_vol|buffer_gl0_inv}"
+  "!TARGET_RDNA2_PLUS"
+  "buffer_wbinvl1_vol"
   [(set_attr "type" "mubuf")
    (set_attr "length" "4")])

+(define_insn "*memory_barrier"
+  [(set (match_operand:BLK 0)
+        (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))]
+  "TARGET_RDNA2_PLUS"
+  "buffer_gl1_inv\;buffer_gl0_inv"
+  [(set_attr "type" "mult")
+   (set_attr "length" "8")])
+
 ; FIXME: These patterns have been disabled as they do not seem to work
 ; reliably - they can cause hangs or incorrect results.
 ; TODO: flush caches according to memory model
@@ -2094,9 +2102,13 @@
           case 0:
             return "s_load%o0\t%0, %A1 glc\;s_waitcnt\tlgkmcnt(0)";
           case 1:
-            return "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0";
+            return (TARGET_RDNA2 /* Not GFX11. */
+                    ? "flat_load%o0\t%0, %A1%O1 glc dlc\;s_waitcnt\t0"
+                    : "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0");
           case 2:
-            return "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)";
+            return (TARGET_RDNA2 /* Not GFX11. */
+                    ? "global_load%o0\t%0, %A1%O1 glc dlc\;s_waitcnt\tvmcnt(0)"
+                    : "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)");
           }
         break;
       case MEMMODEL_CONSUME:
@@ -2108,15 +2120,21 @@
             return "s_load%o0\t%0, %A1 glc\;s_waitcnt\tlgkmcnt(0)\;"
                    "s_dcache_wb_vol";
           case 1:
-            return (TARGET_RDNA2_PLUS
+            return (TARGET_RDNA2
+                    ? "flat_load%o0\t%0, %A1%O1 glc dlc\;s_waitcnt\t0\;"
+                      "buffer_gl1_inv\;buffer_gl0_inv"
+                    : TARGET_RDNA3
                     ? "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0\;"
-                      "buffer_gl0_inv"
+                      "buffer_gl1_inv\;buffer_gl0_inv"
                     : "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0\;"
                       "buffer_wbinvl1_vol");
           case 2:
-            return (TARGET_RDNA2_PLUS
+            return (TARGET_RDNA2
+                    ? "global_load%o0\t%0, %A1%O1 glc dlc\;s_waitcnt\tvmcnt(0)\;"
+                      "buffer_gl1_inv\;buffer_gl0_inv"
+                    : TARGET_RDNA3
                     ? "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)\;"
-                      "buffer_gl0_inv"
+                      "buffer_gl1_inv\;buffer_gl0_inv"
                     : "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)\;"
                       "buffer_wbinvl1_vol");
           }
@@ -2130,15 +2148,21 @@
             return "s_dcache_wb_vol\;s_load%o0\t%0, %A1 glc\;"
                    "s_waitcnt\tlgkmcnt(0)\;s_dcache_inv_vol";
           case 1:
-            return (TARGET_RDNA2_PLUS
-                    ? "buffer_gl0_inv\;flat_load%o0\t%0, %A1%O1 glc\;"
-                      "s_waitcnt\t0\;buffer_gl0_inv"
+            return (TARGET_RDNA2
+                    ? "buffer_gl1_inv\;buffer_gl0_inv\;flat_load%o0\t%0, %A1%O1 glc dlc\;"
+                      "s_waitcnt\t0\;buffer_gl1_inv\;buffer_gl0_inv"
+                    : TARGET_RDNA3
+                    ? "buffer_gl1_inv\;buffer_gl0_inv\;flat_load%o0\t%0, %A1%O1 glc\;"
+                      "s_waitcnt\t0\;buffer_gl1_inv\;buffer_gl0_inv"
                     : "buffer_wbinvl1_vol\;flat_load%o0\t%0, %A1%O1 glc\;"
                       "s_waitcnt\t0\;buffer_wbinvl1_vol");
           case 2:
-            return (TARGET_RDNA2_PLUS
-                    ? "buffer_gl0_inv\;global_load%o0\t%0, %A1%O1 glc\;"
-                      "s_waitcnt\tvmcnt(0)\;buffer_gl0_inv"
+            return (TARGET_RDNA2
+                    ? "buffer_gl1_inv\;buffer_gl0_inv\;global_load%o0\t%0, %A1%O1 glc dlc\;"
+                      "s_waitcnt\tvmcnt(0)\;buffer_gl1_inv\;buffer_gl0_inv"
+                    : TARGET_RDNA3
+                    ? "buffer_gl1_inv\;buffer_gl0_inv\;global_load%o0\t%0, %A1%O1 glc\;"
+                      "s_waitcnt\tvmcnt(0)\;buffer_gl1_inv\;buffer_gl0_inv"
                     : "buffer_wbinvl1_vol\;global_load%o0\t%0, %A1%O1 glc\;"
                       "s_waitcnt\tvmcnt(0)\;buffer_wbinvl1_vol");
           }
@@ -2147,7 +2171,7 @@
     gcc_unreachable ();
   }
   [(set_attr "type" "smem,flat,flat")
-   (set_attr "length" "20")
+   (set_attr "length" "28")
    (set_attr "gcn_version" "gcn5,*,gcn5")
    (set_attr "rdna" "no,*,*")])

@@ -2180,11 +2204,11 @@
             return "s_dcache_wb_vol\;s_store%o1\t%1, %A0 glc";
           case 1:
             return (TARGET_RDNA2_PLUS
-                    ? "buffer_gl0_inv\;flat_store%o1\t%A0, %1%O0 glc"
+                    ? "buffer_gl1_inv\;buffer_gl0_inv\;flat_store%o1\t%A0, %1%O0 glc"
                     : "buffer_wbinvl1_vol\;flat_store%o1\t%A0, %1%O0 glc");
           case 2:
             return (TARGET_RDNA2_PLUS
-                    ? "buffer_gl0_inv\;global_store%o1\t%A0, %1%O0 glc"
+                    ? "buffer_gl1_inv\;buffer_gl0_inv\;global_store%o1\t%A0, %1%O0 glc"
                     : "buffer_wbinvl1_vol\;global_store%o1\t%A0, %1%O0 glc");
           }
         break;
@@ -2198,14 +2222,14 @@
                    "s_waitcnt\tlgkmcnt(0)\;s_dcache_inv_vol";
           case 1:
             return (TARGET_RDNA2_PLUS
-                    ? "buffer_gl0_inv\;flat_store%o1\t%A0, %1%O0 glc\;"
-                      "s_waitcnt\t0\;buffer_gl0_inv"
+                    ? "buffer_gl1_inv\;buffer_gl0_inv\;flat_store%o1\t%A0, %1%O0 glc\;"
+                      "s_waitcnt\t0\;buffer_gl1_inv\;buffer_gl0_inv"
                     : "buffer_wbinvl1_vol\;flat_store%o1\t%A0, %1%O0 glc\;"
                       "s_waitcnt\t0\;buffer_wbinvl1_vol");
           case 2:
             return (TARGET_RDNA2_PLUS
-                    ? "buffer_gl0_inv\;global_store%o1\t%A0, %1%O0 glc\;"
-                      "s_waitcnt\tvmcnt(0)\;buffer_gl0_inv"
+                    ? "buffer_gl1_inv\;buffer_gl0_inv\;global_store%o1\t%A0, %1%O0 glc\;"
+                      "s_waitcnt\tvmcnt(0)\;buffer_gl1_inv\;buffer_gl0_inv"
                     : "buffer_wbinvl1_vol\;global_store%o1\t%A0, %1%O0 glc\;"
                       "s_waitcnt\tvmcnt(0)\;buffer_wbinvl1_vol");
           }
@@ -2214,7 +2238,7 @@
     gcc_unreachable ();
   }
   [(set_attr "type" "smem,flat,flat")
-   (set_attr "length" "20")
+   (set_attr "length" "28")
    (set_attr "gcn_version" "gcn5,*,gcn5")
    (set_attr "rdna" "no,*,*")])

@@ -2253,13 +2277,13 @@
           case 1:
             return (TARGET_RDNA2_PLUS
                     ? "flat_atomic_swap\t%0, %1, %2 glc\;s_waitcnt\t0\;"
-                      "buffer_gl0_inv"
+                      "buffer_gl1_inv\;buffer_gl0_inv"
                     : "flat_atomic_swap\t%0, %1, %2 glc\;s_waitcnt\t0\;"
                       "buffer_wbinvl1_vol");
           case 2:
             return (TARGET_RDNA2_PLUS
                     ? "global_atomic_swap\t%0, %A1, %2%O1 glc\;"
-                      "s_waitcnt\tvmcnt(0)\;buffer_gl0_inv"
+                      "s_waitcnt\tvmcnt(0)\;buffer_gl1_inv\;buffer_gl0_inv"
                     : "global_atomic_swap\t%0, %A1, %2%O1 glc\;"
                       "s_waitcnt\tvmcnt(0)\;buffer_wbinvl1_vol");
           }
@@ -2273,13 +2297,13 @@
                    "s_waitcnt\tlgkmcnt(0)";
           case 1:
             return (TARGET_RDNA2_PLUS
-                    ? "buffer_gl0_inv\;flat_atomic_swap\t%0, %1, %2 glc\;"
+                    ? "buffer_gl1_inv\;buffer_gl0_inv\;flat_atomic_swap\t%0, %1, %2 glc\;"
                       "s_waitcnt\t0"
                     : "buffer_wbinvl1_vol\;flat_atomic_swap\t%0, %1, %2 glc\;"
                       "s_waitcnt\t0");
           case 2:
             return (TARGET_RDNA2_PLUS
-                    ? "buffer_gl0_inv\;"
+                    ? "buffer_gl1_inv\;buffer_gl0_inv\;"
                       "global_atomic_swap\t%0, %A1, %2%O1 glc\;"
                       "s_waitcnt\tvmcnt(0)"
                     : "buffer_wbinvl1_vol\;"
@@ -2297,15 +2321,15 @@
                    "s_waitcnt\tlgkmcnt(0)\;s_dcache_inv_vol";
           case 1:
             return (TARGET_RDNA2_PLUS
-                    ? "buffer_gl0_inv\;flat_atomic_swap\t%0, %1, %2 glc\;"
-                      "s_waitcnt\t0\;buffer_gl0_inv"
+                    ? "buffer_gl1_inv\;buffer_gl0_inv\;flat_atomic_swap\t%0, %1, %2 glc\;"
+                      "s_waitcnt\t0\;buffer_gl1_inv\;buffer_gl0_inv"
                     : "buffer_wbinvl1_vol\;flat_atomic_swap\t%0, %1, %2 glc\;"
                       "s_waitcnt\t0\;buffer_wbinvl1_vol");
           case 2:
             return (TARGET_RDNA2_PLUS
-                    ? "buffer_gl0_inv\;"
+                    ? "buffer_gl1_inv\;buffer_gl0_inv\;"
                       "global_atomic_swap\t%0, %A1, %2%O1 glc\;"
-                      "s_waitcnt\tvmcnt(0)\;buffer_gl0_inv"
+                      "s_waitcnt\tvmcnt(0)\;buffer_gl1_inv\;buffer_gl0_inv"
                     : "buffer_wbinvl1_vol\;"
                       "global_atomic_swap\t%0, %A1, %2%O1 glc\;"
                       "s_waitcnt\tvmcnt(0)\;buffer_wbinvl1_vol");
@@ -2315,7 +2339,7 @@
     gcc_unreachable ();
   }
   [(set_attr "type" "smem,flat,flat")
-   (set_attr "length" "20")
+   (set_attr "length" "28")
    (set_attr "gcn_version" "gcn5,*,gcn5")
    (set_attr "rdna" "no,*,*")])