From patchwork Tue Dec 12 14:37:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Hubicka X-Patchwork-Id: 1875191 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=ucw.cz header.i=@ucw.cz header.a=rsa-sha256 header.s=gen1 header.b=fRJKMoIN; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SqLnZ15kTz1ySd for ; Wed, 13 Dec 2023 01:38:10 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id E91793858418 for ; Tue, 12 Dec 2023 14:38:05 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from nikam.ms.mff.cuni.cz (nikam.ms.mff.cuni.cz [195.113.20.16]) by sourceware.org (Postfix) with ESMTPS id 0A8443858C2F for ; Tue, 12 Dec 2023 14:37:54 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 0A8443858C2F Authentication-Results: sourceware.org; dmarc=fail (p=none dis=none) header.from=ucw.cz Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=kam.mff.cuni.cz ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 0A8443858C2F Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=195.113.20.16 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702391875; cv=none; b=t5JlvmDD06YaoFqs+S9Ye74edDLPPm3WhmSOD16QWP3F3Y0ukx44VOwPuR1X2GLq3e1MvqJ+MGrkeTUANsYW9v8ggoIW+ShCC3T2oKN2WPWOtEJtdQRQlR3P9l1cIgcVy3sBoSVJYwAlG5g1K6VOpjUYV6SvEwcLfbWJ3I8OOCU= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702391875; c=relaxed/simple; bh=eU0dNGOKmUHqLrCRejlwpVKGqOuEBmsIkxuytB3ZsiQ=; h=DKIM-Signature:Date:From:To:Subject:Message-ID:MIME-Version; b=QZZ6A2CFCjv7Tg+lFAq6iA2wjaxQkpZtnbs9w1ulAquREiJkS+ftDrCCkv8uE4mBYvPGQK9nTmS/9rXLaRMbktcYpY4t2zr1IgrRwWmr+/+SFvRWA/408sXgI1JJ7ArLghd/RkCKnuO6/a+oMyJggI8rUWxpUjOyptB8TnEQ/1s= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202) id 23DDF28C191; Tue, 12 Dec 2023 15:37:53 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ucw.cz; s=gen1; t=1702391873; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type; bh=oxvZ99/IW5iag+utDUWot3KjVEIgV9+Oqb2tmmDMWxY=; b=fRJKMoINtI3ziIEuBZ6z/rT4wvCMG1MX8jRQG21cWm+qndDXJX9TBYYGnR5I2OUZQMiOR5 fCLtq5uYJUuCIeEFaT6tV5xSBB4+ZzitBMdLVxu9Rz2aTD1YkgfakAJ/1XVOfOOqi/2ny3 Vte5f0vTOYYOOwo5JIc8JO9AbdmDEfQ= Date: Tue, 12 Dec 2023 15:37:53 +0100 From: Jan Hubicka To: gcc-patches@gcc.gnu.org, hongtao.liu@intel.com, hongjiu.lu@intel.com Subject: Disable FMADD in chains for Zen4 and generic Message-ID: MIME-Version: 1.0 Content-Disposition: inline X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, JMQ_SPF_NEUTRAL, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Hi, this patch disables use of FMA in matrix multiplication loop for generic (for x86-64-v3) and zen4. I tested this on zen4 and Xenon Gold Gold 6212U. For Intel this is neutral both on the matrix multiplication microbenchmark (attached) and spec2k17 where the difference was within noise for Core. On core the micro-benchmark runs as follows: With FMA: 578,500,241 cycles:u # 3.645 GHz ( +- 0.12% ) 753,318,477 instructions:u # 1.30 insn per cycle ( +- 0.00% ) 125,417,701 branches:u # 790.227 M/sec ( +- 0.00% ) 0.159146 +- 0.000363 seconds time elapsed ( +- 0.23% ) No FMA: 577,573,960 cycles:u # 3.514 GHz ( +- 0.15% ) 878,318,479 instructions:u # 1.52 insn per cycle ( +- 0.00% ) 125,417,702 branches:u # 763.035 M/sec ( +- 0.00% ) 0.164734 +- 0.000321 seconds time elapsed ( +- 0.19% ) So the cycle count is unchanged and discrete multiply+add takes same time as FMA. While on zen: With FMA: 484875179 cycles:u # 3.599 GHz ( +- 0.05% ) (82.11%) 752031517 instructions:u # 1.55 insn per cycle 125106525 branches:u # 928.712 M/sec ( +- 0.03% ) (85.09%) 128356 branch-misses:u # 0.10% of all branches ( +- 0.06% ) (83.58%) No FMA: 375875209 cycles:u # 3.592 GHz ( +- 0.08% ) (80.74%) 875725341 instructions:u # 2.33 insn per cycle 124903825 branches:u # 1.194 G/sec ( +- 0.04% ) (84.59%) 0.105203 +- 0.000188 seconds time elapsed ( +- 0.18% ) The diffrerence is that Cores understand the fact that fmadd does not need all three parameters to start computation, while Zen cores doesn't. Since this seems noticeable win on zen and not loss on Core it seems like good default for generic. I plan to commit the patch next week if there are no compplains. Honza #include #include #define SIZE 1000 float a[SIZE][SIZE]; float b[SIZE][SIZE]; float c[SIZE][SIZE]; void init(void) { int i, j, k; for(i=0; i