From patchwork Wed Aug 2 01:31:41 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1815705
To: gcc-patches@gcc.gnu.org
Cc: ubizjak@gmail.com
Subject: [PATCH] Optimize vlddqu + inserti128 to vbroadcasti128
Date: Wed, 2 Aug 2023 09:31:41 +0800
Message-Id: <20230802013141.772245-1-hongtao.liu@intel.com>
X-Mailer: git-send-email 2.39.1.388.g2fc9e9ca3c
X-Patchwork-Original-From: liuhongt via Gcc-patches
From: liuhongt
Reply-To: liuhongt

In [1], I proposed a patch to generate vmovdqu for all vlddqu
intrinsics after AVX2. It was rejected with the following rationale:

> The instruction is reachable only as __builtin_ia32_lddqu* (aka
> _mm_lddqu_si*), so it was chosen by the programmer for a reason. I
> think that in this case, the compiler should not be too smart and
> change the instruction behind the programmer's back. The caveats are
> also explained at length in the ISA manual.

So this patch is more conservative and only optimizes
vlddqu + vinserti128 to vbroadcasti128.

Compared to vbroadcasti128, vlddqu + vinserti128 uses a shuffle port
in addition to the load port. From a latency perspective,
vbroadcasti128 is no worse than vlddqu + vinserti128.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625122.html

Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

	* config/i386/sse.md (*avx2_lddqu_inserti_to_bcasti): New
	pre_reload define_insn_and_split.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/vlddqu_vinserti128.c: New test.
---
 gcc/config/i386/sse.md                         | 18 ++++++++++++++++++
 .../gcc.target/i386/vlddqu_vinserti128.c       | 11 +++++++++++
 2 files changed, 29 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 2d81347c7b6..4bdd2b43ba7 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -26600,6 +26600,24 @@ (define_insn "avx2_vbroadcasti128_<mode>"
    (set_attr "prefix" "vex,evex,evex")
    (set_attr "mode" "OI")])
 
+;; Optimize vlddqu + vinserti128 to vbroadcasti128.  The former uses
+;; a shuffle port in addition to the load port used by the latter.
+;; From a latency perspective, vbroadcasti128 is no worse.
+(define_insn_and_split "avx2_lddqu_inserti_to_bcasti"
+  [(set (match_operand:V4DI 0 "register_operand" "=x,v,v")
+        (vec_concat:V4DI
+          (subreg:V2DI
+            (unspec:V16QI [(match_operand:V16QI 1 "memory_operand")]
+                          UNSPEC_LDDQU) 0)
+          (subreg:V2DI (unspec:V16QI [(match_dup 1)]
+                          UNSPEC_LDDQU) 0)))]
+  "TARGET_AVX2 && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+        (vec_concat:V4DI (match_dup 1) (match_dup 1)))]
+  "operands[1] = adjust_address (operands[1], V2DImode, 0);")
+
 ;; Modes handled by AVX vec_dup patterns.
 (define_mode_iterator AVX_VEC_DUP_MODE
   [V8SI V8SF V4DI V4DF])
diff --git a/gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c b/gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c
new file mode 100644
index 00000000000..29699a5fa7f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O2" } */
+/* { dg-final { scan-assembler-times "vbroadcasti128" 1 } } */
+/* { dg-final { scan-assembler-not {(?n)vlddqu.*xmm} } } */
+
+#include <immintrin.h>
+__m256i foo(void *data) {
+  __m128i X1 = _mm_lddqu_si128((__m128i*)data);
+  __m256i V1 = _mm256_broadcastsi128_si256 (X1);
+  return V1;
+}
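
The new test is compile-only and checks the emitted mnemonics. As a rough
illustration of the semantics the split has to preserve (both 128-bit lanes
of the result equal the 16 bytes loaded from memory), here is a minimal
runtime sketch. It is not part of the patch: broadcast_lddqu_avx2,
broadcast_lddqu and check_broadcast are hypothetical helper names, and the
scalar fallback is only there so the sketch also runs on machines without
AVX2.

```c
#include <assert.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Same intrinsic sequence as foo() in the new test, with the result
   stored to memory so that it can be inspected byte by byte.  */
__attribute__ ((target ("avx2")))
static void
broadcast_lddqu_avx2 (const void *data, unsigned char out[32])
{
  __m128i x = _mm_lddqu_si128 ((const __m128i *) data);
  __m256i v = _mm256_broadcastsi128_si256 (x);
  _mm256_storeu_si256 ((__m256i *) out, v);
}
#endif

/* Hypothetical helper: use the AVX2 sequence when the CPU supports it,
   otherwise a scalar reference that copies the 16 source bytes into
   both halves of the output.  */
void
broadcast_lddqu (const void *data, unsigned char out[32])
{
#if defined(__x86_64__) || defined(__i386__)
  if (__builtin_cpu_supports ("avx2"))
    {
      broadcast_lddqu_avx2 (data, out);
      return;
    }
#endif
  memcpy (out, data, 16);
  memcpy (out + 16, data, 16);
}

/* Hypothetical check: both 128-bit lanes must equal the source bytes,
   whichever instruction sequence the compiler picked.  */
void
check_broadcast (const void *data)
{
  unsigned char out[32];
  broadcast_lddqu (data, out);
  assert (memcmp (out, data, 16) == 0);
  assert (memcmp (out + 16, data, 16) == 0);
}
```

Calling check_broadcast on any readable 16-byte buffer, aligned or not,
should pass both before and after the patch, since only the instruction
selection changes and not the loaded value.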