From patchwork Wed Jun 26 12:06:01 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Manolis Tsamis
X-Patchwork-Id: 1952572
From: Manolis Tsamis
To: gcc-patches@gcc.gnu.org
Cc: Tamar Christina, Jiangning Liu, Christoph Müllner, "Kewen . Lin",
 Philipp Tomsich, Richard Biener, Manolis Tsamis
Subject: [PATCH v2] Rearrange SLP nodes with duplicate statements. [PR98138]
Date: Wed, 26 Jun 2024 05:06:01 -0700
Message-ID: <20240626120610.1862412-1-manolis.tsamis@vrull.eu>
X-Mailer: git-send-email 2.44.0
List-Id: Gcc-patches mailing list

This change checks when a two_operators SLP node has multiple occurrences of
the same statement (e.g.
{A, B, A, B, ...}) and tries to rearrange the operands so that there are no
duplicates. Two vec_perm expressions are then introduced to recreate the
original ordering. These duplicates can appear due to how two_operators nodes
are handled, and they prevent vectorization in some cases.

This targets the vectorization of the SPEC2017 x264 pixel_satd functions.
On some processors an improvement of more than 10% on x264 has been observed.

See also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138

gcc/ChangeLog:

        * tree-vect-slp.cc: Avoid duplicates in two_operators nodes.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/vect-slp-two-operator.c: New test.

Signed-off-by: Manolis Tsamis
---

Changes in v2:
        - Do not use predefined patterns; support rearrangement of arbitrary
        node orderings.
        - Only apply for two_operators nodes.
        - Recurse with single SLP operand instead of two duplicated ones.
        - Refactoring of code.

 .../aarch64/vect-slp-two-operator.c           |  36 ++++++
 gcc/tree-vect-slp.cc                          | 114 ++++++++++++++++++
 2 files changed, 150 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-slp-two-operator.c

diff --git a/gcc/testsuite/gcc.target/aarch64/vect-slp-two-operator.c b/gcc/testsuite/gcc.target/aarch64/vect-slp-two-operator.c
new file mode 100644
index 00000000000..b6b093ffc34
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-slp-two-operator.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect -fdump-tree-vect-details" } */
+
+typedef unsigned char uint8_t;
+typedef unsigned int uint32_t;
+
+#define HADAMARD4(d0, d1, d2, d3, s0, s1, s2, s3) {\
+    int t0 = s0 + s1;\
+    int t1 = s0 - s1;\
+    int t2 = s2 + s3;\
+    int t3 = s2 - s3;\
+    d0 = t0 + t2;\
+    d1 = t1 + t3;\
+    d2 = t0 - t2;\
+    d3 = t1 - t3;\
+}
+
+void sink(uint32_t tmp[4][4]);
+
+int x264_pixel_satd_8x4( uint8_t *pix1, int i_pix1, uint8_t *pix2, int i_pix2 )
+{
+    uint32_t tmp[4][4];
+    int sum = 0;
+    for( int i = 0; i < 4; i++, pix1 += i_pix1, pix2 += i_pix2 )
+    {
+        uint32_t a0 = (pix1[0] - pix2[0]) + ((pix1[4] - pix2[4]) << 16);
+        uint32_t a1 = (pix1[1] - pix2[1]) + ((pix1[5] - pix2[5]) << 16);
+        uint32_t a2 = (pix1[2] - pix2[2]) + ((pix1[6] - pix2[6]) << 16);
+        uint32_t a3 = (pix1[3] - pix2[3]) + ((pix1[7] - pix2[7]) << 16);
+        HADAMARD4( tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3], a0,a1,a2,a3 );
+    }
+    sink(tmp);
+}
+
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index b47b7e8c979..60d0d388dff 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -2420,6 +2420,95 @@ out:
       }
   swap = NULL;
 
+  bool has_two_operators_perm = false;
+  auto_vec<unsigned> two_op_perm_indices[2];
+  vec<stmt_vec_info> two_op_scalar_stmts[2] = {vNULL, vNULL};
+
+  if (two_operators && oprnds_info.length () == 2 && group_size > 2)
+    {
+      unsigned idx = 0;
+      hash_map<gimple *, unsigned> seen;
+      vec<slp_oprnd_info> new_oprnds_info
+        = vect_create_oprnd_info (1, group_size);
+      bool success = true;
+
+      enum tree_code code = ERROR_MARK;
+      if (oprnds_info[0]->def_stmts[0]
+          && is_a <gassign *> (oprnds_info[0]->def_stmts[0]->stmt))
+        code = gimple_assign_rhs_code (oprnds_info[0]->def_stmts[0]->stmt);
+
+      for (unsigned j = 0; j < group_size; ++j)
+        {
+          FOR_EACH_VEC_ELT (oprnds_info, i, oprnd_info)
+            {
+              stmt_vec_info stmt_info = oprnd_info->def_stmts[j];
+              if (!stmt_info || !stmt_info->stmt
+                  || !is_a <gassign *> (stmt_info->stmt)
+                  || gimple_assign_rhs_code (stmt_info->stmt) != code
+                  || skip_args[i])
+                {
+                  success = false;
+                  break;
+                }
+
+              bool exists;
+              unsigned &stmt_idx
+                = seen.get_or_insert (stmt_info->stmt, &exists);
+
+              if (!exists)
+                {
+                  new_oprnds_info[0]->def_stmts.safe_push (stmt_info);
+                  new_oprnds_info[0]->ops.safe_push (oprnd_info->ops[j]);
+                  stmt_idx = idx;
+                  idx++;
+                }
+
+              two_op_perm_indices[i].safe_push (stmt_idx);
+            }
+
+          if (!success)
+            break;
+        }
+
+      if (success && idx == group_size)
+        {
+          if (dump_enabled_p ())
+            {
+              dump_printf_loc (MSG_NOTE, vect_location,
+                               "Replace two_operators operands:\n");
+
+              FOR_EACH_VEC_ELT (oprnds_info, i, oprnd_info)
+                {
+                  dump_printf_loc (MSG_NOTE, vect_location,
+                                   "Operand %u:\n", i);
+                  for (unsigned j = 0; j < group_size; j++)
+                    dump_printf_loc (MSG_NOTE, vect_location, "\tstmt %u %G",
+                                     j, oprnd_info->def_stmts[j]->stmt);
+                }
+
+              dump_printf_loc (MSG_NOTE, vect_location,
+                               "With a single operand:\n");
+              for (unsigned j = 0; j < group_size; j++)
+                dump_printf_loc (MSG_NOTE, vect_location, "\tstmt %u %G",
+                                 j, new_oprnds_info[0]->def_stmts[j]->stmt);
+            }
+
+          two_op_scalar_stmts[0].safe_splice (oprnds_info[0]->def_stmts);
+          two_op_scalar_stmts[1].safe_splice (oprnds_info[1]->def_stmts);
+
+          new_oprnds_info[0]->first_op_type = oprnds_info[0]->first_op_type;
+          new_oprnds_info[0]->first_dt = oprnds_info[0]->first_dt;
+          new_oprnds_info[0]->any_pattern = oprnds_info[0]->any_pattern;
+          new_oprnds_info[0]->first_gs_p = oprnds_info[0]->first_gs_p;
+          new_oprnds_info[0]->first_gs_info = oprnds_info[0]->first_gs_info;
+
+          vect_free_oprnd_info (oprnds_info);
+          oprnds_info = new_oprnds_info;
+          nops = 1;
+          has_two_operators_perm = true;
+        }
+    }
+
   auto_vec<slp_tree, 4> children;
 
   stmt_info = stmts[0];
@@ -2691,6 +2780,29 @@ fail:
          the true { a+b, a+b, a+b, a+b } ... but there we don't have
          explicit stmts to put in so the keying on 'stmts' doesn't
          work (but we have the same issue with nodes that use 'ops').  */
+
+      if (has_two_operators_perm)
+        {
+          slp_tree child = children[0];
+          children.truncate (0);
+          for (i = 0; i < 2; i++)
+            {
+              slp_tree pnode
+                = vect_create_new_slp_node (two_op_scalar_stmts[i], 2);
+              SLP_TREE_CODE (pnode) = VEC_PERM_EXPR;
+              SLP_TREE_VECTYPE (pnode) = vectype;
+              SLP_TREE_CHILDREN (pnode).quick_push (child);
+              SLP_TREE_CHILDREN (pnode).quick_push (child);
+              lane_permutation_t& perm = SLP_TREE_LANE_PERMUTATION (pnode);
+              children.safe_push (pnode);
+
+              for (unsigned j = 0; j < stmts.length (); j++)
+                perm.safe_push (std::make_pair (0, two_op_perm_indices[i][j]));
+            }
+
+          SLP_TREE_REF_COUNT (child) += 4;
+        }
+
       slp_tree one = new _slp_tree;
       slp_tree two = new _slp_tree;
       SLP_TREE_DEF_TYPE (one) = vect_internal_def;
@@ -2727,12 +2839,14 @@ fail:
           else
             SLP_TREE_LANE_PERMUTATION (node).safe_push (std::make_pair (0, i));
         }
+
       SLP_TREE_CODE (one) = code0;
       SLP_TREE_CODE (two) = ocode;
       SLP_TREE_LANES (one) = stmts.length ();
       SLP_TREE_LANES (two) = stmts.length ();
       SLP_TREE_REPRESENTATIVE (one) = stmts[0];
       SLP_TREE_REPRESENTATIVE (two) = stmts[j];
+
       return node;
     }
 
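
Illustration (not part of the patch): the rearrangement described in the
commit message can be sketched outside GCC roughly as below. This is a
minimal sketch under stated assumptions. Statements are modelled as plain
strings, and the names Rearranged and rearrange are hypothetical; they do
not exist in tree-vect-slp.cc. The sketch also leaves out the checks the
patch makes on the statement kind, the rhs code and skip_args.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the patch's bookkeeping; this is not GCC code.
struct Rearranged {
  bool ok = false;                  // true when the rewrite would apply
  std::vector<std::string> merged;  // the single, duplicate-free operand
  std::vector<unsigned> perm[2];    // per-operand lane indices into `merged`
};

// Walk the lanes in order, visiting operand 0 and then operand 1 in each
// lane, and give every statement not seen before the next free slot.
Rearranged rearrange(const std::vector<std::string> &op0,
                     const std::vector<std::string> &op1) {
  Rearranged r;
  const std::size_t group_size = op0.size();
  // Like the patch, only consider groups with more than two lanes.
  if (op1.size() != group_size || group_size <= 2)
    return r;

  const std::vector<std::string> *ops[2] = {&op0, &op1};
  std::unordered_map<std::string, unsigned> seen;
  for (std::size_t j = 0; j < group_size; ++j)
    for (int i = 0; i < 2; ++i) {
      const std::string &stmt = (*ops[i])[j];
      auto it = seen.find(stmt);
      if (it == seen.end()) {
        it = seen.emplace(stmt, (unsigned) r.merged.size()).first;
        r.merged.push_back(stmt);
      }
      // Lane j of operand i is recreated from this slot of `merged`.
      r.perm[i].push_back(it->second);
    }

  // Same acceptance test as the patch: the number of distinct statements
  // must equal the group size so the merged operand fills one node.
  r.ok = r.merged.size() == group_size;
  return r;
}
```

With operands {A, B, A, B} and {C, D, C, D}, the sketch produces the merged
operand {A, C, B, D} and the index lists {0, 2, 0, 2} and {1, 3, 1, 3}.
These lists play the role of the lane permutations that the patch attaches
to its two VEC_PERM_EXPR nodes.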