From patchwork Wed Apr 27 12:09:49 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 1623011
Date: Wed, 27 Apr 2022 14:09:49 +0200 (CEST)
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] tree-optimization/105219 - bogus max iters for vectorized epilogue
Message-Id: <20220427120950.4A74F13A39@imap2.suse-dmz.suse.de>
List-Id: Gcc-patches mailing list
X-Patchwork-Original-From: Richard Biener via Gcc-patches
From: Richard Biener
Reply-To: Richard Biener
Cc: richard.sandiford@arm.com

The following makes sure to take into account prologue peeling
when trying to narrow down the maximum number of iterations
computed for the epilogue of a vectorized epilogue.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.  I did not
verify this solves the original aarch64 testcase yet but it looks
like a simpler fix and explains why I don't see the issue on the
11 branch which does otherwise the same transforms.

Richard.

2022-04-27  Richard Biener

        PR tree-optimization/105219
        * tree-vect-loop.cc (vect_transform_loop): Disable
        special code narrowing the vectorized epilogue epilogue
        max iterations when peeling for alignment was in effect.

        * gcc.dg/vect/pr105219.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr105219.c | 29 ++++++++++++++++++++++++++++
 gcc/tree-vect-loop.cc                |  2 +-
 2 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr105219.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr105219.c b/gcc/testsuite/gcc.dg/vect/pr105219.c
new file mode 100644
index 00000000000..0cb7ae2f4d6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr105219.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-additional-options "-O3" } */
+/* { dg-additional-options "-mtune=intel" { target x86_64-*-* i?86-*-* } } */
+
+#include "tree-vect.h"
+
+int data[128];
+
+void __attribute((noipa))
+foo (int *data, int n)
+{
+  for (int i = 0; i < n; ++i)
+    data[i] = i;
+}
+
+int main()
+{
+  check_vect ();
+  for (int start = 0; start < 16; ++start)
+    for (int n = 1; n < 3*16; ++n)
+      {
+        __builtin_memset (data, 0, sizeof (data));
+        foo (&data[start], n);
+        for (int j = 0; j < n; ++j)
+          if (data[start + j] != j)
+            __builtin_abort ();
+      }
+  return 0;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d7bc34636bd..217abab814b 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -9977,7 +9977,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
                           lowest_vf) - 1
            : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
                              lowest_vf) - 1);
-      if (main_vinfo)
+      if (main_vinfo && !main_vinfo->peeling_for_alignment)
         {
           unsigned int bound;
           poly_uint64 main_iters