From patchwork Fri Apr 12 10:31:02 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 1923046
Date: Fri, 12 Apr 2024 11:31:02 +0100
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, rguenther@suse.de, jlaw@ventanamicro.com
Subject: [PATCH]middle-end: adjust loop upper bounds when peeling for gaps and early break [PR114403].
List-Id: Gcc-patches mailing list

Hi All,

This is a story all about how the peeling for gaps introduces a bug in the
upper bounds.

Before I go further, I'll first explain how I understand this to work for
loops with a single exit.

When peeling for gaps we peel N < VF iterations to scalar.
This happens by removing N iterations from the calculation of niters such that
vect_iters * VF == niters is always false.

In other words, when we exit the vector loop we always fall to the scalar loop.
The loop bounds adjustment guarantees this.  Because of this we potentially
execute one vector loop iteration less.  That is, if you're at the boundary
condition where niters % VF == 0, then by peeling one or more scalar iterations
the vector loop executes one iteration less.

This is accounted for by the adjustments in vect_transform_loop.  This
adjustment happens differently based on whether the vector loop can be partial
or not:

Peeling for gaps sets the bias to 0 and then:

when not partial:  we take the floor of (scalar_upper_bound / VF) - 1 to get
                   the vector latch iteration count.

when loop is partial:  For a single exit this means the loop is masked.  We
                       take the ceil to account for the fact that the loop can
                       handle the final partial iteration using masking.

Note that there's no difference between ceil and floor on the boundary
condition.  There is a difference, however, when you're slightly above it,
i.e. if scalar iterates 14 times and VF = 4 and we peel 1 iteration for gaps.

The partial loop does ((13 + 0) / 4) - 1 == 2 vector iterations, and in effect
the partial iteration is ignored and it's done as scalar.

This is fine because the niters modification has capped the vector iteration
at 2.
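The floor and ceil cases described above can be checked with a small C sketch.
The helper name vector_latch_bound and its parameters are made up for
illustration and are not GCC functions; the code only mirrors the arithmetic
stated in this mail (scalar latch bound plus bias, divided by VF, minus one),
and assumes the result stays non-negative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of GCC: derive the vector latch iteration
   bound from a scalar latch bound, following the arithmetic described in
   the text.  The "- 1" converts an iteration count back to a latch count.  */
static long
vector_latch_bound (long scalar_latch_bound, int bias, long vf,
                    bool final_iter_may_be_partial)
{
  assert (vf > 0);
  long iters = scalar_latch_bound + bias;
  long vec_iters = final_iter_may_be_partial
                   ? (iters + vf - 1) / vf   /* ceil: partial iteration kept */
                   : iters / vf;             /* floor: partial iteration dropped */
  return vec_iters - 1;
}
```

With the numbers from the example, vector_latch_bound (13, 0, 4, false)
evaluates to 2, while the ceil variant gives 3.  For a bound of 12 both
variants agree, which is the "no difference on the boundary" case.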
Consequently, when we reduce the induction values we end up entering the scalar
code with ind_var.2 = ind_var.1 + 2 * VF.

Now let's look at early breaks.  To make it easier I'll focus on the specific
testcase:

char buffer[64];

__attribute__ ((noipa))
buff_t *copy (buff_t *first, buff_t *last)
{
  char *buffer_ptr = buffer;
  char *const buffer_end = &buffer[SZ-1];
  int store_size = sizeof(first->Val);
  while (first != last && (buffer_ptr + store_size) <= buffer_end)
    {
      const char *value_data = (const char *)(&first->Val);
      __builtin_memcpy(buffer_ptr, value_data, store_size);
      buffer_ptr += store_size;
      ++first;
    }

  if (first == last)
    return 0;

  return first;
}

Here the first, early exit is on the condition:

  (buffer_ptr + store_size) <= buffer_end

and the main exit is on condition:

  first != last

This is important, as this bug only manifests itself when the first exit has a
known constant iteration count that's lower than the latch exit count.

Because buffer holds 64 bytes, and VF = 4, unroll = 2, we end up processing 16
bytes per iteration.  So the exit has a known bound of 8 + 1.

The vectorizer correctly analyzes this:

Statement (exit)if (ivtmp_21 != 0)
 is executed at most 8 (bounded by 8) + 1 times in loop 1.

and as a consequence the IV is bound by 9:

  # vect_vec_iv_.14_117 = PHI <_118(9), { 9, 8, 7, 6 }(20)>
  ...
  vect_ivtmp_21.16_124 = vect_vec_iv_.14_117 + { 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615 };
  mask_patt_22.17_126 = vect_ivtmp_21.16_124 != { 0, 0, 0, 0 };
  if (mask_patt_22.17_126 == { -1, -1, -1, -1 })
    goto ; [88.89%]
  else
    goto ; [11.11%]

The important bits are these:

In this example the value of last - first = 416.

The calculated vector iteration count is:

    x = (((ptr2 - ptr1) - 16) / 16) + 1 = 27

The bounds generated, adjusting for gaps:

   x == (((x - 1) >> 2) << 2)

which means we'll always fall through to the scalar code, as intended.

Here are two key things to note:

1. In this loop, the early exit will always be the one taken.
   When it's taken we enter the scalar loop with the correct induction value
   to apply the gap peeling.

2. If the main exit is taken, the induction values assume you've finished all
   vector iterations, i.e. they assume you have completed 24 iterations, as we
   treat the main exit the same for normal loop vect and early break when not
   PEELED.  This means the induction value is adjusted to
   ind_var.2 = ind_var.1 + 24 * VF;

So what's going wrong?  The vectorizer's codegen is correct and efficient.
However, when we adjust the upper bounds, that code knows that the loop's upper
bound is based on the early exit, i.e. 8 latch iterations.  In other words, it
thinks the loop iterates once.

This is incorrect, as the vector loop iterates twice: it has set up the
induction value such that it exits at the early exit.  So in effect it
iterates 2.5x times.

Because the upper bound is incorrect, when we unroll, the loop now exits from
the main exit, which uses the incorrect induction value.

So there are three ways to fix this:

1. If we take the position that the main exit should support both premature
   exits and final exits then vect_update_ivs_after_vectorizer needs to be
   skipped for this case, and vectorizable_induction updated with a third case
   where we reduce with a LAST reduction based on the IVs instead of assuming
   you're at the end of the vector loop.

   I don't like this approach.  I don't think we should add a third induction
   style to cover up an issue introduced by unrolling.  It makes the code
   harder to follow and makes main exits harder to reason about.

2. We could say that vec_init_loop_exit_info should pick the exit which has
   the smallest known iteration count.  This would turn this case into a
   PEELED case and the induction values would be correct as we'd always
   recalculate them from a reduction.  This is suboptimal though, as the reason
   we pick the latch exit as the IV one is to prevent having to rotate the
   loop.  This results in more efficient code for what we assume is the common
   case, i.e.
   the main exit.

3. In PR113734 we've established that for vectorization of early breaks we
   must always treat the loop as partial.  Here partiality means that we have
   enough vector elements to start the iteration, but we may take an early
   exit and so never reach the latch/main exit.

   This requirement is overwritten by the peeling for gaps adjustment of the
   upper bound.  I believe the bug is simply that this shouldn't be done.
   The adjustment here is to indicate that the main exit always leads to the
   scalar loop when peeling for gaps.

   But this invariant is already always true for all early exits.  Remember
   that early exits restart the scalar loop at the start of the vector
   iteration, so the induction values will start it where we want to do the
   gaps peeling.

I think no. 3 is the correct fix, and also one that doesn't degrade code
quality.

Note: I used memcpy and memcmp in the testcase.  I'm not sure if I can rely on
these being inlined, but I also don't know how to test for library support.

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu
with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR tree-optimization/114403
	* tree-vect-loop.cc (vect_transform_loop): Adjust upper bounds for when
	peeling for gaps and early break.

gcc/testsuite/ChangeLog:

	PR tree-optimization/114403
	* gcc.dg/vect/vect-early-break_124-pr114403.c: New test.

---

--
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
new file mode 100644
index 0000000000000000000000000000000000000000..ae5e53efc45e7bef89c5a72abd6afa48292668db
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
@@ -0,0 +1,74 @@
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break_hw } */
+/* { dg-require-effective-target vect_long_long } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include "tree-vect.h"
+
+typedef unsigned long PV;
+typedef struct _buff_t {
+    int foo;
+    PV Val;
+} buff_t;
+
+#define NUM 9
+#define SZ NUM * sizeof (PV)
+char buffer[SZ];
+
+__attribute__ ((noipa))
+buff_t *copy (buff_t *first, buff_t *last)
+{
+  char *buffer_ptr = buffer;
+  char *const buffer_end = &buffer[SZ-1];
+  int store_size = sizeof(first->Val);
+  while (first != last && (buffer_ptr + store_size) <= buffer_end)
+    {
+      const char *value_data = (const char *)(&first->Val);
+      __builtin_memcpy(buffer_ptr, value_data, store_size);
+      buffer_ptr += store_size;
+      ++first;
+    }
+
+  if (first == last)
+    return 0;
+
+  return first;
+}
+
+int main ()
+{
+  /* Copy an ascii buffer. We need to trigger the loop to exit from
+     the condition where we have more data to copy but not enough space.
+     For this test that means that OVL must be > SZ. */
+#define OVL NUM*2
+  char str[OVL]="abcdefghiabcdefgh\0";
+  buff_t tmp[OVL];
+
+#pragma GCC novector
+  for (int i = 0; i < OVL; i++)
+    tmp[i].Val = str[i];
+
+  buff_t *start = &tmp[0];
+  buff_t *last = &tmp[OVL-1];
+  buff_t *res = 0;
+
+  /* This copy should exit on the early exit, in which case we know
+     that start != last as we had more data to copy but the buffer
+     was full. */
+  if (!(res = copy (start, last)))
+    __builtin_abort ();
+
+  /* Check if we have the right reduction value. */
+  if (res != &tmp[NUM-1])
+    __builtin_abort ();
+
+  int store_size = sizeof(PV);
+#pragma GCC novector
+  for (int i = 0; i < NUM - 1; i+=store_size)
+    if (0 != __builtin_memcmp (buffer+i, (char*)&tmp[i].Val, store_size))
+      __builtin_abort ();
+
+  return 0;
+}
+
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 4375ebdcb493a90fd0501cbb4b07466077b525c3..024a24a305c4727f97eb022247f4dca791c52dfe 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -12144,6 +12144,12 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
      -min_epilogue_iters to remove iterations that cannot be performed
        by the vector code. */
   int bias_for_lowest = 1 - min_epilogue_iters;
+  /* For an early break we must always assume that the vector loop can be
+     executed partially. In this definition a partial iteration means that we
+     take an exit before the IV exit. */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    bias_for_lowest = 1;
+
   int bias_for_assumed = bias_for_lowest;
   int alignment_npeels = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
   if (alignment_npeels && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
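
As a rough numeric illustration of the tree-vect-loop.cc hunk, the sketch below
recomputes the vector latch bound with and without the early-break override.
It is only a model under stated assumptions: it takes the figures quoted in
the description at face value (an upper bound of 8 latch iterations, VF = 4),
assumes min_epilogue_iters is 1 when peeling for gaps, and assumes the ceil
form applies because early-break loops are treated as partial.  The function
names model_bias_for_lowest and ceil_latch_bound are hypothetical and do not
exist in GCC.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the bias selection; "patched" toggles the
   early-break override added by this patch.  */
static int
model_bias_for_lowest (bool peeling_for_gaps, bool early_breaks, bool patched)
{
  /* Assumption: one scalar epilogue iteration is reserved when peeling
     for gaps, none otherwise.  */
  int min_epilogue_iters = peeling_for_gaps ? 1 : 0;
  int bias = 1 - min_epilogue_iters;
  if (patched && early_breaks)
    bias = 1;
  return bias;
}

/* Ceil form of the latch bound: ceil ((bound + bias) / vf) - 1.  */
static long
ceil_latch_bound (long scalar_latch_bound, int bias, long vf)
{
  assert (vf > 0);
  return (scalar_latch_bound + bias + vf - 1) / vf - 1;
}
```

Under these assumptions the unpatched bias is 0 and ceil_latch_bound (8, 0, 4)
gives 1, matching the "iterates once" behaviour described above; with the
override the bias is 1 and ceil_latch_bound (8, 1, 4) gives 2.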