
[RFC,vect] PR 91246: Prototype for vectorization of loops with breaks

Message ID aca8e145-1522-514b-451b-8c3a85dbe329@arm.com
State New
Series [RFC,vect] PR 91246: Prototype for vectorization of loops with breaks

Commit Message

Andre Vieira March 12, 2020, 9:15 a.m. UTC
Hi all,

This is a prototype I have put together to look at the feasibility and 
profitability of vectorizing loops with breaks as suggested in PR 91246. 
I am posting this here for comments as I am curious what your opinions 
are on my approach.  I don't expect much attention to this in stage 4, 
but I wanted to get it out now before I forget about it.

The idea is that during ifcvt it checks whether it can safely transform 
the break away, version the loop, and in the "to vectorize" version 
replace the break with what is hopefully a reducible comparison and 
MIN_EXPR.  Currently I have limited the cases it can transform to loops 
that meet all of the following conditions:
- it is an inner loop with a single break point,
- the break condition is an EQ_EXPR or NE_EXPR of integers,
- the entire loop is vectorized, i.e. it has no scalar epilogue,
- the loop has no writes to memory and no variable escapes the loop,
- all memory accesses are valid (i.e. within bounds) even if the loop 
doesn't break early.

To check the last condition I copied and modified the checks used to 
warn for out-of-bounds accesses.  I made these checks more 
conservative: we want to reject the transformation if we can't 
guarantee the accesses are within bounds for any possible range.  I 
have noticed these range checks aren't very good yet; I am hoping that 
as project ranger evolves this will get easier.  However, I am also 
worried that if the checks start to take control flow into 
consideration and "understand" the break, they will use it to reason 
that with the break in place the range of the loop induction variables 
is limited.  That would obviously make them unusable for checking 
whether removing the break will cause out-of-bounds accesses.  Ideally 
the new ranger would allow us to "ignore" certain expressions in the 
range calculation.  All this was with targets in mind that do not 
support masking; for targets with the ability to mask vector 
operations we could do without such analysis, as we may be able to 
prevent memory accesses from taking place after the break condition 
has been met.

In trying to improve the performance of this transformation I made a 
small change in the vectorizer: when we are dealing with such loops 
with breaks, the ifcvt pass stores the variable holding the break 
condition in the loop struct under 'early_break_cond'.  The vectorizer 
can then use this variable to check every iteration whether we have 
hit the break condition, preventing us from continuing through the 
loop when we know we can stop early.  This benefits cases where we 
would break early; on the other hand, if we break late, we have now 
introduced extra checks within the loop...  However, there is no way 
for us to know in advance.  One small improvement would be to use the 
known iteration count to decide whether or not to add these checks; 
one could argue that if we know the number of iterations to be small 
there is little point in adding them.  Thoughts welcome...

Another issue I encountered was that the early loop unroller often 
gets in the way.  For this reason I tried to teach it not to unroll 
inner loops from which we can remove breaks.  Unfortunately the early 
loop unroller runs before some other simplification transformations 
and loop canonicalisation, and I have seen cases where those 
transformations were crucial in making our memory access analysis 
accept loops.  I suggest disabling the early loop unroller to check 
whether this pass is able to handle certain loops, i.e. pass 
'-fdisable-tree-cunrolli'; you can even go as far as using 
-fdisable-tree-cunrolli=<function_name> for more accurate 
benchmarking/analysis.

I made some changes to the testcase on the PR ticket so that the 
compiler has enough information to determine all memory accesses are 
within bounds; see the testcase below.  This is one of those cases 
that requires '-fdisable-tree-cunrolli'.

#include <stdbool.h>
#define SIZE 8
int s, sum;

int *g_board;
int *g_moves;

static void f(int *moves, int *board, int cnt, int lastmove, int ko)
{
     for (int i = 0; i < cnt; i++) {
         if (moves[i] == ko) continue;

         bool found = false;
         for (int j = 0; j < SIZE; j++) {
             if ((lastmove + board[j]) == moves[j]) {
                 found = true;
                 break;
             }
         }

         if (!found) {
             s *= 20;
         }

         if (s >= 40) {
             sum += s;
         }
     }
}

void foo(int cnt, int lastmove, int ko)
{
   int board[SIZE];
   int moves[SIZE];
   for (int i = 0; i < SIZE; ++i)
     {
       board[i] = g_board[i];
       moves[i] = g_moves[i];
     }

   f (moves, board, cnt, lastmove, ko);
}


I have not found this transformation to have a big impact on performance 
so far.

Kind Regards,
Andre

Comments

Li, Pan2 via Gcc-patches March 12, 2020, 9:50 a.m. UTC | #1
On Thu, Mar 12, 2020 at 10:15 AM Andre Vieira (lists)
<andre.simoesdiasvieira@arm.com> wrote:
>
> Hi all,
>
> This is a prototype I have put together to look at the feasibility and
> profitability of vectorizing loops with breaks as suggested in PR 91246.
> I am posting this here for comments as I am curious what your opinions
> are on my approach.  I don't expect much attention to this in stage 4,
> but I wanted to get it out now before I forget about it.
>
> The idea is that during ifcvt it checks whether it can safely transform
> the break away, version the loop, and in the "to vectorize" version
> replace the break with what is hopefully a reducible comparison and
> MIN_EXPR.  Currently I have limited the cases it can transform to loops
> that meet all of the following conditions:
> - it is an inner loop with a single break point,
> - the break condition is an EQ_EXPR or NE_EXPR of integers,
> - the entire loop is vectorized, i.e. it has no scalar epilogue,
> - the loop has no writes to memory and no variable escapes the loop,
> - all memory accesses are valid (i.e. within bounds) even if the loop
> doesn't break early.
>
> To check the last condition I copied and modified the checks used to
> warn for out-of-bounds accesses.  I made these checks more
> conservative: we want to reject the transformation if we can't
> guarantee the accesses are within bounds for any possible range.  I
> have noticed these range checks aren't very good yet; I am hoping that
> as project ranger evolves this will get easier.  However, I am also
> worried that if the checks start to take control flow into
> consideration and "understand" the break, they will use it to reason
> that with the break in place the range of the loop induction variables
> is limited.  That would obviously make them unusable for checking
> whether removing the break will cause out-of-bounds accesses.  Ideally
> the new ranger would allow us to "ignore" certain expressions in the
> range calculation.  All this was with targets in mind that do not
> support masking; for targets with the ability to mask vector
> operations we could do without such analysis, as we may be able to
> prevent memory accesses from taking place after the break condition
> has been met.

Not looking at the patch but just throwing in thoughts.

I think at some point we need to stop relying on a separate if-conversion
pass since that requires a GIMPLE representation for the scalar
if-converted loop, which is really difficult if you start to apply
predication to more than just loads and stores.

That is, I'd like to see us experimenting with doing the if-conversion
"on the fly" during vectorization.  That probably means doing what
if-conversion does but represent the result in some non-IL data
structure - since we're going to do all-SLP it'll be the SLP graph,
maybe simply attach a vector of predicates to each node.
The benefit is that we'll eventually be able to if-convert for
non-loop vectorization as well.

Possibly easiest is of course to try to do away with if-conversion as it
is now and only have the existing feature-set moved to the vectorizer.
I'd probably really start with copying the if-conversion analysis code
and simply skip its "code-generation", then handle multiple BBs
in all of the vectorizer code and see how analysis and code generation
would have to be adjusted.  Given the general direction I'd focus
on full-SLP testcases but in reality I cannot promise I'll finish
the only-SLP transition in time for GCC 11 (I hope to have a good
chunk ready for the Cauldron though).

> In trying to improve the performance of this transformation I made a
> small change in the vectorizer: when we are dealing with such loops
> with breaks, the ifcvt pass stores the variable holding the break
> condition in the loop struct under 'early_break_cond'.  The vectorizer
> can then use this variable to check every iteration whether we have
> hit the break condition, preventing us from continuing through the
> loop when we know we can stop early.  This benefits cases where we
> would break early; on the other hand, if we break late, we have now
> introduced extra checks within the loop...  However, there is no way
> for us to know in advance.  One small improvement would be to use the
> known iteration count to decide whether or not to add these checks;
> one could argue that if we know the number of iterations to be small
> there is little point in adding them.  Thoughts welcome...
>
> Another issue I encountered was that the early loop unroller often
> gets in the way.  For this reason I tried to teach it not to unroll
> inner loops from which we can remove breaks.  Unfortunately the early
> loop unroller runs before some other simplification transformations
> and loop canonicalisation, and I have seen cases where those
> transformations were crucial in making our memory access analysis
> accept loops.  I suggest disabling the early loop unroller to check
> whether this pass is able to handle certain loops, i.e. pass
> '-fdisable-tree-cunrolli'; you can even go as far as using
> -fdisable-tree-cunrolli=<function_name> for more accurate
> benchmarking/analysis.
>
> I made some changes to the testcase on the PR ticket such that the
> compiler has enough information to determine all memory accesses are
> within bounds, see testcase below. This is one of those cases that
> requires '-fdisable-tree-cunrolli'.
>
> #include <stdbool.h>
> #define SIZE 8
> int s, sum;
>
> int *g_board;
> int *g_moves;
>
> static void f(int *moves, int *board, int cnt, int lastmove, int ko)
> {
>      for (int i = 0; i < cnt; i++) {
>          if (moves[i] == ko) continue;
>
>          bool found = false;
>          for (int j = 0; j < SIZE; j++) {
>              if ((lastmove + board[j]) == moves[j]) {
>                  found = true;
>                  break;
>              }
>          }
>
>          if (!found) {
>              s *= 20;
>          }
>
>          if (s >= 40) {
>              sum += s;
>          }
>      }
> }
>
> void foo(int cnt, int lastmove, int ko)
> {
>    int board[SIZE];
>    int moves[SIZE];
>    for (int i = 0; i < SIZE; ++i)
>      {
>        board[i] = g_board[i];
>        moves[i] = g_moves[i];
>      }
>
>    f (moves, board, cnt, lastmove, ko);
> }
>
>
> I have not found this transformation to have a big impact on performance
> so far.
>
> Kind Regards,
> Andre

Patch

diff --git a/gcc/cfganal.h b/gcc/cfganal.h
index 849e537eddbf8e551b5af26a48a4468155bb7635..cccba9f9628b7aab2febea98cb9cece11c99900c 100644
--- a/gcc/cfganal.h
+++ b/gcc/cfganal.h
@@ -81,6 +81,8 @@  extern void bitmap_union_of_preds (sbitmap, sbitmap *, basic_block);
 extern basic_block * single_pred_before_succ_order (void);
 extern edge single_incoming_edge_ignoring_loop_edges (basic_block, bool);
 extern edge single_pred_edge_ignoring_loop_edges (basic_block, bool);
+extern bool can_remove_breaks (class loop *, unsigned int, basic_block *,
+			       edge *, vec<edge> *);
 
 
 #endif /* GCC_CFGANAL_H */
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 11378cadd41c8721e16e6169f8a881971e9f6654..54f83e70bfd2f28167f0f875d700d755e80ff49b 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -268,6 +268,11 @@  public:
      the basic-block from being collected but its index can still be
      reused.  */
   basic_block former_header;
+
+  /* Used for loops that used to contain breaks.  After removal of breaks this
+     tree is set to a variable that is true if during the current iteration
+     we would have exited the loop due to the break condition.  */
+  tree early_break_cond;
 };
 
 /* Set if the loop is known to be infinite.  */
@@ -876,4 +881,17 @@  gcov_type_to_wide_int (gcov_type val)
 
   return widest_int::from_array (a, 2);
 }
+
+/* Function bb_in_loop_p
+
+   Used as predicate for dfs order traversal of the loop bbs.  */
+
+inline bool
+bb_in_loop_p (const_basic_block bb, const void *data)
+{
+  const class loop *const loop = (const class loop *)data;
+  if (flow_bb_inside_loop_p (loop, bb))
+    return true;
+  return false;
+}
 #endif /* GCC_CFGLOOP_H */
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 8d24c18d5de9a982733bfd5a16d4e87cc4ebef1b..08e15d2528465427b2b728e1c4a6c7df5061a360 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -99,6 +99,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
+#include "gimple-walk.h"
 #include "gimplify-me.h"
 #include "tree-cfg.h"
 #include "tree-into-ssa.h"
@@ -2727,7 +2728,8 @@  combine_blocks (class loop *loop)
    consistent after the condition is folded in the vectorizer.  */
 
 static class loop *
-version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
+version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds,
+				bool save_preds = true)
 {
   basic_block cond_bb;
   tree cond = make_ssa_name (boolean_type_node);
@@ -2741,11 +2743,15 @@  version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
 				  integer_zero_node);
   gimple_call_set_lhs (g, cond);
 
-  /* Save BB->aux around loop_version as that uses the same field.  */
-  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
-  void **saved_preds = XALLOCAVEC (void *, save_length);
-  for (unsigned i = 0; i < save_length; i++)
-    saved_preds[i] = ifc_bbs[i]->aux;
+  void **saved_preds;
+  if (save_preds)
+    {
+      /* Save BB->aux around loop_version as that uses the same field.  */
+      save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
+      saved_preds = XALLOCAVEC (void *, save_length);
+      for (unsigned i = 0; i < save_length; i++)
+	saved_preds[i] = ifc_bbs[i]->aux;
+    }
 
   initialize_original_copy_tables ();
   /* At this point we invalidate porfile confistency until IFN_LOOP_VECTORIZED
@@ -2757,8 +2763,11 @@  version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
 			   profile_probability::always (), true);
   free_original_copy_tables ();
 
-  for (unsigned i = 0; i < save_length; i++)
-    ifc_bbs[i]->aux = saved_preds[i];
+  if (save_preds)
+    {
+      for (unsigned i = 0; i < save_length; i++)
+	ifc_bbs[i]->aux = saved_preds[i];
+    }
 
   if (new_loop == NULL)
     return NULL;
@@ -2813,6 +2822,404 @@  versionable_outer_loop_p (class loop *loop)
   return true;
 }
 
+bool
+mem_ref_maybe_out_of_bounds (tree ref)
+{
+  tree arg = TREE_OPERAND (ref, 0);
+  /* The constant and variable offset of the reference.  */
+  tree cstoff = TREE_OPERAND (ref, 1);
+  tree varoff = NULL_TREE;
+
+  const offset_int maxobjsize = tree_to_shwi (max_object_size ());
+
+  /* The array or string constant bounds in bytes.  Initially set
+     to [-MAXOBJSIZE - 1, MAXOBJSIZE]  until a tighter bound is
+     determined.  */
+  offset_int arrbounds[2] = { -maxobjsize - 1, maxobjsize };
+
+  offset_int ioff = wi::to_offset (fold_convert (ptrdiff_type_node, cstoff));
+
+  /* The range of the byte offset into the reference.  */
+  offset_int offrange[2] = { 0, 0 };
+
+  /* Determine the offsets and increment OFFRANGE for the bounds of each.
+     The loop computes the range of the final offset for expressions such
+     as (A + i0 + ... + iN)[CSTOFF] where i0 through iN are SSA_NAMEs in
+     some range.  */
+  const unsigned limit = 10;
+  for (unsigned n = 0; TREE_CODE (arg) == SSA_NAME && n < limit; ++n)
+    {
+      gimple *def = SSA_NAME_DEF_STMT (arg);
+      if (!is_gimple_assign (def))
+	break;
+
+      tree_code code = gimple_assign_rhs_code (def);
+      if (code == POINTER_PLUS_EXPR)
+	{
+	  arg = gimple_assign_rhs1 (def);
+	  varoff = gimple_assign_rhs2 (def);
+	}
+      else if (code == ASSERT_EXPR)
+	{
+	  arg = TREE_OPERAND (gimple_assign_rhs1 (def), 0);
+	  continue;
+	}
+      else
+	return true;
+
+      /* VAROFF should always be a SSA_NAME here (and not even
+	 INTEGER_CST) but there's no point in taking chances.  */
+      if (TREE_CODE (varoff) != SSA_NAME)
+	return true;
+
+      wide_int min_wi, max_wi;
+
+      enum value_range_kind range_type
+	= determine_value_range (varoff, &min_wi, &max_wi);
+
+      if (range_type != VR_RANGE)
+	return true;
+
+      offset_int min
+	= wi::to_offset (wide_int_to_tree (ptrdiff_type_node, min_wi));
+      offset_int max
+	= wi::to_offset (wide_int_to_tree (ptrdiff_type_node, max_wi));
+      if (min < max)
+	{
+	  offrange[0] += min;
+	  offrange[1] += max;
+	}
+      else
+	{
+	  /* When MIN >= MAX, the offset is effectively in a union
+	     of two ranges: [-MAXOBJSIZE -1, MAX] and [MIN, MAXOBJSIZE].
+	     Since there is no way to represent such a range across
+	     additions, conservatively add [-MAXOBJSIZE -1, MAXOBJSIZE]
+	     to OFFRANGE.  */
+	  offrange[0] += arrbounds[0];
+	  offrange[1] += arrbounds[1];
+	}
+
+      if (offrange[0] < arrbounds[0])
+	offrange[0] = arrbounds[0];
+
+      if (offrange[1] > arrbounds[1])
+	offrange[1] = arrbounds[1];
+    }
+
+  if (TREE_CODE (arg) == ADDR_EXPR)
+    {
+      arg = TREE_OPERAND (arg, 0);
+      if (TREE_CODE (arg) != STRING_CST
+	  && TREE_CODE (arg) != PARM_DECL
+	  && TREE_CODE (arg) != VAR_DECL)
+	return true;
+    }
+  else if (TREE_CODE (arg) == SSA_NAME)
+    {
+      gimple *def = SSA_NAME_DEF_STMT (arg);
+      tree pointer = TREE_TYPE (arg);
+      if (def != NULL && gimple_code (def) == GIMPLE_NOP
+	  && POINTER_TYPE_P (pointer)
+	  && RECORD_OR_UNION_TYPE_P (TREE_TYPE (pointer))
+	  && TYPE_CXX_ODR_P (TREE_TYPE (pointer)))
+	/* We are dealing with a global object of sorts, always OK to access
+	   it.  */
+	return false;
+      else
+	return true;
+    }
+  else
+    return true;
+
+  /* The type of the object being referred to.  It can be an array,
+     string literal, or a non-array type when the MEM_REF represents
+     a reference/subscript via a pointer to an object that is not
+     an element of an array.  Incomplete types are excluded as well
+     because their size is not known.  */
+  tree reftype = TREE_TYPE (arg);
+  if (POINTER_TYPE_P (reftype)
+      || !COMPLETE_TYPE_P (reftype)
+      || TREE_CODE (TYPE_SIZE_UNIT (reftype)) != INTEGER_CST)
+    return true;
+
+  /* Except in declared objects, references to trailing array members
+     of structs and union objects are excluded because MEM_REF doesn't
+     make it possible to identify the member where the reference
+     originated.  */
+  if (RECORD_OR_UNION_TYPE_P (reftype)
+      && (!VAR_P (arg)
+	  || (DECL_EXTERNAL (arg) && array_at_struct_end_p (ref))))
+    return true;
+
+  arrbounds[0] = 0;
+
+  offset_int eltsize;
+  if (TREE_CODE (reftype) == ARRAY_TYPE)
+    {
+      eltsize = wi::to_offset (TYPE_SIZE_UNIT (TREE_TYPE (reftype)));
+      if (tree dom = TYPE_DOMAIN (reftype))
+	{
+	  tree bnds[] = { TYPE_MIN_VALUE (dom), TYPE_MAX_VALUE (dom) };
+	  if (TREE_CODE (arg) == COMPONENT_REF)
+	    {
+	      offset_int size = maxobjsize;
+	      if (tree fldsize = component_ref_size (arg))
+		size = wi::to_offset (fldsize);
+	      arrbounds[1] = wi::lrshift (size, wi::floor_log2 (eltsize));
+	    }
+	  else if (array_at_struct_end_p (arg) || !bnds[0] || !bnds[1])
+	    arrbounds[1] = wi::lrshift (maxobjsize, wi::floor_log2 (eltsize));
+	  else
+	    arrbounds[1] = (wi::to_offset (bnds[1]) - wi::to_offset (bnds[0])) * eltsize;
+	}
+      else
+	arrbounds[1] = wi::lrshift (maxobjsize, wi::floor_log2 (eltsize));
+
+      if (TREE_CODE (ref) == MEM_REF)
+	{
+	  /* For MEM_REF determine a tighter bound of the non-array
+	     element type.  */
+	  tree eltype = TREE_TYPE (reftype);
+	  while (TREE_CODE (eltype) == ARRAY_TYPE)
+	    eltype = TREE_TYPE (eltype);
+	  eltsize = wi::to_offset (TYPE_SIZE_UNIT (eltype));
+	}
+    }
+  else
+    {
+      eltsize = 1;
+      tree size = TYPE_SIZE_UNIT (reftype);
+      if (VAR_P (arg))
+	if (tree initsize = DECL_SIZE_UNIT (arg))
+	  if (tree_int_cst_lt (size, initsize))
+	    size = initsize;
+
+      arrbounds[1] = wi::to_offset (size);
+    }
+
+  offrange[0] += ioff;
+  offrange[1] += ioff;
+
+  if (arrbounds[0] == arrbounds[1]
+      || offrange[0] >= arrbounds[1]
+      || offrange[1] < arrbounds[0])
+    return true;
+
+  return !(offrange[0] >= arrbounds[0] && offrange[1] <= arrbounds[1]);
+}
+
+bool
+array_ref_maybe_out_of_bounds (tree ref)
+{
+  tree low_sub = TREE_OPERAND (ref, 1);
+  tree up_sub = low_sub;
+  tree up_bound = array_ref_up_bound (ref);
+
+  tree up_bound_p1;
+
+  if (!up_bound
+      || TREE_CODE (up_bound) != INTEGER_CST)
+    return true;
+
+
+  up_bound_p1 = int_const_binop (PLUS_EXPR, up_bound,
+				 build_int_cst (TREE_TYPE (up_bound), 1));
+
+  tree low_bound = array_ref_low_bound (ref);
+
+  /* Empty array.  */
+  if (tree_int_cst_equal (low_bound, up_bound_p1))
+    return true;
+
+
+  enum value_range_kind range_type = VR_UNDEFINED;
+  if (TREE_CODE (low_sub) == SSA_NAME)
+    {
+      wide_int min_wi, max_wi;
+      range_type = determine_value_range (low_sub, &min_wi, &max_wi);
+
+      if (range_type != VR_UNDEFINED && range_type != VR_VARYING)
+	{
+	  tree min = wide_int_to_tree (TREE_TYPE (low_sub), min_wi);
+	  tree max = wide_int_to_tree (TREE_TYPE (low_sub), max_wi);
+	  low_sub = range_type == VR_RANGE ? max : min;
+	  up_sub = range_type == VR_RANGE ? min : max;
+	}
+    }
+
+  if (range_type == VR_ANTI_RANGE)
+    {
+      if (up_bound
+	  && TREE_CODE (up_sub) == INTEGER_CST
+	  && tree_int_cst_le (up_bound, up_sub)
+	  && TREE_CODE (low_sub) == INTEGER_CST
+	  && tree_int_cst_le (low_sub, low_bound))
+	return true;
+    }
+  else if (up_bound
+       && TREE_CODE (up_sub) == INTEGER_CST
+       && !tree_int_cst_le (up_sub, up_bound))
+    return true;
+  else if (TREE_CODE (low_sub) == INTEGER_CST
+	   && tree_int_cst_lt (low_sub, low_bound))
+    return true;
+
+  return false;
+}
+
+static tree
+check_array_bounds (tree *tp, int *walk_subtree, void *data)
+{
+  tree t = *tp;
+  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
+  bool *maybe_out_of_bounds = (bool*) wi->info;
+
+  if (*maybe_out_of_bounds)
+    {
+      *walk_subtree = FALSE;
+      return NULL_TREE;
+    }
+
+  *walk_subtree = TRUE;
+
+  if (TREE_CODE (t) == ARRAY_REF)
+    *maybe_out_of_bounds = array_ref_maybe_out_of_bounds (t);
+  else if (TREE_CODE (t) == MEM_REF)
+    *maybe_out_of_bounds = mem_ref_maybe_out_of_bounds (t);
+
+  return NULL_TREE;
+}
+
+
+bool
+can_remove_breaks (class loop *loop, unsigned int nbbs, basic_block *bbs,
+		   edge *natural_loop_exit_p, vec<edge> *breaks)
+{
+  if (loop->num_nodes <= 2 || loop->inner || single_exit (loop))
+    return false;
+
+  for (unsigned int i = 0; i < nbbs; ++i)
+    {
+      basic_block bb = bbs[i];
+
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+	   gsi_next (&gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+	  struct walk_stmt_info wi;
+
+	  switch (gimple_code (stmt))
+	    {
+	    case GIMPLE_ASSIGN:
+	      if (TREE_CODE (gimple_assign_lhs (stmt)) == MEM_REF)
+		return false;
+	      break;
+	      /* Do not remove breaks in loops with calls or inline
+		 assembly.  */
+	    case GIMPLE_ASM:
+	    case GIMPLE_CALL:
+	      return false;
+	    default:
+	      break;
+	    }
+
+	  if (is_gimple_debug (stmt))
+	    continue;
+
+	  bool maybe_out_of_bounds = false;
+	  memset (&wi, 0, sizeof (wi));
+	  wi.info = &maybe_out_of_bounds;
+	  walk_gimple_op (stmt, check_array_bounds, &wi);
+	  if (maybe_out_of_bounds)
+	    return false;
+	}
+      }
+
+  /* TODO: enable cases with if (cond) { break; } else { something else; }  */
+  if (!single_pred_p (loop->latch))
+    return false;
+
+  /* Find the early exit and the condition for it.  */
+  basic_block bb_with_latch_cond = single_pred (loop->latch);
+  if (!bb_with_latch_cond)
+    return false;
+
+  gimple *latch_condition = gsi_stmt (gsi_last_bb (bb_with_latch_cond));
+
+  if (gimple_code (latch_condition) != GIMPLE_COND)
+    return false;
+
+  /* The latch should only have one predecessor.  */
+  edge latch_pred_e = single_pred_edge (loop->latch);
+  if (latch_pred_e == NULL)
+    return false;
+
+  /* Look for the natural loop exit, this should be the non-latch successor edge
+     of the latch's predecessor.  */
+  edge e;
+  edge_iterator ei;
+  edge natural_loop_exit = NULL;
+  FOR_EACH_EDGE (e, ei, latch_pred_e->src->succs)
+    {
+      if (e != latch_pred_e)
+	natural_loop_exit = e;
+    }
+
+  if (natural_loop_exit == NULL)
+    return false;
+
+  vec<edge> exits = get_loop_exit_edges (loop);
+  for(unsigned int i = 0; i < exits.length (); ++i)
+    {
+      if (exits[i] != natural_loop_exit)
+	{
+	  breaks->safe_push (exits[i]);
+	  /* TODO: Support breaks with same destination as natural exit.
+	     Shouldn't be too difficult, I believe adding an extra basic block
+	     with a conditional jump would work.  */
+	  if (exits[i]->dest == natural_loop_exit->dest)
+	    return false;
+
+	  gimple *break_stmt = gsi_stmt (gsi_last_bb (exits[0]->src));
+	  tree_code code = gimple_cond_code (break_stmt);
+	  tree lhs = gimple_cond_lhs (break_stmt);
+	  bool use_min_for_eq = ((code == EQ_EXPR || code == NE_EXPR)
+				 && TREE_CODE (TREE_TYPE (lhs)) == INTEGER_TYPE);
+	  /* Only support cases that we can transform into a MIN expr.  */
+	  if (!use_min_for_eq)
+	    return false;
+	}
+
+      /*  For a loop that is in loop closed SSA form any values escaping the
+	  loop must do so via a PHI-node in an exit BB.  Use this property to
+	  verify that there is no such value escaping the loop.  */
+      for (gphi_iterator si = gsi_start_phis (exits[i]->dest); !gsi_end_p (si);
+	   gsi_next (&si))
+	{
+	  gphi *phi = si.phi ();
+	  for (unsigned int j = 0; j < gimple_phi_num_args (phi); ++j)
+	    {
+	      tree arg = gimple_phi_arg_def (phi, j);
+	      if (TREE_CODE (arg) == SSA_NAME
+		  && SSA_NAME_DEF_STMT (arg)->bb != NULL
+		  && flow_bb_inside_loop_p (loop, SSA_NAME_DEF_STMT (arg)->bb))
+		return false;
+	    }
+	}
+    }
+
+  /* TODO: Support multiple break stmts.  */
+  if (breaks->length () > 1)
+    return false;
+
+  if (natural_loop_exit_p)
+    *natural_loop_exit_p = natural_loop_exit;
+
+  need_to_predicate = true;
+  return true;
+}
+
 /* Performs splitting of critical edges.  Skip splitting and return false
    if LOOP will not be converted because:
 
@@ -2834,7 +3241,7 @@  ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv)
   auto_vec<edge> critical_edges;
 
   /* Loop is not well formed.  */
-  if (num <= 2 || loop->inner || !single_exit (loop))
+  if (num > 2 || loop->inner || !single_exit (loop))
     return false;
 
   body = get_loop_body (loop);
@@ -3004,6 +3411,207 @@  ifcvt_local_dce (class loop *loop)
     }
 }
 
+static unsigned int
+simplify_loop_control_flow (class loop *loop, edge natural_loop_exit,
+			    vec<edge> breaks)
+{
+  unsigned int todo = 0;
+
+  /* Add PHI-node to keep track of new break condition.  */
+  basic_block break_src = breaks[0]->src;
+
+  gimple *break_stmt = gsi_stmt (gsi_last_bb (breaks[0]->src));
+  gcc_assert (gimple_code (break_stmt) == GIMPLE_COND);
+  bool negate = breaks[0]->flags & EDGE_FALSE_VALUE;
+
+  tree_code code = gimple_cond_code (break_stmt);
+  tree lhs = gimple_cond_lhs (break_stmt);
+  tree rhs = gimple_cond_rhs (break_stmt);
+
+  bool use_min_for_eq = ((code == EQ_EXPR || code == NE_EXPR)
+			 && TREE_CODE (TREE_TYPE (lhs)) == INTEGER_TYPE);
+
+  tree cond_var = make_temp_ssa_name (use_min_for_eq
+				      ? unsigned_type_for (TREE_TYPE (lhs))
+				      : boolean_type_node, NULL, "break_cond");
+
+  gphi *break_phi = create_phi_node (cond_var, loop->header);
+  add_phi_arg (break_phi, use_min_for_eq
+			  ? build_one_cst (TREE_TYPE (cond_var))
+			  : build_zero_cst (boolean_type_node),
+	       loop_preheader_edge (loop), UNKNOWN_LOCATION);
+
+  /* Construct break condition variable from original break stmt.  */
+  tree phi_arg;
+
+  if (use_min_for_eq)
+    {
+      phi_arg = fold_build2 (MINUS_EXPR, TREE_TYPE (lhs), lhs, rhs);
+      phi_arg = fold_build1 (NOP_EXPR, TREE_TYPE (cond_var), phi_arg);
+      if (code == NE_EXPR)
+	phi_arg = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (cond_var), phi_arg);
+      phi_arg = fold_build2 (MIN_EXPR, TREE_TYPE (cond_var), phi_arg,
+			     gimple_phi_result (break_phi));
+    }
+  else
+    {
+      phi_arg = build2 (gimple_cond_code (break_stmt), boolean_type_node,
+			 gimple_cond_lhs (break_stmt),
+			 gimple_cond_rhs (break_stmt));
+      if (negate)
+	phi_arg = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, phi_arg);
+
+      phi_arg = fold_build3 (COND_EXPR, boolean_type_node, phi_arg,
+			     build_one_cst (boolean_type_node),
+			     gimple_phi_result (break_phi));
+
+    }
+
+  /* Save break target's BB and remove break edge.  */
+  gimple_stmt_iterator gsi = gsi_last_bb (break_src);
+  gsi_remove (&gsi, true);
+  basic_block break_target = breaks[0]->dest;
+  auto_vec<std::pair<gphi*, tree> >  to_copy_phis;
+
+  for (gphi_iterator itr = gsi_start_phis (breaks[0]->dest);
+      !gsi_end_p(itr); gsi_next (&itr))
+    {
+      gphi *phi = itr.phi ();
+      tree var = gimple_phi_arg_def (phi, breaks[0]->dest_idx);
+      to_copy_phis.safe_push (std::make_pair (phi, var));
+    }
+
+  remove_edge (breaks[0]);
+
+  /* Make sure the other edge is now a FALLTHRU edge.  */
+  edge fallthru_e = single_succ_edge (break_src);
+  gcc_assert (fallthru_e);
+  fallthru_e->flags &= ~(negate ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE);
+  fallthru_e->flags |= EDGE_FALLTHRU;
+
+  /* Update break condition.  */
+  gimple_seq stmts;
+  phi_arg = force_gimple_operand (phi_arg, &stmts, true, NULL);
+  gsi = gsi_last_bb (break_src);
+  gsi_insert_seq_after (&gsi, stmts, GSI_NEW_STMT);
+
+  /* If the basic block where the break condition is computed does not dominate
+     the natural loop exit block, then we must add a PHI-node to this block to
+     ensure correct def-use.  */
+  if (!dominated_by_p (CDI_DOMINATORS, natural_loop_exit->src, break_src))
+    {
+      edge e;
+      edge_iterator ei;
+      cond_var = make_temp_ssa_name (TREE_TYPE (phi_arg), NULL, "break_cond");
+      gphi *new_phi = create_phi_node (cond_var, natural_loop_exit->src);
+
+      FOR_EACH_EDGE (e, ei, natural_loop_exit->src->preds)
+	{
+	  if (dominated_by_p (CDI_DOMINATORS, e->src, break_src))
+	    add_phi_arg (new_phi, phi_arg, e, UNKNOWN_LOCATION);
+	  else
+	    add_phi_arg (new_phi, gimple_phi_result (break_phi), e,
+			 UNKNOWN_LOCATION);
+	}
+
+      phi_arg = gimple_phi_result (new_phi);
+    }
+
+  /* Add updated condition to PHI-node.  */
+  add_phi_arg (break_phi, phi_arg, loop_latch_edge (loop), UNKNOWN_LOCATION);
+
+  loop->early_break_cond = phi_arg;
+
+  /* Construct new block with the new break condition.  */
+  basic_block new_bb = split_edge (natural_loop_exit);
+
+  /* Put the result in a single-argument PHI-node to make sure the
+     loop_closed_ssa_def check doesn't fail.  */
+  cond_var = make_temp_ssa_name (TREE_TYPE (phi_arg), NULL, "break_cond");
+  break_phi = create_phi_node (cond_var, new_bb);
+  add_phi_arg (break_phi, phi_arg, single_pred_edge (new_bb),
+	       UNKNOWN_LOCATION);
+
+  if (use_min_for_eq && ((code == EQ_EXPR) ^ negate))
+    {
+      cond_var = fold_build2 (EQ_EXPR, boolean_type_node, cond_var,
+			      build_zero_cst (TREE_TYPE (cond_var)));
+    }
+
+  gimple *cond_stmt = gimple_build_cond_from_tree (cond_var, NULL_TREE,
+						   NULL_TREE);
+  gsi = gsi_last_bb (new_bb);
+  gsi_insert_after (&gsi, cond_stmt, GSI_NEW_STMT);
+
+  /* Fix up edges of new block.  */
+  edge false_e = single_succ_edge (new_bb);
+  edge true_e = NULL;
+
+  /* TODO: set probabilities! */
+  if (false_e->dest == break_target)
+    {
+      true_e = false_e;
+      false_e = make_edge (new_bb, natural_loop_exit->dest, EDGE_FALSE_VALUE);
+      true_e->flags &= ~EDGE_FALLTHRU;
+      true_e->flags |= EDGE_TRUE_VALUE;
+    }
+  else
+    {
+      false_e->flags &= ~EDGE_FALLTHRU;
+      false_e->flags |= EDGE_FALSE_VALUE;
+      true_e = make_edge (new_bb, break_target, EDGE_TRUE_VALUE);
+    }
+
+  std::pair<gphi*, tree> *el;
+  unsigned int ix;
+  FOR_EACH_VEC_ELT (to_copy_phis, ix, el)
+    {
+      add_phi_arg (el->first, el->second, true_e, UNKNOWN_LOCATION);
+      if (virtual_operand_p (el->second))
+	{
+	  mark_virtual_operand_for_renaming (el->second);
+	  todo |= TODO_update_ssa_only_virtuals;
+	}
+    }
+
+  return todo;
+}
+
+static bool
+version_loop (class loop *loop, class loop **rloop, vec<gimple *> *preds,
+	      bool save_preds)
+{
+  class loop *vloop
+    = (versionable_outer_loop_p (loop_outer (loop))
+       ? loop_outer (loop) : loop);
+  class loop *nloop = version_loop_for_if_conversion (vloop, preds, save_preds);
+  if (nloop == NULL)
+    return false;
+  if (vloop != loop)
+    {
+      /* If versionable_outer_loop_p decided to version the
+	 outer loop, version also the inner loop of the non-vectorized
+	 loop copy.  So we transform:
+	  loop1
+	    loop2
+	 into:
+	  if (LOOP_VECTORIZED (1, 3))
+	    {
+	      loop1
+		loop2
+	    }
+	  else
+	    loop3 (copy of loop1)
+	      if (LOOP_VECTORIZED (4, 5))
+		loop4 (copy of loop2)
+	      else
+		loop5 (copy of loop4)  */
+      gcc_assert (nloop->inner && nloop->inner->next == NULL);
+      *rloop = nloop->inner;
+    }
+  return true;
+}
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -3015,6 +3623,8 @@  tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   bool aggressive_if_conv;
   class loop *rloop;
   bitmap exit_bbs;
+  auto_vec<edge> breaks;
+  edge natural_loop_exit;
 
  again:
   rloop = NULL;
@@ -3033,14 +3643,20 @@  tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	aggressive_if_conv = true;
     }
 
-  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv))
+  basic_block *bbs = XCNEWVEC (basic_block, loop->num_nodes);
+
+  unsigned int nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p, bbs,
+					  loop->num_nodes, loop);
+
+  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv)
+      && !can_remove_breaks (loop, nbbs, bbs, &natural_loop_exit, &breaks))
     goto cleanup;
 
-  if (!if_convertible_loop_p (loop)
+  if (!(if_convertible_loop_p (loop) || !breaks.is_empty ())
       || !dbg_cnt (if_conversion_tree))
     goto cleanup;
 
-  if ((need_to_predicate || any_complicated_phi)
+  if ((need_to_predicate || any_complicated_phi || !breaks.is_empty ())
       && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
 	  || loop->dont_vectorize))
     goto cleanup;
@@ -3049,39 +3665,18 @@  tree_if_conversion (class loop *loop, vec<gimple *> *preds)
      specified -ftree-loop-if-convert or unless versioning is required.
      Either version this loop, or if the pattern is right for outer-loop
      vectorization, version the outer loop.  In the latter case we will
-     still if-convert the original inner loop.  */
-  if (need_to_predicate
-      || any_complicated_phi
-      || flag_tree_loop_if_convert != 1)
-    {
-      class loop *vloop
-	= (versionable_outer_loop_p (loop_outer (loop))
-	   ? loop_outer (loop) : loop);
-      class loop *nloop = version_loop_for_if_conversion (vloop, preds);
-      if (nloop == NULL)
-	goto cleanup;
-      if (vloop != loop)
-	{
-	  /* If versionable_outer_loop_p decided to version the
-	     outer loop, version also the inner loop of the non-vectorized
-	     loop copy.  So we transform:
-	      loop1
-		loop2
-	     into:
-	      if (LOOP_VECTORIZED (1, 3))
-		{
-		  loop1
-		    loop2
-		}
-	      else
-		loop3 (copy of loop1)
-		  if (LOOP_VECTORIZED (4, 5))
-		    loop4 (copy of loop2)
-		  else
-		    loop5 (copy of loop4)  */
-	  gcc_assert (nloop->inner && nloop->inner->next == NULL);
-	  rloop = nloop->inner;
-	}
+     still if-convert the original inner loop.  Only do this, though, if we
+     are certain we will run the vectorizer.  */
+  if (flag_tree_loop_if_convert != 1
+      && !version_loop (loop, &rloop, preds, breaks.is_empty ()))
+    goto cleanup;
+
+  if (!breaks.is_empty ())
+    {
+      todo |= simplify_loop_control_flow (loop, natural_loop_exit, breaks);
+      breaks.block_remove (0, breaks.length ());
+      todo |= TODO_cleanup_cfg;
+      goto cleanup;
     }
 
   /* Now all statements are if-convertible.  Combine all the basic
diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index 8ab6ab3330c5f7302ffddd7fc47c7b20fbed77fc..0449493654f9d0bf1cc35d7586bd7968f09b3b48 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -57,6 +57,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-loop.h"
 #include "tree-into-ssa.h"
 #include "cfgloop.h"
+#include "cfganal.h"
 #include "tree-chrec.h"
 #include "tree-scalar-evolution.h"
 #include "tree-inline.h"
@@ -1340,6 +1341,14 @@  tree_unroll_loops_completely_1 (bool may_increase_size, bool unroll_outer,
   class loop *inner;
   enum unroll_level ul;
   unsigned num = number_of_loops (cfun);
+  basic_block *bbs = XCNEWVEC (basic_block, loop->num_nodes);
+  unsigned int nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p, bbs,
+					  loop->num_nodes, loop);
+  auto_vec<edge> breaks;
+  edge loop_exit;
+
+  if (can_remove_breaks (loop, nbbs, bbs, &loop_exit, &breaks))
+    return false;
 
   /* Process inner loops first.  Don't walk loops added by the recursive
      calls because SSA form is not up-to-date.  They can be handled in the
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 0ee1ab45c07d548d8938ae75f5c655b504b74f1e..ffefa7cb9cd301dbe773cb0818f581b3d03976c0 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -854,8 +854,26 @@  vect_set_loop_condition_unmasked (class loop *loop, tree niters,
   limit = force_gimple_operand_gsi (&loop_cond_gsi, limit, true, NULL_TREE,
 				     true, GSI_SAME_STMT);
 
-  cond_stmt = gimple_build_cond (code, indx_after_incr, limit, NULL_TREE,
-				 NULL_TREE);
+  if (loop->early_break_cond)
+    {
+      bool exit_on_true = exit_edge->flags & EDGE_TRUE_VALUE;
+      /* EARLY_BREAK_COND is 0 if we can break.  */
+      tree early_break
+	= fold_build2 (exit_on_true ? EQ_EXPR : NE_EXPR, boolean_type_node,
+		       loop->early_break_cond,
+		       build_zero_cst (TREE_TYPE (loop->early_break_cond)));
+      tree cond = build2 (code, boolean_type_node, indx_after_incr, limit);
+      cond = fold_build2 (exit_on_true ? TRUTH_OR_EXPR : TRUTH_AND_EXPR,
+			  boolean_type_node, cond, early_break);
+      cond = force_gimple_operand_gsi (&loop_cond_gsi, cond, true, NULL_TREE,
+				       true, GSI_SAME_STMT);
+      cond_stmt = gimple_build_cond (NE_EXPR, cond,
+				     build_zero_cst (boolean_type_node),
+				     NULL_TREE, NULL_TREE);
+    }
+  else
+    cond_stmt = gimple_build_cond (code, indx_after_incr, limit, NULL_TREE,
+				   NULL_TREE);
 
   gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
 
@@ -1043,9 +1061,10 @@  slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
   bbs[0] = preheader;
   new_bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
 
-  exit = single_exit (scalar_loop);
+  edge scalar_exit = single_exit (scalar_loop);
+
   copy_bbs (bbs, scalar_loop->num_nodes + 1, new_bbs,
-	    &exit, 1, &new_exit, NULL,
+	    &scalar_exit, 1, &new_exit, NULL,
 	    at_exit ? loop->latch : e->src, true);
   exit = single_exit (loop);
   basic_block new_preheader = new_bbs[0];
@@ -1059,8 +1078,7 @@  slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
 	 but LOOP will not.  slpeel_update_phi_nodes_for_guard{1,2} expects
 	 the LOOP SSA_NAMEs (on the exit edge and edge from latch to
 	 header) to have current_def set, so copy them over.  */
-      slpeel_duplicate_current_defs_from_edges (single_exit (scalar_loop),
-						exit);
+      slpeel_duplicate_current_defs_from_edges (scalar_exit, exit);
       slpeel_duplicate_current_defs_from_edges (EDGE_SUCC (scalar_loop->latch,
 							   0),
 						EDGE_SUCC (loop->latch, 0));
@@ -2133,7 +2151,7 @  slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
      for correct vectorization of live stmts.  */
   if (loop == first)
     {
-      basic_block orig_exit = single_exit (second)->dest;
+      basic_block orig_exit = find_natural_loop_exit (second)->dest;
       for (gsi_orig = gsi_start_phis (orig_exit);
 	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
 	{
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 53fccb715efef2521e499e5f764025dc80355f3d..e95a8dd5e28ad2c40bbb18adea8de27df2c858f4 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -769,19 +769,6 @@  vect_get_loop_niters (class loop *loop, tree *assumptions,
   return cond;
 }
 
-/* Function bb_in_loop_p
-
-   Used as predicate for dfs order traversal of the loop bbs.  */
-
-static bool
-bb_in_loop_p (const_basic_block bb, const void *data)
-{
-  const class loop *const loop = (const class loop *)data;
-  if (flow_bb_inside_loop_p (loop, bb))
-    return true;
-  return false;
-}
-
 
 /* Create and initialize a new loop_vec_info struct for LOOP_IN, as well as
    stmt_vec_info structs for all the stmts in LOOP_IN.  */
@@ -2450,7 +2437,8 @@  vect_joust_loop_vinfos (loop_vec_info new_loop_vinfo,
    for it.  The different analyses will record information in the
    loop_vec_info struct.  */
 opt_loop_vec_info
-vect_analyze_loop (class loop *loop, vec_info_shared *shared)
+vect_analyze_loop (class loop *loop, vec_info_shared *shared,
+		   bool loop_with_breaks)
 {
   auto_vector_modes vector_modes;
 
@@ -2562,6 +2550,33 @@  vect_analyze_loop (class loop *loop, vec_info_shared *shared)
 	    mode_i += 1;
 	  }
 
+      if (res && loop_with_breaks)
+	{
+	  if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+	    res = opt_result::failure_at (vect_location,
+					  "cannot vectorize loop with break"
+					  " statements if the number of"
+					  " iterations is not constant.\n");
+	  else if (!multiple_p (LOOP_VINFO_INT_NITERS (loop_vinfo),
+				LOOP_VINFO_VECT_FACTOR (loop_vinfo)))
+	    res = opt_result::failure_at (vect_location,
+					  "cannot vectorize loop with break"
+					  " statements if the number of"
+					  " iterations is not a multiple of"
+					  " VF.\n");
+	  else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
+		   || LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
+		   || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
+	    res = opt_result::failure_at (vect_location,
+					  "cannot vectorize loop with break"
+					  " statements if peeling is"
+					  " required.\n");
+	  if (res)
+	    {
+	      gcc_assert (LOOP_VINFO_INT_NITERS (loop_vinfo) > 0);
+	    }
+	}
+
       if (res)
 	{
 	  LOOP_VINFO_VECTORIZABLE_P (loop_vinfo) = 1;
@@ -8487,6 +8502,31 @@  update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
   epilogue_vinfo->shared->save_datarefs ();
 }
 
+edge
+find_natural_loop_exit (class loop *loop)
+{
+  edge natural_exit = single_exit (loop);
+  if (!natural_exit)
+    {
+      /* Look for the natural loop exit; this should be the non-latch
+	 successor edge of the latch's predecessor.  */
+      edge e;
+      edge_iterator ei;
+      edge latch_pred_e = single_pred_edge (loop->latch);
+      gcc_assert (latch_pred_e);
+
+      FOR_EACH_EDGE (e, ei, latch_pred_e->src->succs)
+	{
+	  if (e != latch_pred_e)
+	    natural_exit = e;
+	}
+      gcc_assert (natural_exit);
+    }
+
+  return natural_exit;
+}
+
+
 /* Function vect_transform_loop.
 
    The analysis phase has determined that the loop is vectorizable.
@@ -8558,7 +8598,7 @@  vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
      loop closed PHI nodes on the exit.  */
   if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
     {
-      e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
+      e = find_natural_loop_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
       if (! single_pred_p (e->dest))
 	{
 	  split_loop_exit_edge (e, true);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index f7becb34ab41c645e5e76065377d78f2af39a09a..9964644a640cdaae8630810ced7528d334e90ac7 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1801,6 +1801,7 @@  extern tree vect_create_addr_base_for_vector_ref (stmt_vec_info, gimple_seq *,
 
 /* In tree-vect-loop.c.  */
 extern widest_int vect_iv_limit_for_full_masking (loop_vec_info loop_vinfo);
+extern edge find_natural_loop_exit (class loop *);
 /* Used in tree-vect-loop-manip.c */
 extern void determine_peel_for_niter (loop_vec_info);
 /* Used in gimple-loop-interchange.c and tree-parloops.c.  */
@@ -1808,7 +1809,8 @  extern bool check_reduction_path (dump_user_location_t, loop_p, gphi *, tree,
 				  enum tree_code);
 extern bool needs_fold_left_reduction_p (tree, tree_code);
 /* Drive for loop analysis stage.  */
-extern opt_loop_vec_info vect_analyze_loop (class loop *, vec_info_shared *);
+extern opt_loop_vec_info vect_analyze_loop (class loop *, vec_info_shared *,
+					    bool);
 extern tree vect_build_loop_niters (loop_vec_info, bool * = NULL);
 extern void vect_gen_vector_loop_niters (loop_vec_info, tree, tree *,
 					 tree *, bool);
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 8f9444d58a3248c028a192625c22ce1650fc9027..a475953f76b7bdeced07943913cfde2380d8424b 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -873,6 +873,7 @@  try_vectorize_loop_1 (hash_table<simduid_to_vf> *&simduid_to_vf_htab,
   vec_info_shared shared;
   auto_purge_vect_location sentinel;
   vect_location = find_loop_location (loop);
+  bool loop_with_breaks = false;
 
   if (LOCATION_LOCUS (vect_location.get_location_t ()) != UNKNOWN_LOCATION
       && dump_enabled_p ())
@@ -888,8 +889,14 @@  try_vectorize_loop_1 (hash_table<simduid_to_vf> *&simduid_to_vf_htab,
     loop_vinfo = opt_loop_vec_info::success (vinfo);
   else
     {
+      if (loop_vectorized_call)
+	{
+	  tree arg = gimple_call_arg (loop_vectorized_call, 1);
+	  loop_with_breaks
+	    = !single_exit (get_loop (cfun, tree_to_shwi (arg)));
+	}
       /* Try to analyze the loop, retaining an opt_problem if dump_enabled_p.  */
-      loop_vinfo = vect_analyze_loop (loop, &shared);
+      loop_vinfo = vect_analyze_loop (loop, &shared, loop_with_breaks);
       loop->aux = loop_vinfo;
     }