From patchwork Tue Oct 22 21:12:26 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Hou X-Patchwork-Id: 285497 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id 647492C016B for ; Wed, 23 Oct 2013 08:12:48 +1100 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; q=dns; s=default; b=FPjX/Z+HUYMOq0i3ai dxTLOMi9XIxYU5FjMJRkAZ79zR58WpAtq50uPbojGRH7dDP9bhwEHEzmugoR+Y0Z DDDWJG0o/Vo3cfUW5h2ATQ9gfeAgzeRDXVUOQe464l0mxXj9OMfRUIqeU8TBnBkY XAA/mlHkGeU0J5/8xtFdtA9Pw= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; s=default; bh=/h4LhtSPuaq3CFRB5MRnzIfv ydc=; b=if+FXgZRsJNoVimmpW5Nb9vinFILORlkS5q48kn3Ghe9earXOiPyteZC 9NZwet6tN9PA6wakZsDMhexS0bxx1qiJccjh6C+H5qmzhTvlTBJum3QDsO2qH69V BvIIPNjVFgjqCSfS2SxtFSVBI5OYlhqXAlKJZOuupH/6oEUSnKs= Received: (qmail 30948 invoked by alias); 22 Oct 2013 21:12:38 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 30881 invoked by uid 89); 22 Oct 2013 21:12:36 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.8 required=5.0 tests=AWL, BAYES_00, RCVD_IN_DNSWL_LOW, RP_MATCHES_RCVD, SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-ie0-f174.google.com Received: from mail-ie0-f174.google.com (HELO mail-ie0-f174.google.com) (209.85.223.174) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Tue, 22 Oct 2013 21:12:28 +0000 Received: by mail-ie0-f174.google.com with SMTP id qd12so2796683ieb.5 for ; Tue, 22 Oct 2013 14:12:26 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=vlL/o3gfGZIBU2TTW5VQZnYvbP260tElyhKvR+IyS44=; b=M7PpWl9WVCAQPchPfouePARAqRJe9oSUobgm1U1mVlhdn1XSb+tFlOR2GSEAlSyL2U JPsPfD1DWIpmENazJAJwVY82aAfIavWV76yIVbcQuKeb9MVX/VtY6FnIikQ/Y6Beksca jo22ziiQt8LdlBC1KWJHNX/XWCBxCaZU1vGVRXlQPN4JXtGPJFIXI3WgNzmiKKjurcCa B90noywppY5k/ZQQ+khieGronI1H91MHO6RpvK1++Nxer2vJVMws/8RBCKeck1tFCbS5 37hMAapIyxkrtvjj+rHQDWfMayPNQ1JknnFKzddaD4DCOTazGxbXI0d6DoYgtFwIBthx hL/w== X-Gm-Message-State: ALoCoQl84xUIsqZhJWsX0AnL6wa/U8ZZU0elJjQRtUHwwh/AaFPAJ9I2P7Nuyz9T/CAyTZH4iNiprGR6/Wvw4GyGJBgQ3mC/oarmeg7YKKZHQnOakrkLVUl1C9S+6sAm2DORNs+3p7utzGuQB2bSDBNYLOXewmbkyLoDZM3mFSXI5KN2fl48xDYuU/2jitgKiMIR//8bfGwPccXF0GSV7QWLtW6YsJJHyQ== MIME-Version: 1.0 X-Received: by 10.43.60.139 with SMTP id ws11mr3581382icb.12.1382476346247; Tue, 22 Oct 2013 14:12:26 -0700 (PDT) Received: by 10.64.236.37 with HTTP; Tue, 22 Oct 2013 14:12:26 -0700 (PDT) In-Reply-To: References: Date: Tue, 22 Oct 2013 14:12:26 -0700 Message-ID: Subject: Re: [PATCH] Reducing number of alias checks in vectorization. From: Cong Hou To: Richard Biener Cc: GCC Patches On Fri, Oct 18, 2013 at 6:09 AM, Richard Biener wrote: > On Tue, Oct 15, 2013 at 12:43 AM, Cong Hou wrote: >> Sorry for forgetting using plain-text mode. Resend it. >> >> >> ---------- Forwarded message ---------- >> From: Cong Hou >> Date: Mon, Oct 14, 2013 at 3:29 PM >> Subject: Re: [PATCH] Reducing number of alias checks in vectorization. >> To: Richard Biener , GCC Patches >> Cc: Jakub Jelinek >> >> >> I have made a new patch for this issue according to your comments. >> >> There are several modifications to my previous patch: >> >> >> 1. Remove the use of STL features such as vector and sort. Use GCC's >> vec and qsort instead. > > I think that using and thus std::pair and std::swap is ok. Including > of should be done via system.h though. > Unfortunately, std::swap is defined in . Can I still use it? >> 2. Comparisons between tree nodes are not based on their addresses any >> more. Use compare_tree() function instead. > > Ok. > >> 3. The function vect_create_cond_for_alias_checks() now returns the >> number of alias checks. If its second parameter cond_expr is NULL, >> then this function only calculate the number of alias checks after the >> merging and won't generate comparison expressions. > > We now perform the merging twice - that's easily avoided by recording > the merge result in loop_vinfo alongside may_alias_ddrs (which you > should be able to drop as they are already contained in the data > structures you build). > > Which means you can split the function and move the merging > part to vect_prune_runtime_alias_test_list. I have added one more field to loop_vec_info to store the dr pairs from which alias checks are built. And I also split the functionality of vect_create_cond_for_alias_checks() into two functions, so that we now perform the merging once instead of twice. > >> 4. The function vect_prune_runtime_alias_test_list() now uses >> vect_create_cond_for_alias_checks() to get the number of alias checks. >> >> >> The patch is attached as a text file. >> >> Please give me your comment on this patch. Thank you! > > + if (!operand_equal_p (dr_a1->basic_addr, dr_a2->basic_addr, 0) > + || TREE_CODE (dr_a1->offset) != INTEGER_CST > + || TREE_CODE (dr_a2->offset) != INTEGER_CST) > + continue; > + > + HOST_WIDEST_INT diff = widest_int_cst_value (dr_a2->offset) - > + widest_int_cst_value (dr_a1->offset); > > use !host_integerp (dr_a1->offset) as check and then int_cst_value. Done. > > + if (diff <= vect_factor > + || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST > + && diff - widest_int_cst_value (dr_a1->seg_len) < vect_factor) > + || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST > + && TREE_CODE (dr_b1->seg_len) == INTEGER_CST > + && diff - widest_int_cst_value (dr_a1->seg_len) < > + widest_int_cst_value (dr_b1->seg_len))) > > can you add a comment above this why it is safe to combine under > this condition? You seem to combine offsets but not data-ref steps > for example. The segment length is computed quite optimistically > and you don't seem to adjust that. That is, generally the alias > check, if implemented as simple overlap, would need to intersect > [init, init + niter * step] intervals, losing some non-aliasing cases. > The current segment length is optimistically computed and thus > can recover some more cases, but you have to be careful with > merging to not break its implicit assumptions. I have changed the code and made it more clear as shown below. I still use the condition DR_A2->OFFSET - DR_A1->OFFSET - SEGMENT_LENGTH_A < SEGMENT_LENGTH_B But in case that SEGMENT_LENGTH_A or SEGMENT_LENGTH_B is not constant, we can still check the following two conditions: DR_A2->OFFSET - DR_A1->OFFSET < SEGMENT_LENGTH_B (if SEGMENT_LENGTH_A is not a constant). DR_A2->OFFSET - DR_A1->OFFSET - SEGMENT_LENGTH_A < VECT_FACTOR (if SEGMENT_LENGTH_B is not a constant, then its minimum value should be VECT_FACTOR). The corresponding code is: /* Now we check if the following condition is satisfied: DIFF - SEGMENT_LENGTH_A < SEGMENT_LENGTH_B where DIFF = DR_A2->OFFSET - DR_A1->OFFSET. However, SEGMENT_LENGTH_A or SEGMENT_LENGTH_B may not be constant so we have to make a best estimation. We can get the minimum value of SEGMENT_LENGTH_B as a constant, represented by MIN_SEG_LEN_B, then either of the following two conditions can guarantee the one above: 1: DIFF <= MIN_SEG_LEN_B 2: DIFF - SEGMENT_LENGTH_A < MIN_SEG_LEN_B */ HOST_WIDEST_INT min_seg_len_b = (TREE_CODE (dr_b1->seg_len) == INTEGER_CST) ? widest_int_cst_value (dr_b1->seg_len) : vect_factor; if (diff <= min_seg_len_b || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST && diff - widest_int_cst_value (dr_a1->seg_len) < min_seg_len_b)) { ... } The new patch is pasted here again. Bootstrapping and regression tests are passed. thanks, Cong > > Thanks, > Richard. > >> >> Cong >> >> >> On Thu, Oct 3, 2013 at 2:35 PM, Cong Hou wrote: >>> >>> Forget about this "aux" idea as the segment length for one data ref >>> can be different in different dr pairs. >>> >>> In my patch I created a struct as shown below: >>> >>> struct dr_addr_with_seg_len >>> { >>> data_reference *dr; >>> tree basic_addr; >>> tree offset; >>> tree seg_len; >>> }; >>> >>> >>> Note that basic_addr and offset can always obtained from dr, but we >>> need to store two segment lengths for each dr pair. It is improper to >>> add a field to data_dependence_relation as it is defined outside of >>> vectorizer. We can change the type (a new one combining >>> data_dependence_relation and segment length) of may_alias_ddrs in >>> loop_vec_info to include such information, but we have to add a new >>> type to tree-vectorizer.h which is only used in two places - still too >>> much. >>> >>> One possible solution is that we create a local struct as shown above >>> and a new function which returns the merged alias check information. >>> This function will be called twice: once during analysis phase and >>> once in transformation phase. Then we don't have to store the merged >>> alias check information during those two phases. The additional time >>> cost is minimal as there will not be too many data dependent dr pairs >>> in a loop. >>> >>> Any comment? >>> >>> >>> thanks, >>> Cong >>> >>> >>> On Thu, Oct 3, 2013 at 10:57 AM, Cong Hou wrote: >>> > I noticed that there is a "struct dataref_aux" defined in >>> > tree-vectorizer.h which is specific to the vectorizer pass and is >>> > stored in (void*)aux in "struct data_reference". Can we add one more >>> > field "segment_length" to dataref_aux so that we can pass this >>> > information for merging alias checks? Then we can avoid to modify or >>> > create other structures. >>> > >>> > >>> > thanks, >>> > Cong >>> > >>> > >>> > On Wed, Oct 2, 2013 at 2:34 PM, Cong Hou wrote: >>> >> On Wed, Oct 2, 2013 at 4:24 AM, Richard Biener wrote: >>> >>> On Tue, 1 Oct 2013, Cong Hou wrote: >>> >>> >>> >>>> When alias exists between data refs in a loop, to vectorize it GCC >>> >>>> does loop versioning and adds runtime alias checks. Basically for each >>> >>>> pair of data refs with possible data dependence, there will be two >>> >>>> comparisons generated to make sure there is no aliasing between them >>> >>>> in each iteration of the vectorized loop. If there are many such data >>> >>>> refs pairs, the number of comparisons can be very large, which is a >>> >>>> big overhead. >>> >>>> >>> >>>> However, in some cases it is possible to reduce the number of those >>> >>>> comparisons. For example, for the following loop, we can detect that >>> >>>> b[0] and b[1] are two consecutive member accesses so that we can >>> >>>> combine the alias check between a[0:100]&b[0] and a[0:100]&b[1] into >>> >>>> checking a[0:100]&b[0:2]: >>> >>>> >>> >>>> void foo(int*a, int* b) >>> >>>> { >>> >>>> for (int i = 0; i < 100; ++i) >>> >>>> a[i] = b[0] + b[1]; >>> >>>> } >>> >>>> >>> >>>> Actually, the requirement of consecutive memory accesses is too >>> >>>> strict. For the following loop, we can still combine the alias checks >>> >>>> between a[0:100]&b[0] and a[0:100]&b[100]: >>> >>>> >>> >>>> void foo(int*a, int* b) >>> >>>> { >>> >>>> for (int i = 0; i < 100; ++i) >>> >>>> a[i] = b[0] + b[100]; >>> >>>> } >>> >>>> >>> >>>> This is because if b[0] is not in a[0:100] and b[100] is not in >>> >>>> a[0:100] then a[0:100] cannot be between b[0] and b[100]. We only need >>> >>>> to check a[0:100] and b[0:101] don't overlap. >>> >>>> >>> >>>> More generally, consider two pairs of data refs (a, b1) and (a, b2). >>> >>>> Suppose addr_b1 and addr_b2 are basic addresses of data ref b1 and b2; >>> >>>> offset_b1 and offset_b2 (offset_b1 < offset_b2) are offsets of b1 and >>> >>>> b2, and segment_length_a, segment_length_b1, and segment_length_b2 are >>> >>>> segment length of a, b1, and b2. Then we can combine the two >>> >>>> comparisons into one if the following condition is satisfied: >>> >>>> >>> >>>> offset_b2- offset_b1 - segment_length_b1 < segment_length_a >>> >>>> >>> >>>> >>> >>>> This patch detects those combination opportunities to reduce the >>> >>>> number of alias checks. It is tested on an x86-64 machine. >>> >>> >>> >>> Apart from the other comments you got (to which I agree) the patch >>> >>> seems to do two things, namely also: >>> >>> >>> >>> + /* Extract load and store statements on pointers with zero-stride >>> >>> + accesses. */ >>> >>> + if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)) >>> >>> + { >>> >>> >>> >>> which I'd rather see in a separate patch (and done also when >>> >>> the loop doesn't require versioning for alias). >>> >>> >>> >> >>> >> >>> >> My mistake.. I am working on those two patches at the same time and >>> >> pasted that one also here by mistake. I will send another patch about >>> >> the "hoist" topic. >>> >> >>> >> >>> >>> Also combining the alias checks in vect_create_cond_for_alias_checks >>> >>> is nice but doesn't properly fix the use of the >>> >>> vect-max-version-for-alias-checks param which currently inhibits >>> >>> vectorization of the HIMENO benchmark by default (and make us look bad >>> >>> compared to LLVM). >>> >>> >>> >>> So I believe this merging should be done incrementally when >>> >>> we collect the DDRs we need to test in vect_mark_for_runtime_alias_test. >>> >>> >>> >> >>> >> >>> >> I agree that vect-max-version-for-alias-checks param should count the >>> >> number of checks after the merge. However, the struct >>> >> data_dependence_relation could not record the new information produced >>> >> by the merge. The new information I mentioned contains the new segment >>> >> length for comparisons. This length is calculated right in >>> >> vect_create_cond_for_alias_checks() function. Since >>> >> vect-max-version-for-alias-checks is used during analysis phase, shall >>> >> we move all those (get segment length for each data ref and merge >>> >> alias checks) from transformation to analysis phase? If we cannot >>> >> store the result properly (data_dependence_relation is not enough), >>> >> shall we do it twice in both phases? >>> >> >>> >> I also noticed a possible bug in the function vect_same_range_drs() >>> >> called by vect_prune_runtime_alias_test_list(). For the following code >>> >> I get two pairs of data refs after >>> >> vect_prune_runtime_alias_test_list(), but in >>> >> vect_create_cond_for_alias_checks() after detecting grouped accesses I >>> >> got two identical pairs of data refs. The consequence is two identical >>> >> alias checks are produced. >>> >> >>> >> >>> >> void yuv2yuyv_ref (int *d, int *src, int n) >>> >> { >>> >> char *dest = (char *)d; >>> >> int i; >>> >> >>> >> for(i=0;i>> >> dest[i*4 + 0] = (src[i*2 + 0])>>16; >>> >> dest[i*4 + 1] = (src[i*2 + 1])>>8; >>> >> dest[i*4 + 2] = (src[i*2 + 0])>>16; >>> >> dest[i*4 + 3] = (src[i*2 + 0])>>0; >>> >> } >>> >> } >>> >> >>> >> >>> >> I think the solution to this problem is changing >>> >> >>> >> GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i)) >>> >> == GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j) >>> >> >>> >> into >>> >> >>> >> STMT_VINFO_DATA_REF (vinfo_for_stmt (GROUP_FIRST_ELEMENT >>> >> (vinfo_for_stmt (stmt_i)))) >>> >> == STMT_VINFO_DATA_REF (vinfo_for_stmt (GROUP_FIRST_ELEMENT >>> >> (vinfo_for_stmt (stmt_j))) >>> >> >>> >> >>> >> in function vect_same_range_drs(). What do you think about it? >>> >> >>> >> >>> >> thanks, >>> >> Cong >>> >> >>> >> >>> >> >>> >>> Thanks for working on this, >>> >>> Richard. >>> >>> >>> >>>> >>> >>>> thanks, >>> >>>> Cong >>> >>>> >>> >>>> >>> >>>> >>> >>>> Index: gcc/tree-vect-loop-manip.c >>> >>>> =================================================================== >>> >>>> --- gcc/tree-vect-loop-manip.c (revision 202662) >>> >>>> +++ gcc/tree-vect-loop-manip.c (working copy) >>> >>>> @@ -19,6 +19,10 @@ You should have received a copy of the G >>> >>>> along with GCC; see the file COPYING3. If not see >>> >>>> . */ >>> >>>> >>> >>>> +#include >>> >>>> +#include >>> >>>> +#include >>> >>>> + >>> >>>> #include "config.h" >>> >>>> #include "system.h" >>> >>>> #include "coretypes.h" >>> >>>> @@ -2248,6 +2252,74 @@ vect_vfa_segment_size (struct data_refer >>> >>>> return segment_length; >>> >>>> } >>> >>>> >>> >>>> +namespace >>> >>>> +{ >>> >>>> + >>> >>>> +/* struct dr_addr_with_seg_len >>> >>>> + >>> >>>> + A struct storing information of a data reference, including the data >>> >>>> + ref itself, its basic address, the access offset and the segment length >>> >>>> + for aliasing checks. */ >>> >>>> + >>> >>>> +struct dr_addr_with_seg_len >>> >>>> +{ >>> >>>> + dr_addr_with_seg_len (data_reference* d, tree addr, tree off, tree len) >>> >>>> + : dr (d), basic_addr (addr), offset (off), seg_len (len) {} >>> >>>> + >>> >>>> + data_reference* dr; >>> >>>> + tree basic_addr; >>> >>>> + tree offset; >>> >>>> + tree seg_len; >>> >>>> +}; >>> >>>> + >>> >>>> +/* Operator == between two dr_addr_with_seg_len objects. >>> >>>> + >>> >>>> + This equality operator is used to make sure two data refs >>> >>>> + are the same one so that we will consider to combine the >>> >>>> + aliasing checks of those two pairs of data dependent data >>> >>>> + refs. */ >>> >>>> + >>> >>>> +bool operator == (const dr_addr_with_seg_len& d1, >>> >>>> + const dr_addr_with_seg_len& d2) >>> >>>> +{ >>> >>>> + return operand_equal_p (d1.basic_addr, d2.basic_addr, 0) >>> >>>> + && operand_equal_p (d1.offset, d2.offset, 0) >>> >>>> + && operand_equal_p (d1.seg_len, d2.seg_len, 0); >>> >>>> +} >>> >>>> + >>> >>>> +typedef std::pair >>> >>>> + dr_addr_with_seg_len_pair_t; >>> >>>> + >>> >>>> + >>> >>>> +/* Operator < between two dr_addr_with_seg_len_pair_t objects. >>> >>>> + >>> >>>> + This operator is used to sort objects of dr_addr_with_seg_len_pair_t >>> >>>> + so that we can combine aliasing checks during one scan. */ >>> >>>> + >>> >>>> +bool operator < (const dr_addr_with_seg_len_pair_t& p1, >>> >>>> + const dr_addr_with_seg_len_pair_t& p2) >>> >>>> +{ >>> >>>> + const dr_addr_with_seg_len& p11 = p1.first; >>> >>>> + const dr_addr_with_seg_len& p12 = p1.second; >>> >>>> + const dr_addr_with_seg_len& p21 = p2.first; >>> >>>> + const dr_addr_with_seg_len& p22 = p2.second; >>> >>>> + >>> >>>> + if (p11.basic_addr != p21.basic_addr) >>> >>>> + return p11.basic_addr < p21.basic_addr; >>> >>>> + if (p12.basic_addr != p22.basic_addr) >>> >>>> + return p12.basic_addr < p22.basic_addr; >>> >>>> + if (TREE_CODE (p11.offset) != INTEGER_CST >>> >>>> + || TREE_CODE (p21.offset) != INTEGER_CST) >>> >>>> + return p11.offset < p21.offset; >>> >>>> + if (int_cst_value (p11.offset) != int_cst_value (p21.offset)) >>> >>>> + return int_cst_value (p11.offset) < int_cst_value (p21.offset); >>> >>>> + if (TREE_CODE (p12.offset) != INTEGER_CST >>> >>>> + || TREE_CODE (p22.offset) != INTEGER_CST) >>> >>>> + return p12.offset < p22.offset; >>> >>>> + return int_cst_value (p12.offset) < int_cst_value (p22.offset); >>> >>>> +} >>> >>>> + >>> >>>> +} >>> >>>> >>> >>>> /* Function vect_create_cond_for_alias_checks. >>> >>>> >>> >>>> @@ -2292,20 +2364,51 @@ vect_create_cond_for_alias_checks (loop_ >>> >>>> if (may_alias_ddrs.is_empty ()) >>> >>>> return; >>> >>>> >>> >>>> + >>> >>>> + /* Basically, for each pair of dependent data refs store_ptr_0 >>> >>>> + and load_ptr_0, we create an expression: >>> >>>> + >>> >>>> + ((store_ptr_0 + store_segment_length_0) <= load_ptr_0) >>> >>>> + || (load_ptr_0 + load_segment_length_0) <= store_ptr_0)) >>> >>>> + >>> >>>> + for aliasing checks. However, in some cases we can decrease >>> >>>> + the number of checks by combining two checks into one. For >>> >>>> + example, suppose we have another pair of data refs store_ptr_0 >>> >>>> + and load_ptr_1, and if the following condition is satisfied: >>> >>>> + >>> >>>> + load_ptr_0 < load_ptr_1 && >>> >>>> + load_ptr_1 - load_ptr_0 - load_segment_length_0 < store_segment_length_0 >>> >>>> + >>> >>>> + (this condition means, in each iteration of vectorized loop, >>> >>>> + the accessed memory of store_ptr_0 cannot be between the memory >>> >>>> + of load_ptr_0 and load_ptr_1.) >>> >>>> + >>> >>>> + we then can use only the following expression to finish the >>> >>>> + alising checks between store_ptr_0 & load_ptr_0 and >>> >>>> + store_ptr_0 & load_ptr_1: >>> >>>> + >>> >>>> + ((store_ptr_0 + store_segment_length_0) <= load_ptr_0) >>> >>>> + || (load_ptr_1 + load_segment_length_1 <= store_ptr_0)) >>> >>>> + >>> >>>> + Note that we only consider that load_ptr_0 and load_ptr_1 have the >>> >>>> + same basic address. */ >>> >>>> + >>> >>>> + std::vector ddrs_with_seg_len; >>> >>>> + >>> >>>> + /* First, we collect all data ref pairs for aliasing checks. */ >>> >>>> + >>> >>>> FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) >>> >>>> { >>> >>>> struct data_reference *dr_a, *dr_b; >>> >>>> gimple dr_group_first_a, dr_group_first_b; >>> >>>> - tree addr_base_a, addr_base_b; >>> >>>> tree segment_length_a, segment_length_b; >>> >>>> gimple stmt_a, stmt_b; >>> >>>> - tree seg_a_min, seg_a_max, seg_b_min, seg_b_max; >>> >>>> >>> >>>> dr_a = DDR_A (ddr); >>> >>>> stmt_a = DR_STMT (DDR_A (ddr)); >>> >>>> dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); >>> >>>> if (dr_group_first_a) >>> >>>> - { >>> >>>> + { >>> >>>> stmt_a = dr_group_first_a; >>> >>>> dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a)); >>> >>>> } >>> >>>> @@ -2314,20 +2417,11 @@ vect_create_cond_for_alias_checks (loop_ >>> >>>> stmt_b = DR_STMT (DDR_B (ddr)); >>> >>>> dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b)); >>> >>>> if (dr_group_first_b) >>> >>>> - { >>> >>>> + { >>> >>>> stmt_b = dr_group_first_b; >>> >>>> dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b)); >>> >>>> } >>> >>>> >>> >>>> - addr_base_a >>> >>>> - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_a), >>> >>>> - size_binop (PLUS_EXPR, DR_OFFSET (dr_a), >>> >>>> - DR_INIT (dr_a))); >>> >>>> - addr_base_b >>> >>>> - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_b), >>> >>>> - size_binop (PLUS_EXPR, DR_OFFSET (dr_b), >>> >>>> - DR_INIT (dr_b))); >>> >>>> - >>> >>>> if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0)) >>> >>>> length_factor = scalar_loop_iters; >>> >>>> else >>> >>>> @@ -2335,24 +2429,149 @@ vect_create_cond_for_alias_checks (loop_ >>> >>>> segment_length_a = vect_vfa_segment_size (dr_a, length_factor); >>> >>>> segment_length_b = vect_vfa_segment_size (dr_b, length_factor); >>> >>>> >>> >>>> + dr_addr_with_seg_len_pair_t dr_with_seg_len_pair >>> >>>> + (dr_addr_with_seg_len >>> >>>> + (dr_a, DR_BASE_ADDRESS (dr_a), >>> >>>> + size_binop (PLUS_EXPR, DR_OFFSET (dr_a), DR_INIT (dr_a)), >>> >>>> + segment_length_a), >>> >>>> + dr_addr_with_seg_len >>> >>>> + (dr_b, DR_BASE_ADDRESS (dr_b), >>> >>>> + size_binop (PLUS_EXPR, DR_OFFSET (dr_b), DR_INIT (dr_b)), >>> >>>> + segment_length_b)); >>> >>>> + >>> >>>> + if (dr_with_seg_len_pair.first.basic_addr > >>> >>>> + dr_with_seg_len_pair.second.basic_addr) >>> >>>> + std::swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second); >>> >>>> + >>> >>>> + ddrs_with_seg_len.push_back (dr_with_seg_len_pair); >>> >>>> + } >>> >>>> + >>> >>>> + /* Second, we sort the collected data ref pairs so that we can scan >>> >>>> + them once to combine all possible aliasing checks. */ >>> >>>> + >>> >>>> + std::sort (ddrs_with_seg_len.begin(), ddrs_with_seg_len.end()); >>> >>>> + >>> >>>> + /* Remove duplicate data ref pairs. */ >>> >>>> + ddrs_with_seg_len.erase (std::unique (ddrs_with_seg_len.begin(), >>> >>>> + ddrs_with_seg_len.end()), >>> >>>> + ddrs_with_seg_len.end()); >>> >>>> + >>> >>>> + /* We then scan the sorted dr pairs and check if we can combine >>> >>>> + alias checks of two neighbouring dr pairs. */ >>> >>>> + >>> >>>> + for (size_t i = 1; i < ddrs_with_seg_len.size (); ++i) >>> >>>> + { >>> >>>> + dr_addr_with_seg_len& dr_a1 = ddrs_with_seg_len[i-1].first; >>> >>>> + dr_addr_with_seg_len& dr_b1 = ddrs_with_seg_len[i-1].second; >>> >>>> + dr_addr_with_seg_len& dr_a2 = ddrs_with_seg_len[i].first; >>> >>>> + dr_addr_with_seg_len& dr_b2 = ddrs_with_seg_len[i].second; >>> >>>> + >>> >>>> + if (dr_a1 == dr_a2) >>> >>>> + { >>> >>>> + if (dr_b1.basic_addr != dr_b2.basic_addr >>> >>>> + || TREE_CODE (dr_b1.offset) != INTEGER_CST >>> >>>> + || TREE_CODE (dr_b2.offset) != INTEGER_CST) >>> >>>> + continue; >>> >>>> + >>> >>>> + int diff = int_cst_value (dr_b2.offset) - >>> >>>> + int_cst_value (dr_b1.offset); >>> >>>> + >>> >>>> + gcc_assert (diff > 0); >>> >>>> + >>> >>>> + if (diff <= vect_factor >>> >>>> + || (TREE_CODE (dr_b1.seg_len) == INTEGER_CST >>> >>>> + && diff - int_cst_value (dr_b1.seg_len) < vect_factor) >>> >>>> + || (TREE_CODE (dr_b1.seg_len) == INTEGER_CST >>> >>>> + && TREE_CODE (dr_a1.seg_len) == INTEGER_CST >>> >>>> + && diff - int_cst_value (dr_b1.seg_len) < >>> >>>> + int_cst_value (dr_a1.seg_len))) >>> >>>> + { >>> >>>> + if (dump_enabled_p ()) >>> >>>> + { >>> >>>> + dump_printf_loc >>> >>>> + (MSG_NOTE, vect_location, >>> >>>> + "combining two runtime checks for data references "); >>> >>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b1.dr)); >>> >>>> + dump_printf (MSG_NOTE, " and "); >>> >>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b2.dr)); >>> >>>> + dump_printf (MSG_NOTE, "\n"); >>> >>>> + } >>> >>>> + >>> >>>> + dr_b1.seg_len = size_binop (PLUS_EXPR, >>> >>>> + dr_b2.seg_len, size_int (diff)); >>> >>>> + ddrs_with_seg_len.erase (ddrs_with_seg_len.begin () + i); >>> >>>> + --i; >>> >>>> + } >>> >>>> + } >>> >>>> + else if (dr_b1 == dr_b2) >>> >>>> + { >>> >>>> + if (dr_a1.basic_addr != dr_a2.basic_addr >>> >>>> + || TREE_CODE (dr_a1.offset) != INTEGER_CST >>> >>>> + || TREE_CODE (dr_a2.offset) != INTEGER_CST) >>> >>>> + continue; >>> >>>> + >>> >>>> + int diff = int_cst_value (dr_a2.offset) - >>> >>>> + int_cst_value (dr_a1.offset); >>> >>>> + >>> >>>> + gcc_assert (diff > 0); >>> >>>> + >>> >>>> + if (diff <= vect_factor >>> >>>> + || (TREE_CODE (dr_a1.seg_len) == INTEGER_CST >>> >>>> + && diff - int_cst_value (dr_a1.seg_len) < vect_factor) >>> >>>> + || (TREE_CODE (dr_a1.seg_len) == INTEGER_CST >>> >>>> + && TREE_CODE (dr_b1.seg_len) == INTEGER_CST >>> >>>> + && diff - int_cst_value (dr_a1.seg_len) < >>> >>>> + int_cst_value (dr_b1.seg_len))) >>> >>>> + { >>> >>>> + if (dump_enabled_p ()) >>> >>>> + { >>> >>>> + dump_printf_loc >>> >>>> + (MSG_NOTE, vect_location, >>> >>>> + "combining two runtime checks for data references "); >>> >>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a1.dr)); >>> >>>> + dump_printf (MSG_NOTE, " and "); >>> >>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a2.dr)); >>> >>>> + dump_printf (MSG_NOTE, "\n"); >>> >>>> + } >>> >>>> + >>> >>>> + dr_a1.seg_len = size_binop (PLUS_EXPR, >>> >>>> + dr_a2.seg_len, size_int (diff)); >>> >>>> + ddrs_with_seg_len.erase (ddrs_with_seg_len.begin () + i); >>> >>>> + --i; >>> >>>> + } >>> >>>> + } >>> >>>> + } >>> >>>> + >>> >>>> + for (size_t i = 0, s = ddrs_with_seg_len.size (); i < s; ++i) >>> >>>> + { >>> >>>> + const dr_addr_with_seg_len& dr_a = ddrs_with_seg_len[i].first; >>> >>>> + const dr_addr_with_seg_len& dr_b = ddrs_with_seg_len[i].second; >>> >>>> + tree segment_length_a = dr_a.seg_len; >>> >>>> + tree segment_length_b = dr_b.seg_len; >>> >>>> + >>> >>>> + tree addr_base_a >>> >>>> + = fold_build_pointer_plus (dr_a.basic_addr, dr_a.offset); >>> >>>> + tree addr_base_b >>> >>>> + = fold_build_pointer_plus (dr_b.basic_addr, dr_b.offset); >>> >>>> + >>> >>>> if (dump_enabled_p ()) >>> >>>> { >>> >>>> dump_printf_loc (MSG_NOTE, vect_location, >>> >>>> - "create runtime check for data references "); >>> >>>> - dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a)); >>> >>>> + "create runtime check for data references "); >>> >>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a.dr)); >>> >>>> dump_printf (MSG_NOTE, " and "); >>> >>>> - dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b)); >>> >>>> - dump_printf (MSG_NOTE, "\n"); >>> >>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b.dr)); >>> >>>> + dump_printf (MSG_NOTE, "\n"); >>> >>>> } >>> >>>> >>> >>>> - seg_a_min = addr_base_a; >>> >>>> - seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a); >>> >>>> - if (tree_int_cst_compare (DR_STEP (dr_a), size_zero_node) < 0) >>> >>>> + tree seg_a_min = addr_base_a; >>> >>>> + tree seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a); >>> >>>> + if (tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0) >>> >>>> seg_a_min = seg_a_max, seg_a_max = addr_base_a; >>> >>>> >>> >>>> - seg_b_min = addr_base_b; >>> >>>> - seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b); >>> >>>> - if (tree_int_cst_compare (DR_STEP (dr_b), size_zero_node) < 0) >>> >>>> + tree seg_b_min = addr_base_b; >>> >>>> + tree seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b); >>> >>>> + if (tree_int_cst_compare (DR_STEP (dr_b.dr), size_zero_node) < 0) >>> >>>> seg_b_min = seg_b_max, seg_b_max = addr_base_b; >>> >>>> >>> >>>> part_cond_expr = >>> >>>> @@ -2477,6 +2696,81 @@ vect_loop_versioning (loop_vec_info loop >>> >>>> adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi)); >>> >>>> } >>> >>>> >>> >>>> + /* Extract load and store statements on pointers with zero-stride >>> >>>> + accesses. */ >>> >>>> + if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)) >>> >>>> + { >>> >>>> + >>> >>>> + /* In the loop body, we iterate each statement to check if it is a load >>> >>>> + or store. Then we check the DR_STEP of the data reference. If >>> >>>> + DR_STEP is zero, then we will hoist the load statement to the loop >>> >>>> + preheader, and move the store statement to the loop exit. */ >>> >>>> + >>> >>>> + for (gimple_stmt_iterator si = gsi_start_bb (loop->header); >>> >>>> + !gsi_end_p (si); ) >>> >>>> + { >>> >>>> + gimple stmt = gsi_stmt (si); >>> >>>> + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); >>> >>>> + struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); >>> >>>> + >>> >>>> + >>> >>>> + if (dr && integer_zerop (DR_STEP (dr))) >>> >>>> + { >>> >>>> + if (DR_IS_READ (dr)) >>> >>>> + { >>> >>>> + if (dump_file) >>> >>>> + { >>> >>>> + fprintf (dump_file, >>> >>>> + "Hoist the load to outside of the loop:\n"); >>> >>>> + print_gimple_stmt (dump_file, stmt, 0, >>> >>>> + TDF_VOPS|TDF_MEMSYMS); >>> >>>> + } >>> >>>> + >>> >>>> + basic_block preheader = loop_preheader_edge (loop)->src; >>> >>>> + gimple_stmt_iterator si_dst = gsi_last_bb (preheader); >>> >>>> + gsi_move_after (&si, &si_dst); >>> >>>> + } >>> >>>> + else >>> >>>> + { >>> >>>> + gimple_stmt_iterator si_dst = >>> >>>> + gsi_last_bb (single_exit (loop)->dest); >>> >>>> + gsi_move_after (&si, &si_dst); >>> >>>> + } >>> >>>> + continue; >>> >>>> + } >>> >>>> + else if (!dr) >>> >>>> + { >>> >>>> + bool hoist = true; >>> >>>> + for (size_t i = 0; i < gimple_num_ops (stmt); i++) >>> >>>> + { >>> >>>> + tree op = gimple_op (stmt, i); >>> >>>> + if (TREE_CODE (op) == INTEGER_CST >>> >>>> + || TREE_CODE (op) == REAL_CST) >>> >>>> + continue; >>> >>>> + if (TREE_CODE (op) == SSA_NAME) >>> >>>> + { >>> >>>> + gimple def = SSA_NAME_DEF_STMT (op); >>> >>>> + if (def == stmt >>> >>>> + || gimple_nop_p (def) >>> >>>> + || !flow_bb_inside_loop_p (loop, gimple_bb (def))) >>> >>>> + continue; >>> >>>> + } >>> >>>> + hoist = false; >>> >>>> + break; >>> >>>> + } >>> >>>> + >>> >>>> + if (hoist) >>> >>>> + { >>> >>>> + basic_block preheader = loop_preheader_edge (loop)->src; >>> >>>> + gimple_stmt_iterator si_dst = gsi_last_bb (preheader); >>> >>>> + gsi_move_after (&si, &si_dst); >>> >>>> + continue; >>> >>>> + } >>> >>>> + } >>> >>>> + gsi_next (&si); >>> >>>> + } >>> >>>> + } >>> >>>> + >>> >>>> /* End loop-exit-fixes after versioning. */ >>> >>>> >>> >>>> if (cond_expr_stmt_list) >>> >>>> Index: gcc/ChangeLog >>> >>>> =================================================================== >>> >>>> --- gcc/ChangeLog (revision 202663) >>> >>>> +++ gcc/ChangeLog (working copy) >>> >>>> @@ -1,3 +1,8 @@ >>> >>>> +2013-10-01 Cong Hou >>> >>>> + >>> >>>> + * tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): Combine >>> >>>> + alias checks if it is possible to amortize the runtime overhead. >>> >>>> + >>> >>>> >>> >>>> >>> >>> >>> >>> -- >>> >>> Richard Biener >>> >>> SUSE / SUSE Labs >>> >>> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 >>> >>> GF: Jeff Hawn, Jennifer Guild, Felix Imend diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8a38316..7333f6a 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,12 @@ +2013-10-14 Cong Hou + + * tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): + Combine alias checks if it is possible to amortize the runtime + overhead. Return the number of alias checks after merging. + * tree-vect-data-refs.c (vect_prune_runtime_alias_test_list): + Use the function vect_create_cond_for_alias_checks () to check + the number of alias checks. + 2013-10-14 David Malcolm * dumpfile.h (gcc::dump_manager): New class, to hold state diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 075d071..ea2782f 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,7 @@ +2013-10-14 Cong Hou + + * gcc.dg/vect/vect-alias-check.c: New. + 2013-10-14 Tobias Burnus PR fortran/58658 diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check.c new file mode 100644 index 0000000..bf9eb6b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize + --param=vect-max-version-for-alias-checks=2 + -fdump-tree-vect-details" } */ + +/* A test case showing three potential alias checks between + a[i] and b[i], b[i+7], b[i+14]. With alias checks merging + enabled, those tree checks can be merged into one, and the + loop will be vectorized with vect-max-version-for-alias-checks=2. */ + +void foo (int *a, int *b) +{ + int i; + for (i = 0; i < 1000; ++i) + a[i] = b[i] + b[i+7] + b[i+14]; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c index e7b2f33..5f70d58 100644 --- a/gcc/tree-vect-data-refs.c +++ b/gcc/tree-vect-data-refs.c @@ -128,41 +128,6 @@ vect_get_smallest_scalar_type (gimple stmt, HOST_WIDE_INT *lhs_size_unit, } -/* Check if data references pointed by DR_I and DR_J are same or - belong to same interleaving group. Return FALSE if drs are - different, otherwise return TRUE. */ - -static bool -vect_same_range_drs (data_reference_p dr_i, data_reference_p dr_j) -{ - gimple stmt_i = DR_STMT (dr_i); - gimple stmt_j = DR_STMT (dr_j); - - if (operand_equal_p (DR_REF (dr_i), DR_REF (dr_j), 0) - || (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i)) - && GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j)) - && (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i)) - == GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j))))) - return true; - else - return false; -} - -/* If address ranges represented by DDR_I and DDR_J are equal, - return TRUE, otherwise return FALSE. */ - -static bool -vect_vfa_range_equal (ddr_p ddr_i, ddr_p ddr_j) -{ - if ((vect_same_range_drs (DDR_A (ddr_i), DDR_A (ddr_j)) - && vect_same_range_drs (DDR_B (ddr_i), DDR_B (ddr_j))) - || (vect_same_range_drs (DDR_A (ddr_i), DDR_B (ddr_j)) - && vect_same_range_drs (DDR_B (ddr_i), DDR_A (ddr_j)))) - return true; - else - return false; -} - /* Insert DDR into LOOP_VINFO list of ddrs that may alias and need to be tested at run-time. Return TRUE if DDR was successfully inserted. Return false if versioning is not supported. */ @@ -2647,69 +2612,331 @@ vect_analyze_data_ref_accesses (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo) return true; } + +/* Operator == between two dr_addr_with_seg_len objects. + + This equality operator is used to make sure two data refs + are the same one so that we will consider to combine the + aliasing checks of those two pairs of data dependent data + refs. */ + +static bool +operator == (const dr_addr_with_seg_len& d1, + const dr_addr_with_seg_len& d2) +{ + return operand_equal_p (d1.basic_addr, d2.basic_addr, 0) + && compare_tree (d1.offset, d2.offset) == 0 + && compare_tree (d1.seg_len, d2.seg_len) == 0; +} + +/* Function comp_dr_addr_with_seg_len_pair. + + Comparison function for sorting objects of dr_addr_with_seg_len_pair_t + so that we can combine aliasing checks in one scan. */ + +static int +comp_dr_addr_with_seg_len_pair (const void *p1_, const void *p2_) +{ + const dr_addr_with_seg_len_pair_t* p1 = + (const dr_addr_with_seg_len_pair_t *) p1_; + const dr_addr_with_seg_len_pair_t* p2 = + (const dr_addr_with_seg_len_pair_t *) p2_; + + const dr_addr_with_seg_len &p11 = p1->first, + &p12 = p1->second, + &p21 = p2->first, + &p22 = p2->second; + + int comp_res = compare_tree (p11.basic_addr, p21.basic_addr); + if (comp_res != 0) + return comp_res; + + comp_res = compare_tree (p12.basic_addr, p22.basic_addr); + if (comp_res != 0) + return comp_res; + + if (TREE_CODE (p11.offset) != INTEGER_CST + || TREE_CODE (p21.offset) != INTEGER_CST) + { + comp_res = compare_tree (p11.offset, p21.offset); + if (comp_res != 0) + return comp_res; + } + if (tree_int_cst_compare (p11.offset, p21.offset) < 0) + return -1; + if (tree_int_cst_compare (p11.offset, p21.offset) > 0) + return 1; + if (TREE_CODE (p12.offset) != INTEGER_CST + || TREE_CODE (p22.offset) != INTEGER_CST) + { + comp_res = compare_tree (p12.offset, p22.offset); + if (comp_res != 0) + return comp_res; + } + if (tree_int_cst_compare (p12.offset, p22.offset) < 0) + return -1; + if (tree_int_cst_compare (p12.offset, p22.offset) > 0) + return 1; + + return 0; +} + +template static void +swap (T& a, T& b) +{ + T c (a); + a = b; + b = c; +} + +/* Function vect_vfa_segment_size. + + Create an expression that computes the size of segment + that will be accessed for a data reference. The functions takes into + account that realignment loads may access one more vector. + + Input: + DR: The data reference. + LENGTH_FACTOR: segment length to consider. + + Return an expression whose value is the size of segment which will be + accessed by DR. */ + +static tree +vect_vfa_segment_size (struct data_reference *dr, tree length_factor) +{ + tree segment_length; + + if (integer_zerop (DR_STEP (dr))) + segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr))); + else + segment_length = size_binop (MULT_EXPR, + fold_convert (sizetype, DR_STEP (dr)), + fold_convert (sizetype, length_factor)); + + if (vect_supportable_dr_alignment (dr, false) + == dr_explicit_realign_optimized) + { + tree vector_size = TYPE_SIZE_UNIT + (STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr)))); + + segment_length = size_binop (PLUS_EXPR, segment_length, vector_size); + } + return segment_length; +} + /* Function vect_prune_runtime_alias_test_list. Prune a list of ddrs to be tested at run-time by versioning for alias. + Merge several alias checks into one if possible. Return FALSE if resulting list of ddrs is longer then allowed by PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS, otherwise return TRUE. */ bool vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo) { - vec ddrs = + vec may_alias_ddrs = LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo); - unsigned i, j; + vec& comp_alias_ddrs = + LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo); + int vect_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo); + tree scalar_loop_iters = LOOP_VINFO_NITERS (loop_vinfo); + + ddr_p ddr; + unsigned int i; + tree length_factor; if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "=== vect_prune_runtime_alias_test_list ===\n"); - for (i = 0; i < ddrs.length (); ) + if (may_alias_ddrs.is_empty ()) + return true; + + /* Basically, for each pair of dependent data refs store_ptr_0 + and load_ptr_0, we create an expression: + + ((store_ptr_0 + store_segment_length_0) <= load_ptr_0) + || (load_ptr_0 + load_segment_length_0) <= store_ptr_0)) + + for aliasing checks. However, in some cases we can decrease + the number of checks by combining two checks into one. For + example, suppose we have another pair of data refs store_ptr_0 + and load_ptr_1, and if the following condition is satisfied: + + load_ptr_0 < load_ptr_1 && + load_ptr_1 - load_ptr_0 - load_segment_length_0 < store_segment_length_0 + + (this condition means, in each iteration of vectorized loop, + the accessed memory of store_ptr_0 cannot be between the memory + of load_ptr_0 and load_ptr_1.) + + we then can use only the following expression to finish the + alising checks between store_ptr_0 & load_ptr_0 and + store_ptr_0 & load_ptr_1: + + ((store_ptr_0 + store_segment_length_0) <= load_ptr_0) + || (load_ptr_1 + load_segment_length_1 <= store_ptr_0)) + + Note that we only consider that load_ptr_0 and load_ptr_1 have the + same basic address. */ + + comp_alias_ddrs.create (may_alias_ddrs.length ()); + + /* First, we collect all data ref pairs for aliasing checks. */ + FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) { - bool found; - ddr_p ddr_i; + struct data_reference *dr_a, *dr_b; + gimple dr_group_first_a, dr_group_first_b; + tree segment_length_a, segment_length_b; + gimple stmt_a, stmt_b; + + dr_a = DDR_A (ddr); + stmt_a = DR_STMT (DDR_A (ddr)); + dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); + if (dr_group_first_a) + { + stmt_a = dr_group_first_a; + dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a)); + } - ddr_i = ddrs[i]; - found = false; + dr_b = DDR_B (ddr); + stmt_b = DR_STMT (DDR_B (ddr)); + dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b)); + if (dr_group_first_b) + { + stmt_b = dr_group_first_b; + dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b)); + } - for (j = 0; j < i; j++) - { - ddr_p ddr_j = ddrs[j]; + if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0)) + length_factor = scalar_loop_iters; + else + length_factor = size_int (vect_factor); + segment_length_a = vect_vfa_segment_size (dr_a, length_factor); + segment_length_b = vect_vfa_segment_size (dr_b, length_factor); + + dr_addr_with_seg_len_pair_t dr_with_seg_len_pair + (dr_addr_with_seg_len + (dr_a, DR_BASE_ADDRESS (dr_a), + size_binop (PLUS_EXPR, DR_OFFSET (dr_a), DR_INIT (dr_a)), + segment_length_a), + dr_addr_with_seg_len + (dr_b, DR_BASE_ADDRESS (dr_b), + size_binop (PLUS_EXPR, DR_OFFSET (dr_b), DR_INIT (dr_b)), + segment_length_b)); + + if (compare_tree (dr_with_seg_len_pair.first.basic_addr, + dr_with_seg_len_pair.second.basic_addr) > 0) + swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second); + + comp_alias_ddrs.safe_push (dr_with_seg_len_pair); + } - if (vect_vfa_range_equal (ddr_i, ddr_j)) + /* Second, we sort the collected data ref pairs so that we can scan + them once to combine all possible aliasing checks. */ + comp_alias_ddrs.qsort (comp_dr_addr_with_seg_len_pair); + + /* Third, we scan the sorted dr pairs and check if we can combine + alias checks of two neighbouring dr pairs. */ + for (size_t i = 1; i < comp_alias_ddrs.length (); ++i) + { + /* Deal with two ddrs (dr_a1, dr_b1) and (dr_a2, dr_b2). */ + dr_addr_with_seg_len *dr_a1 = &comp_alias_ddrs[i-1].first, + *dr_b1 = &comp_alias_ddrs[i-1].second, + *dr_a2 = &comp_alias_ddrs[i].first, + *dr_b2 = &comp_alias_ddrs[i].second; + + /* Remove duplicate data ref pairs. */ + if (*dr_a1 == *dr_a2 && *dr_b1 == *dr_b2) + { + if (dump_enabled_p ()) + { + dump_printf_loc (MSG_NOTE, vect_location, + "found equal ranges "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, + DR_REF (dr_a1->dr)); + dump_printf (MSG_NOTE, ", "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, + DR_REF (dr_b1->dr)); + dump_printf (MSG_NOTE, " and "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, + DR_REF (dr_a2->dr)); + dump_printf (MSG_NOTE, ", "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, + DR_REF (dr_b2->dr)); + dump_printf (MSG_NOTE, "\n"); + } + + comp_alias_ddrs.ordered_remove (i--); + continue; + } + + if (*dr_a1 == *dr_a2 || *dr_b1 == *dr_b2) + { + /* We consider the case that DR_B1 and DR_B2 are same memrefs, + and DR_A1 and DR_A2 are two consecutive memrefs. */ + if (*dr_a1 == *dr_a2) + { + swap (dr_a1, dr_b1); + swap (dr_a2, dr_b2); + } + + if (!operand_equal_p (dr_a1->basic_addr, dr_a2->basic_addr, 0) + || !host_integerp (dr_a1->offset, 0) + || !host_integerp (dr_a2->offset, 0)) + continue; + + HOST_WIDEST_INT diff = widest_int_cst_value (dr_a2->offset) - + widest_int_cst_value (dr_a1->offset); + + + /* Now we check if the following condition is satisfied: + + DIFF - SEGMENT_LENGTH_A < SEGMENT_LENGTH_B + + where DIFF = DR_A2->OFFSET - DR_A1->OFFSET. However, + SEGMENT_LENGTH_A or SEGMENT_LENGTH_B may not be constant so we + have to make a best estimation. We can get the minimum value + of SEGMENT_LENGTH_B as a constant, represented by MIN_SEG_LEN_B, + then either of the following two conditions can guarantee the + one above: + + 1: DIFF <= MIN_SEG_LEN_B + 2: DIFF - SEGMENT_LENGTH_A < MIN_SEG_LEN_B + + */ + + HOST_WIDEST_INT + min_seg_len_b = (TREE_CODE (dr_b1->seg_len) == INTEGER_CST) ? + widest_int_cst_value (dr_b1->seg_len) : + vect_factor; + + if (diff <= min_seg_len_b + || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST + && diff - widest_int_cst_value (dr_a1->seg_len) < + min_seg_len_b)) { if (dump_enabled_p ()) { - dump_printf_loc (MSG_NOTE, vect_location, - "found equal ranges "); - dump_generic_expr (MSG_NOTE, TDF_SLIM, - DR_REF (DDR_A (ddr_i))); - dump_printf (MSG_NOTE, ", "); - dump_generic_expr (MSG_NOTE, TDF_SLIM, - DR_REF (DDR_B (ddr_i))); - dump_printf (MSG_NOTE, " and "); - dump_generic_expr (MSG_NOTE, TDF_SLIM, - DR_REF (DDR_A (ddr_j))); - dump_printf (MSG_NOTE, ", "); - dump_generic_expr (MSG_NOTE, TDF_SLIM, - DR_REF (DDR_B (ddr_j))); + dump_printf_loc + (MSG_NOTE, vect_location, + "merging two runtime checks for data references "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a1->dr)); + dump_printf (MSG_NOTE, " and "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a2->dr)); dump_printf (MSG_NOTE, "\n"); } - found = true; - break; + + dr_a1->seg_len = size_binop (PLUS_EXPR, + dr_a2->seg_len, size_int (diff)); + comp_alias_ddrs.ordered_remove (i--); } } - - if (found) - { - ddrs.ordered_remove (i); - continue; - } - i++; } - if (ddrs.length () > - (unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS)) + if ((int) comp_alias_ddrs.length () > + PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS)) { if (dump_enabled_p ()) { @@ -2718,8 +2945,6 @@ vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo) "generated checks exceeded.\n"); } - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0); - return false; } diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c index 574446a..1e03853 100644 --- a/gcc/tree-vect-loop-manip.c +++ b/gcc/tree-vect-loop-manip.c @@ -2211,44 +2211,6 @@ vect_create_cond_for_align_checks (loop_vec_info loop_vinfo, *cond_expr = part_cond_expr; } - -/* Function vect_vfa_segment_size. - - Create an expression that computes the size of segment - that will be accessed for a data reference. The functions takes into - account that realignment loads may access one more vector. - - Input: - DR: The data reference. - LENGTH_FACTOR: segment length to consider. - - Return an expression whose value is the size of segment which will be - accessed by DR. */ - -static tree -vect_vfa_segment_size (struct data_reference *dr, tree length_factor) -{ - tree segment_length; - - if (integer_zerop (DR_STEP (dr))) - segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr))); - else - segment_length = size_binop (MULT_EXPR, - fold_convert (sizetype, DR_STEP (dr)), - fold_convert (sizetype, length_factor)); - - if (vect_supportable_dr_alignment (dr, false) - == dr_explicit_realign_optimized) - { - tree vector_size = TYPE_SIZE_UNIT - (STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr)))); - - segment_length = size_binop (PLUS_EXPR, segment_length, vector_size); - } - return segment_length; -} - - /* Function vect_create_cond_for_alias_checks. Create a conditional expression that represents the run-time checks for @@ -2257,28 +2219,24 @@ vect_vfa_segment_size (struct data_reference *dr, tree length_factor) Input: COND_EXPR - input conditional expression. New conditions will be chained - with logical AND operation. + with logical AND operation. If it is NULL, then the function + is used to return the number of alias checks. LOOP_VINFO - field LOOP_VINFO_MAY_ALIAS_STMTS contains the list of ddrs to be checked. Output: COND_EXPR - conditional expression. - The returned value is the conditional expression to be used in the if + The returned COND_EXPR is the conditional expression to be used in the if statement that controls which version of the loop gets executed at runtime. */ -static void +void vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr) { - vec may_alias_ddrs = - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo); - int vect_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo); - tree scalar_loop_iters = LOOP_VINFO_NITERS (loop_vinfo); - - ddr_p ddr; - unsigned int i; - tree part_cond_expr, length_factor; + vec comp_alias_ddrs = + LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo); + tree part_cond_expr; /* Create expression ((store_ptr_0 + store_segment_length_0) <= load_ptr_0) @@ -2289,70 +2247,39 @@ vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr) ((store_ptr_n + store_segment_length_n) <= load_ptr_n) || (load_ptr_n + load_segment_length_n) <= store_ptr_n)) */ - if (may_alias_ddrs.is_empty ()) + if (comp_alias_ddrs.is_empty ()) return; - FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) + for (size_t i = 0, s = comp_alias_ddrs.length (); i < s; ++i) { - struct data_reference *dr_a, *dr_b; - gimple dr_group_first_a, dr_group_first_b; - tree addr_base_a, addr_base_b; - tree segment_length_a, segment_length_b; - gimple stmt_a, stmt_b; - tree seg_a_min, seg_a_max, seg_b_min, seg_b_max; - - dr_a = DDR_A (ddr); - stmt_a = DR_STMT (DDR_A (ddr)); - dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); - if (dr_group_first_a) - { - stmt_a = dr_group_first_a; - dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a)); - } - - dr_b = DDR_B (ddr); - stmt_b = DR_STMT (DDR_B (ddr)); - dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b)); - if (dr_group_first_b) - { - stmt_b = dr_group_first_b; - dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b)); - } + const dr_addr_with_seg_len& dr_a = comp_alias_ddrs[i].first; + const dr_addr_with_seg_len& dr_b = comp_alias_ddrs[i].second; + tree segment_length_a = dr_a.seg_len; + tree segment_length_b = dr_b.seg_len; - addr_base_a - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_a), - size_binop (PLUS_EXPR, DR_OFFSET (dr_a), - DR_INIT (dr_a))); - addr_base_b - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_b), - size_binop (PLUS_EXPR, DR_OFFSET (dr_b), - DR_INIT (dr_b))); - - if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0)) - length_factor = scalar_loop_iters; - else - length_factor = size_int (vect_factor); - segment_length_a = vect_vfa_segment_size (dr_a, length_factor); - segment_length_b = vect_vfa_segment_size (dr_b, length_factor); + tree addr_base_a + = fold_build_pointer_plus (dr_a.basic_addr, dr_a.offset); + tree addr_base_b + = fold_build_pointer_plus (dr_b.basic_addr, dr_b.offset); if (dump_enabled_p ()) { dump_printf_loc (MSG_NOTE, vect_location, - "create runtime check for data references "); - dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a)); + "create runtime check for data references "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a.dr)); dump_printf (MSG_NOTE, " and "); - dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b)); - dump_printf (MSG_NOTE, "\n"); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b.dr)); + dump_printf (MSG_NOTE, "\n"); } - seg_a_min = addr_base_a; - seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a); - if (tree_int_cst_compare (DR_STEP (dr_a), size_zero_node) < 0) + tree seg_a_min = addr_base_a; + tree seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a); + if (tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0) seg_a_min = seg_a_max, seg_a_max = addr_base_a; - seg_b_min = addr_base_b; - seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b); - if (tree_int_cst_compare (DR_STEP (dr_b), size_zero_node) < 0) + tree seg_b_min = addr_base_b; + tree seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b); + if (tree_int_cst_compare (DR_STEP (dr_b.dr), size_zero_node) < 0) seg_b_min = seg_b_max, seg_b_max = addr_base_b; part_cond_expr = @@ -2370,7 +2297,9 @@ vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr) if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "created %u versioning for alias checks.\n", - may_alias_ddrs.length ()); + comp_alias_ddrs.length ()); + + comp_alias_ddrs.release (); } diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index 8b7b345..f231b62 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -175,6 +175,36 @@ typedef struct _slp_oprnd_info +/* This struct is used to store the information of a data reference, + including the data ref itself, its basic address, the access offset + and the segment length for aliasing checks. This is used to generate + alias checks. */ + +struct dr_addr_with_seg_len +{ + dr_addr_with_seg_len (data_reference* d, tree addr, tree off, tree len) + : dr (d), basic_addr (addr), offset (off), seg_len (len) {} + + data_reference *dr; + tree basic_addr; + tree offset; + tree seg_len; +}; + +/* This struct contains two dr_addr_with_seg_len objects with aliasing data + refs. Two comparisons are generated from them. */ + +struct dr_addr_with_seg_len_pair_t +{ + dr_addr_with_seg_len_pair_t (const dr_addr_with_seg_len& d1, + const dr_addr_with_seg_len& d2) + : first (d1), second (d2) {} + + dr_addr_with_seg_len first; + dr_addr_with_seg_len second; +}; + + typedef struct _vect_peel_info { int npeel; @@ -274,6 +304,10 @@ typedef struct _loop_vec_info { for a run-time aliasing check. */ vec may_alias_ddrs; + /* Data Dependence Relations defining address ranges together with segment + lengths from which the run-time aliasing check is built. */ + vec comp_alias_ddrs; + /* Statements in the loop that have data references that are candidates for a runtime (loop versioning) misalignment check. */ vec may_misalign_stmts; @@ -336,6 +370,7 @@ typedef struct _loop_vec_info { #define LOOP_VINFO_MAY_MISALIGN_STMTS(L) (L)->may_misalign_stmts #define LOOP_VINFO_LOC(L) (L)->loop_line_number #define LOOP_VINFO_MAY_ALIAS_DDRS(L) (L)->may_alias_ddrs +#define LOOP_VINFO_COMP_ALIAS_DDRS(L) (L)->comp_alias_ddrs #define LOOP_VINFO_GROUPED_STORES(L) (L)->grouped_stores #define LOOP_VINFO_SLP_INSTANCES(L) (L)->slp_instances #define LOOP_VINFO_SLP_UNROLLING_FACTOR(L) (L)->slp_unrolling_factor diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8a38316..7333f6a 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,12 @@ +2013-10-14 Cong Hou + + * tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): + Combine alias checks if it is possible to amortize the runtime + overhead. Return the number of alias checks after merging. + * tree-vect-data-refs.c (vect_prune_runtime_alias_test_list): + Use the function vect_create_cond_for_alias_checks () to check + the number of alias checks. + 2013-10-14 David Malcolm * dumpfile.h (gcc::dump_manager): New class, to hold state diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 075d071..ea2782f 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,7 @@ +2013-10-14 Cong Hou + + * gcc.dg/vect/vect-alias-check.c: New. + 2013-10-14 Tobias Burnus PR fortran/58658 diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check.c new file mode 100644 index 0000000..bf9eb6b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize + --param=vect-max-version-for-alias-checks=2 + -fdump-tree-vect-details" } */ + +/* A test case showing three potential alias checks between + a[i] and b[i], b[i+7], b[i+14]. With alias checks merging + enabled, those tree checks can be merged into one, and the + loop will be vectorized with vect-max-version-for-alias-checks=2. */ + +void foo (int *a, int *b) +{ + int i; + for (i = 0; i < 1000; ++i) + a[i] = b[i] + b[i+7] + b[i+14]; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c index e7b2f33..5f70d58 100644 --- a/gcc/tree-vect-data-refs.c +++ b/gcc/tree-vect-data-refs.c @@ -128,41 +128,6 @@ vect_get_smallest_scalar_type (gimple stmt, HOST_WIDE_INT *lhs_size_unit, } -/* Check if data references pointed by DR_I and DR_J are same or - belong to same interleaving group. Return FALSE if drs are - different, otherwise return TRUE. */ - -static bool -vect_same_range_drs (data_reference_p dr_i, data_reference_p dr_j) -{ - gimple stmt_i = DR_STMT (dr_i); - gimple stmt_j = DR_STMT (dr_j); - - if (operand_equal_p (DR_REF (dr_i), DR_REF (dr_j), 0) - || (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i)) - && GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j)) - && (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_i)) - == GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_j))))) - return true; - else - return false; -} - -/* If address ranges represented by DDR_I and DDR_J are equal, - return TRUE, otherwise return FALSE. */ - -static bool -vect_vfa_range_equal (ddr_p ddr_i, ddr_p ddr_j) -{ - if ((vect_same_range_drs (DDR_A (ddr_i), DDR_A (ddr_j)) - && vect_same_range_drs (DDR_B (ddr_i), DDR_B (ddr_j))) - || (vect_same_range_drs (DDR_A (ddr_i), DDR_B (ddr_j)) - && vect_same_range_drs (DDR_B (ddr_i), DDR_A (ddr_j)))) - return true; - else - return false; -} - /* Insert DDR into LOOP_VINFO list of ddrs that may alias and need to be tested at run-time. Return TRUE if DDR was successfully inserted. Return false if versioning is not supported. */ @@ -2647,69 +2612,331 @@ vect_analyze_data_ref_accesses (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo) return true; } + +/* Operator == between two dr_addr_with_seg_len objects. + + This equality operator is used to make sure two data refs + are the same one so that we will consider to combine the + aliasing checks of those two pairs of data dependent data + refs. */ + +static bool +operator == (const dr_addr_with_seg_len& d1, + const dr_addr_with_seg_len& d2) +{ + return operand_equal_p (d1.basic_addr, d2.basic_addr, 0) + && compare_tree (d1.offset, d2.offset) == 0 + && compare_tree (d1.seg_len, d2.seg_len) == 0; +} + +/* Function comp_dr_addr_with_seg_len_pair. + + Comparison function for sorting objects of dr_addr_with_seg_len_pair_t + so that we can combine aliasing checks in one scan. */ + +static int +comp_dr_addr_with_seg_len_pair (const void *p1_, const void *p2_) +{ + const dr_addr_with_seg_len_pair_t* p1 = + (const dr_addr_with_seg_len_pair_t *) p1_; + const dr_addr_with_seg_len_pair_t* p2 = + (const dr_addr_with_seg_len_pair_t *) p2_; + + const dr_addr_with_seg_len &p11 = p1->first, + &p12 = p1->second, + &p21 = p2->first, + &p22 = p2->second; + + int comp_res = compare_tree (p11.basic_addr, p21.basic_addr); + if (comp_res != 0) + return comp_res; + + comp_res = compare_tree (p12.basic_addr, p22.basic_addr); + if (comp_res != 0) + return comp_res; + + if (TREE_CODE (p11.offset) != INTEGER_CST + || TREE_CODE (p21.offset) != INTEGER_CST) + { + comp_res = compare_tree (p11.offset, p21.offset); + if (comp_res != 0) + return comp_res; + } + if (tree_int_cst_compare (p11.offset, p21.offset) < 0) + return -1; + if (tree_int_cst_compare (p11.offset, p21.offset) > 0) + return 1; + if (TREE_CODE (p12.offset) != INTEGER_CST + || TREE_CODE (p22.offset) != INTEGER_CST) + { + comp_res = compare_tree (p12.offset, p22.offset); + if (comp_res != 0) + return comp_res; + } + if (tree_int_cst_compare (p12.offset, p22.offset) < 0) + return -1; + if (tree_int_cst_compare (p12.offset, p22.offset) > 0) + return 1; + + return 0; +} + +template static void +swap (T& a, T& b) +{ + T c (a); + a = b; + b = c; +} + +/* Function vect_vfa_segment_size. + + Create an expression that computes the size of segment + that will be accessed for a data reference. The functions takes into + account that realignment loads may access one more vector. + + Input: + DR: The data reference. + LENGTH_FACTOR: segment length to consider. + + Return an expression whose value is the size of segment which will be + accessed by DR. */ + +static tree +vect_vfa_segment_size (struct data_reference *dr, tree length_factor) +{ + tree segment_length; + + if (integer_zerop (DR_STEP (dr))) + segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr))); + else + segment_length = size_binop (MULT_EXPR, + fold_convert (sizetype, DR_STEP (dr)), + fold_convert (sizetype, length_factor)); + + if (vect_supportable_dr_alignment (dr, false) + == dr_explicit_realign_optimized) + { + tree vector_size = TYPE_SIZE_UNIT + (STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr)))); + + segment_length = size_binop (PLUS_EXPR, segment_length, vector_size); + } + return segment_length; +} + /* Function vect_prune_runtime_alias_test_list. Prune a list of ddrs to be tested at run-time by versioning for alias. + Merge several alias checks into one if possible. Return FALSE if resulting list of ddrs is longer then allowed by PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS, otherwise return TRUE. */ bool vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo) { - vec ddrs = + vec may_alias_ddrs = LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo); - unsigned i, j; + vec& comp_alias_ddrs = + LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo); + int vect_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo); + tree scalar_loop_iters = LOOP_VINFO_NITERS (loop_vinfo); + + ddr_p ddr; + unsigned int i; + tree length_factor; if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "=== vect_prune_runtime_alias_test_list ===\n"); - for (i = 0; i < ddrs.length (); ) + if (may_alias_ddrs.is_empty ()) + return true; + + /* Basically, for each pair of dependent data refs store_ptr_0 + and load_ptr_0, we create an expression: + + ((store_ptr_0 + store_segment_length_0) <= load_ptr_0) + || (load_ptr_0 + load_segment_length_0) <= store_ptr_0)) + + for aliasing checks. However, in some cases we can decrease + the number of checks by combining two checks into one. For + example, suppose we have another pair of data refs store_ptr_0 + and load_ptr_1, and if the following condition is satisfied: + + load_ptr_0 < load_ptr_1 && + load_ptr_1 - load_ptr_0 - load_segment_length_0 < store_segment_length_0 + + (this condition means, in each iteration of vectorized loop, + the accessed memory of store_ptr_0 cannot be between the memory + of load_ptr_0 and load_ptr_1.) + + we then can use only the following expression to finish the + alising checks between store_ptr_0 & load_ptr_0 and + store_ptr_0 & load_ptr_1: + + ((store_ptr_0 + store_segment_length_0) <= load_ptr_0) + || (load_ptr_1 + load_segment_length_1 <= store_ptr_0)) + + Note that we only consider that load_ptr_0 and load_ptr_1 have the + same basic address. */ + + comp_alias_ddrs.create (may_alias_ddrs.length ()); + + /* First, we collect all data ref pairs for aliasing checks. */ + FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) { - bool found; - ddr_p ddr_i; + struct data_reference *dr_a, *dr_b; + gimple dr_group_first_a, dr_group_first_b; + tree segment_length_a, segment_length_b; + gimple stmt_a, stmt_b; + + dr_a = DDR_A (ddr); + stmt_a = DR_STMT (DDR_A (ddr)); + dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); + if (dr_group_first_a) + { + stmt_a = dr_group_first_a; + dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a)); + } - ddr_i = ddrs[i]; - found = false; + dr_b = DDR_B (ddr); + stmt_b = DR_STMT (DDR_B (ddr)); + dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b)); + if (dr_group_first_b) + { + stmt_b = dr_group_first_b; + dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b)); + } - for (j = 0; j < i; j++) - { - ddr_p ddr_j = ddrs[j]; + if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0)) + length_factor = scalar_loop_iters; + else + length_factor = size_int (vect_factor); + segment_length_a = vect_vfa_segment_size (dr_a, length_factor); + segment_length_b = vect_vfa_segment_size (dr_b, length_factor); + + dr_addr_with_seg_len_pair_t dr_with_seg_len_pair + (dr_addr_with_seg_len + (dr_a, DR_BASE_ADDRESS (dr_a), + size_binop (PLUS_EXPR, DR_OFFSET (dr_a), DR_INIT (dr_a)), + segment_length_a), + dr_addr_with_seg_len + (dr_b, DR_BASE_ADDRESS (dr_b), + size_binop (PLUS_EXPR, DR_OFFSET (dr_b), DR_INIT (dr_b)), + segment_length_b)); + + if (compare_tree (dr_with_seg_len_pair.first.basic_addr, + dr_with_seg_len_pair.second.basic_addr) > 0) + swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second); + + comp_alias_ddrs.safe_push (dr_with_seg_len_pair); + } - if (vect_vfa_range_equal (ddr_i, ddr_j)) + /* Second, we sort the collected data ref pairs so that we can scan + them once to combine all possible aliasing checks. */ + comp_alias_ddrs.qsort (comp_dr_addr_with_seg_len_pair); + + /* Third, we scan the sorted dr pairs and check if we can combine + alias checks of two neighbouring dr pairs. */ + for (size_t i = 1; i < comp_alias_ddrs.length (); ++i) + { + /* Deal with two ddrs (dr_a1, dr_b1) and (dr_a2, dr_b2). */ + dr_addr_with_seg_len *dr_a1 = &comp_alias_ddrs[i-1].first, + *dr_b1 = &comp_alias_ddrs[i-1].second, + *dr_a2 = &comp_alias_ddrs[i].first, + *dr_b2 = &comp_alias_ddrs[i].second; + + /* Remove duplicate data ref pairs. */ + if (*dr_a1 == *dr_a2 && *dr_b1 == *dr_b2) + { + if (dump_enabled_p ()) + { + dump_printf_loc (MSG_NOTE, vect_location, + "found equal ranges "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, + DR_REF (dr_a1->dr)); + dump_printf (MSG_NOTE, ", "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, + DR_REF (dr_b1->dr)); + dump_printf (MSG_NOTE, " and "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, + DR_REF (dr_a2->dr)); + dump_printf (MSG_NOTE, ", "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, + DR_REF (dr_b2->dr)); + dump_printf (MSG_NOTE, "\n"); + } + + comp_alias_ddrs.ordered_remove (i--); + continue; + } + + if (*dr_a1 == *dr_a2 || *dr_b1 == *dr_b2) + { + /* We consider the case that DR_B1 and DR_B2 are same memrefs, + and DR_A1 and DR_A2 are two consecutive memrefs. */ + if (*dr_a1 == *dr_a2) + { + swap (dr_a1, dr_b1); + swap (dr_a2, dr_b2); + } + + if (!operand_equal_p (dr_a1->basic_addr, dr_a2->basic_addr, 0) + || !host_integerp (dr_a1->offset, 0) + || !host_integerp (dr_a2->offset, 0)) + continue; + + HOST_WIDEST_INT diff = widest_int_cst_value (dr_a2->offset) - + widest_int_cst_value (dr_a1->offset); + + + /* Now we check if the following condition is satisfied: + + DIFF - SEGMENT_LENGTH_A < SEGMENT_LENGTH_B + + where DIFF = DR_A2->OFFSET - DR_A1->OFFSET. However, + SEGMENT_LENGTH_A or SEGMENT_LENGTH_B may not be constant so we + have to make a best estimation. We can get the minimum value + of SEGMENT_LENGTH_B as a constant, represented by MIN_SEG_LEN_B, + then either of the following two conditions can guarantee the + one above: + + 1: DIFF <= MIN_SEG_LEN_B + 2: DIFF - SEGMENT_LENGTH_A < MIN_SEG_LEN_B + + */ + + HOST_WIDEST_INT + min_seg_len_b = (TREE_CODE (dr_b1->seg_len) == INTEGER_CST) ? + widest_int_cst_value (dr_b1->seg_len) : + vect_factor; + + if (diff <= min_seg_len_b + || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST + && diff - widest_int_cst_value (dr_a1->seg_len) < + min_seg_len_b)) { if (dump_enabled_p ()) { - dump_printf_loc (MSG_NOTE, vect_location, - "found equal ranges "); - dump_generic_expr (MSG_NOTE, TDF_SLIM, - DR_REF (DDR_A (ddr_i))); - dump_printf (MSG_NOTE, ", "); - dump_generic_expr (MSG_NOTE, TDF_SLIM, - DR_REF (DDR_B (ddr_i))); - dump_printf (MSG_NOTE, " and "); - dump_generic_expr (MSG_NOTE, TDF_SLIM, - DR_REF (DDR_A (ddr_j))); - dump_printf (MSG_NOTE, ", "); - dump_generic_expr (MSG_NOTE, TDF_SLIM, - DR_REF (DDR_B (ddr_j))); + dump_printf_loc + (MSG_NOTE, vect_location, + "merging two runtime checks for data references "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a1->dr)); + dump_printf (MSG_NOTE, " and "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a2->dr)); dump_printf (MSG_NOTE, "\n"); } - found = true; - break; + + dr_a1->seg_len = size_binop (PLUS_EXPR, + dr_a2->seg_len, size_int (diff)); + comp_alias_ddrs.ordered_remove (i--); } } - - if (found) - { - ddrs.ordered_remove (i); - continue; - } - i++; } - if (ddrs.length () > - (unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS)) + if ((int) comp_alias_ddrs.length () > + PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS)) { if (dump_enabled_p ()) { @@ -2718,8 +2945,6 @@ vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo) "generated checks exceeded.\n"); } - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0); - return false; } diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c index 574446a..1e03853 100644 --- a/gcc/tree-vect-loop-manip.c +++ b/gcc/tree-vect-loop-manip.c @@ -2211,44 +2211,6 @@ vect_create_cond_for_align_checks (loop_vec_info loop_vinfo, *cond_expr = part_cond_expr; } - -/* Function vect_vfa_segment_size. - - Create an expression that computes the size of segment - that will be accessed for a data reference. The functions takes into - account that realignment loads may access one more vector. - - Input: - DR: The data reference. - LENGTH_FACTOR: segment length to consider. - - Return an expression whose value is the size of segment which will be - accessed by DR. */ - -static tree -vect_vfa_segment_size (struct data_reference *dr, tree length_factor) -{ - tree segment_length; - - if (integer_zerop (DR_STEP (dr))) - segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr))); - else - segment_length = size_binop (MULT_EXPR, - fold_convert (sizetype, DR_STEP (dr)), - fold_convert (sizetype, length_factor)); - - if (vect_supportable_dr_alignment (dr, false) - == dr_explicit_realign_optimized) - { - tree vector_size = TYPE_SIZE_UNIT - (STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr)))); - - segment_length = size_binop (PLUS_EXPR, segment_length, vector_size); - } - return segment_length; -} - - /* Function vect_create_cond_for_alias_checks. Create a conditional expression that represents the run-time checks for @@ -2257,28 +2219,24 @@ vect_vfa_segment_size (struct data_reference *dr, tree length_factor) Input: COND_EXPR - input conditional expression. New conditions will be chained - with logical AND operation. + with logical AND operation. If it is NULL, then the function + is used to return the number of alias checks. LOOP_VINFO - field LOOP_VINFO_MAY_ALIAS_STMTS contains the list of ddrs to be checked. Output: COND_EXPR - conditional expression. - The returned value is the conditional expression to be used in the if + The returned COND_EXPR is the conditional expression to be used in the if statement that controls which version of the loop gets executed at runtime. */ -static void +void vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr) { - vec may_alias_ddrs = - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo); - int vect_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo); - tree scalar_loop_iters = LOOP_VINFO_NITERS (loop_vinfo); - - ddr_p ddr; - unsigned int i; - tree part_cond_expr, length_factor; + vec comp_alias_ddrs = + LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo); + tree part_cond_expr; /* Create expression ((store_ptr_0 + store_segment_length_0) <= load_ptr_0) @@ -2289,70 +2247,39 @@ vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr) ((store_ptr_n + store_segment_length_n) <= load_ptr_n) || (load_ptr_n + load_segment_length_n) <= store_ptr_n)) */ - if (may_alias_ddrs.is_empty ()) + if (comp_alias_ddrs.is_empty ()) return; - FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) + for (size_t i = 0, s = comp_alias_ddrs.length (); i < s; ++i) { - struct data_reference *dr_a, *dr_b; - gimple dr_group_first_a, dr_group_first_b; - tree addr_base_a, addr_base_b; - tree segment_length_a, segment_length_b; - gimple stmt_a, stmt_b; - tree seg_a_min, seg_a_max, seg_b_min, seg_b_max; - - dr_a = DDR_A (ddr); - stmt_a = DR_STMT (DDR_A (ddr)); - dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); - if (dr_group_first_a) - { - stmt_a = dr_group_first_a; - dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a)); - } - - dr_b = DDR_B (ddr); - stmt_b = DR_STMT (DDR_B (ddr)); - dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b)); - if (dr_group_first_b) - { - stmt_b = dr_group_first_b; - dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b)); - } + const dr_addr_with_seg_len& dr_a = comp_alias_ddrs[i].first; + const dr_addr_with_seg_len& dr_b = comp_alias_ddrs[i].second; + tree segment_length_a = dr_a.seg_len; + tree segment_length_b = dr_b.seg_len; - addr_base_a - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_a), - size_binop (PLUS_EXPR, DR_OFFSET (dr_a), - DR_INIT (dr_a))); - addr_base_b - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_b), - size_binop (PLUS_EXPR, DR_OFFSET (dr_b), - DR_INIT (dr_b))); - - if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0)) - length_factor = scalar_loop_iters; - else - length_factor = size_int (vect_factor); - segment_length_a = vect_vfa_segment_size (dr_a, length_factor); - segment_length_b = vect_vfa_segment_size (dr_b, length_factor); + tree addr_base_a + = fold_build_pointer_plus (dr_a.basic_addr, dr_a.offset); + tree addr_base_b + = fold_build_pointer_plus (dr_b.basic_addr, dr_b.offset); if (dump_enabled_p ()) { dump_printf_loc (MSG_NOTE, vect_location, - "create runtime check for data references "); - dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a)); + "create runtime check for data references "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a.dr)); dump_printf (MSG_NOTE, " and "); - dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b)); - dump_printf (MSG_NOTE, "\n"); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b.dr)); + dump_printf (MSG_NOTE, "\n"); } - seg_a_min = addr_base_a; - seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a); - if (tree_int_cst_compare (DR_STEP (dr_a), size_zero_node) < 0) + tree seg_a_min = addr_base_a; + tree seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a); + if (tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0) seg_a_min = seg_a_max, seg_a_max = addr_base_a; - seg_b_min = addr_base_b; - seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b); - if (tree_int_cst_compare (DR_STEP (dr_b), size_zero_node) < 0) + tree seg_b_min = addr_base_b; + tree seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b); + if (tree_int_cst_compare (DR_STEP (dr_b.dr), size_zero_node) < 0) seg_b_min = seg_b_max, seg_b_max = addr_base_b; part_cond_expr = @@ -2370,7 +2297,9 @@ vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr) if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "created %u versioning for alias checks.\n", - may_alias_ddrs.length ()); + comp_alias_ddrs.length ()); + + comp_alias_ddrs.release (); } diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index 8b7b345..f231b62 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -175,6 +175,36 @@ typedef struct _slp_oprnd_info +/* This struct is used to store the information of a data reference, + including the data ref itself, its basic address, the access offset + and the segment length for aliasing checks. This is used to generate + alias checks. */ + +struct dr_addr_with_seg_len +{ + dr_addr_with_seg_len (data_reference* d, tree addr, tree off, tree len) + : dr (d), basic_addr (addr), offset (off), seg_len (len) {} + + data_reference *dr; + tree basic_addr; + tree offset; + tree seg_len; +}; + +/* This struct contains two dr_addr_with_seg_len objects with aliasing data + refs. Two comparisons are generated from them. */ + +struct dr_addr_with_seg_len_pair_t +{ + dr_addr_with_seg_len_pair_t (const dr_addr_with_seg_len& d1, + const dr_addr_with_seg_len& d2) + : first (d1), second (d2) {} + + dr_addr_with_seg_len first; + dr_addr_with_seg_len second; +}; + + typedef struct _vect_peel_info { int npeel; @@ -274,6 +304,10 @@ typedef struct _loop_vec_info { for a run-time aliasing check. */ vec may_alias_ddrs; + /* Data Dependence Relations defining address ranges together with segment + lengths from which the run-time aliasing check is built. */ + vec comp_alias_ddrs; + /* Statements in the loop that have data references that are candidates for a runtime (loop versioning) misalignment check. */ vec may_misalign_stmts; @@ -336,6 +370,7 @@ typedef struct _loop_vec_info { #define LOOP_VINFO_MAY_MISALIGN_STMTS(L) (L)->may_misalign_stmts #define LOOP_VINFO_LOC(L) (L)->loop_line_number #define LOOP_VINFO_MAY_ALIAS_DDRS(L) (L)->may_alias_ddrs +#define LOOP_VINFO_COMP_ALIAS_DDRS(L) (L)->comp_alias_ddrs #define LOOP_VINFO_GROUPED_STORES(L) (L)->grouped_stores #define LOOP_VINFO_SLP_INSTANCES(L) (L)->slp_instances #define LOOP_VINFO_SLP_UNROLLING_FACTOR(L) (L)->slp_unrolling_factor