From patchwork Tue May 14 06:17:13 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jakub Jelinek X-Patchwork-Id: 243613 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "localhost", Issuer "www.qmailtoaster.com" (not verified)) by ozlabs.org (Postfix) with ESMTPS id F0AF22C009D for ; Tue, 14 May 2013 16:17:29 +1000 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:reply-to:mime-version :content-type; q=dns; s=default; b=BXPuWMwaPRnUIWQvyqLx9F1U7mVsP vJ3qXNPleLVVovqR5PHbUOnhDXLJC5Oh/DBLHsZm7gOFCkhusOz66+aryDv7cioj 5hQwMBEviOE2uesf614ctorbn1HioyhPfPq/fOFig6tqlcz91I396DoCr8PsQVNW sJFHLtGhuvNeh8= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:reply-to:mime-version :content-type; s=default; bh=twJd+7b7fxd1fR982WV0MQxG0FY=; b=Qos kPwLU8D+k5xiYRt5u1N4emwvv1aMyK1xnefJsafAFuFqQf2aalv8KbMHc6RITV5X Ewl5sryZfPrR0ygy0eLMVjth2Vtdijw/cogSPThw+uWq+TwKdM6JhJomnYu64zMu x99nyqPd3h5ow0onv/bisf32azO0YIm2l+hoyuFk= Received: (qmail 29857 invoked by alias); 14 May 2013 06:17:24 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 29847 invoked by uid 89); 14 May 2013 06:17:23 -0000 X-Spam-SWARE-Status: No, score=-6.7 required=5.0 tests=AWL, BAYES_00, RCVD_IN_HOSTKARMA_W, RCVD_IN_HOSTKARMA_WL, RP_MATCHES_RCVD, SPF_HELO_PASS, SPF_PASS autolearn=ham version=3.3.1 Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.84/v0.84-167-ge50287c) with ESMTP; Tue, 14 May 2013 06:17:22 +0000 Received: from int-mx02.intmail.prod.int.phx2.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r4E6HKtq017684 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Tue, 14 May 2013 02:17:20 -0400 Received: from zalov.cz (vpn-60-24.rdu2.redhat.com [10.10.60.24]) by int-mx02.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id r4E6HID6014027 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 14 May 2013 02:17:19 -0400 Received: from zalov.cz (localhost [127.0.0.1]) by zalov.cz (8.14.5/8.14.5) with ESMTP id r4E6HGgj013696; Tue, 14 May 2013 08:17:17 +0200 Received: (from jakub@localhost) by zalov.cz (8.14.5/8.14.5/Submit) id r4E6HEBi013695; Tue, 14 May 2013 08:17:14 +0200 Date: Tue, 14 May 2013 08:17:13 +0200 From: Jakub Jelinek To: Richard Biener , Richard Henderson , Aldy Hernandez Cc: gcc-patches@gcc.gnu.org Subject: [gomp4] Basic vectorization enablement for #pragma omp simd Message-ID: <20130514061713.GI1377@tucnak.redhat.com> Reply-To: Jakub Jelinek MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) Hi! This patch adds safelen field to struct loop, teaches expand_omp_simd to set it on the simd loops and then uses it in a few places: 1) because the loops are explicitly marked for vectorization by the user, we'll try to ifconvert them and vectorize even without -O3, -Ofast or -ftree-vectorize (but explicit -fno-tree-vectorize will still disable that behavior) 2) the data dependency analysis uses it to decide about unknown and bad data dependencies 3) unrolling is disabled for those loops, I think we don't want to unroll those loops until vectorization, and after vectorization we just clear the safelen, so that it can be unrolled afterwards In the end we'll want to do much more on the vectorizer side, handle calls to elemental functions, handle conditionalized calls to elemental functions, or even vectorize loops where some part of the loop isn't really vectorizable and needs to be sequential, but other parts of the loop are vectorizable. for (...) { vectorizable_bb; non-vectorizable_bb; vectorizable_bb; } can be turned into for (...) { vectorized_bb; for (temp = 0; temp < vf; temp++) non-vectorizable_bb; vectorized_bb; } etc. Does this look ok? 2013-05-14 Jakub Jelinek * cfgloop.h (struct loop): Add safelen field. * omp-low.c (expand_omp_simd): If !broken_loop, fix_loop_structure to create loop for the simd region and set safelen field. * tree-vectorizer.c (vectorize_loops): If loop has safelen set, vectorize it even if flag_vectorize isn't set. Clear loop->safelen after vectorization. * tree-ssa-loop.c (gate_tree_vectorize): Return true even for flag_openmp if -fno-tree-vectorize hasn't been specified. * tree-ssa-loop-ivcanon.c (tree_unroll_loops_completely_1): Don't unroll loops with non-NULL loop->safelen. * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): For unknown or bad data dependency, if loop->safelen is non-NULL, just decrease *max_vf to loop->safelen if needed and return false. * tree-if-conv.c (main_tree_if_conversion): If-convert also loops with non-NULL loop->safelen. (gate_tree_if_conversion): Return true even for flag_openmp if -fno-tree-vectorize hasn't been specified. Jakub --- gcc/cfgloop.h.jj 2013-05-13 16:49:44.000000000 +0200 +++ gcc/cfgloop.h 2013-05-13 17:30:18.630883633 +0200 @@ -176,6 +176,12 @@ struct GTY ((chain_next ("%h.next"))) lo /* Number of iteration analysis data for RTL. */ struct niter_desc *simple_loop_desc; + + /* If non-NULL, an INTEGER_CST, where the user asserted that for any + I in [ 0, nb_iterations ) and for any J in + [ I, min ( I + safelen, nb_iterations ) ), the Ith and Jth iterations + of the loop can be safely evaluated concurrently. */ + tree safelen; }; /* Flags for state of loop structure. */ --- gcc/omp-low.c.jj 2013-05-13 16:37:05.000000000 +0200 +++ gcc/omp-low.c 2013-05-13 18:46:18.310405585 +0200 @@ -4960,6 +4960,8 @@ expand_omp_simd (struct omp_region *regi edge e, ne; tree *counts = NULL; int i; + tree safelen = find_omp_clause (gimple_omp_for_clauses (fd->for_stmt), + OMP_CLAUSE_SAFELEN); type = TREE_TYPE (fd->loop.v); entry_bb = region->entry; @@ -5157,6 +5159,22 @@ expand_omp_simd (struct omp_region *regi set_immediate_dominator (CDI_DOMINATORS, l1_bb, entry_bb); set_immediate_dominator (CDI_DOMINATORS, l2_bb, l1_bb); set_immediate_dominator (CDI_DOMINATORS, l0_bb, l1_bb); + + if (!broken_loop) + { + struct loop *loop; + calculate_dominance_info (CDI_DOMINATORS); + fix_loop_structure (NULL); + loop = l1_bb->loop_father; + if (safelen == NULL_TREE) + { + safelen = build_nonstandard_integer_type (TYPE_PRECISION (type), 1); + safelen = TYPE_MAX_VALUE (safelen); + } + else + safelen = OMP_CLAUSE_SAFELEN_EXPR (safelen); + loop->safelen = safelen; + } } --- gcc/tree-vectorizer.c.jj 2013-05-13 16:49:03.000000000 +0200 +++ gcc/tree-vectorizer.c 2013-05-13 20:44:58.721863725 +0200 @@ -101,7 +101,8 @@ vectorize_loops (void) than all previously defined loops. This fact allows us to run only over initial loops skipping newly generated ones. */ FOR_EACH_LOOP (li, loop, 0) - if (optimize_loop_nest_for_speed_p (loop)) + if ((flag_tree_vectorize && optimize_loop_nest_for_speed_p (loop)) + || loop->safelen) { loop_vec_info loop_vinfo; vect_location = find_loop_location (loop); @@ -122,6 +123,9 @@ vectorize_loops (void) LOC_FILE (vect_location), LOC_LINE (vect_location)); vect_transform_loop (loop_vinfo); num_vectorized_loops++; + /* Now that the loop has been vectorized, allow it to be unrolled + etc. */ + loop->safelen = NULL_TREE; } vect_location = UNKNOWN_LOC; --- gcc/tree-ssa-loop.c.jj 2013-05-13 16:46:36.000000000 +0200 +++ gcc/tree-ssa-loop.c 2013-05-13 19:12:57.301538324 +0200 @@ -225,7 +225,8 @@ tree_vectorize (void) static bool gate_tree_vectorize (void) { - return flag_tree_vectorize; + return flag_tree_vectorize + || (flag_openmp && !global_options_set.x_flag_tree_vectorize); } struct gimple_opt_pass pass_vectorize = --- gcc/tree-ssa-loop-ivcanon.c.jj 2013-05-13 16:46:36.000000000 +0200 +++ gcc/tree-ssa-loop-ivcanon.c 2013-05-13 20:06:44.176519188 +0200 @@ -1123,6 +1123,11 @@ tree_unroll_loops_completely_1 (bool may if (changed) return true; + /* Don't unroll #pragma omp simd loops until the vectorizer + attempts to vectorize those. */ + if (loop->safelen) + return false; + /* Try to unroll this loop. */ loop_father = loop_outer (loop); if (!loop_father) --- gcc/tree-vect-data-refs.c.jj 2013-05-13 16:49:08.000000000 +0200 +++ gcc/tree-vect-data-refs.c 2013-05-13 20:41:51.579889330 +0200 @@ -255,6 +255,16 @@ vect_analyze_data_ref_dependence (struct /* Unknown data dependence. */ if (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know) { + /* If user asserted there safelen consecutive iterations can be + executed concurrently, and safelen >= *max_vf, assume + independence. */ + if (loop->safelen) + { + if (compare_tree_int (loop->safelen, *max_vf) < 0) + *max_vf = tree_low_cst (loop->safelen, 0); + return false; + } + if (STMT_VINFO_GATHER_P (stmtinfo_a) || STMT_VINFO_GATHER_P (stmtinfo_b)) { @@ -291,6 +301,16 @@ vect_analyze_data_ref_dependence (struct /* Known data dependence. */ if (DDR_NUM_DIST_VECTS (ddr) == 0) { + /* If user asserted there safelen consecutive iterations can be + executed concurrently, and safelen >= *max_vf, assume + independence. */ + if (loop->safelen) + { + if (compare_tree_int (loop->safelen, *max_vf) < 0) + *max_vf = tree_low_cst (loop->safelen, 0); + return false; + } + if (STMT_VINFO_GATHER_P (stmtinfo_a) || STMT_VINFO_GATHER_P (stmtinfo_b)) { --- gcc/tree-if-conv.c.jj 2013-05-13 16:49:06.000000000 +0200 +++ gcc/tree-if-conv.c 2013-05-13 19:08:27.227188600 +0200 @@ -1822,6 +1822,10 @@ main_tree_if_conversion (void) return 0; FOR_EACH_LOOP (li, loop, 0) + if (flag_tree_loop_if_convert == 1 + || flag_tree_loop_if_convert_stores == 1 + || flag_tree_vectorize + || loop->safelen) changed |= tree_if_conversion (loop); if (changed) @@ -1848,7 +1852,9 @@ main_tree_if_conversion (void) static bool gate_tree_if_conversion (void) { - return ((flag_tree_vectorize && flag_tree_loop_if_convert != 0) + return (((flag_tree_vectorize + || (flag_openmp && !global_options_set.x_flag_tree_vectorize)) + && flag_tree_loop_if_convert != 0) || flag_tree_loop_if_convert == 1 || flag_tree_loop_if_convert_stores == 1); }