From patchwork Tue Jul 10 08:22:50 2012
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 170081
From: Richard Henderson
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 6/7] Use VEC_WIDEN_MULT_EVEN/ODD_EXPR in supportable_widening_operation
Date: Tue, 10 Jul 2012 10:22:50 +0200
Message-Id: <1341908571-30346-7-git-send-email-rth@redhat.com>
In-Reply-To: <1341908571-30346-1-git-send-email-rth@redhat.com>
References: <1341908571-30346-1-git-send-email-rth@redhat.com>

	* tree-vect-stmts.c (supportable_widening_operation): Expand
	WIDEN_MULT_EXPR via VEC_WIDEN_MULT_EVEN/ODD_EXPR if possible.
---
 gcc/ChangeLog         |  3 ++
 gcc/tree-vect-stmts.c | 96 +++++++++++++++++++++++++------------------------
 2 files changed, 53 insertions(+), 46 deletions(-)

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 9caf1c6..fe6a997 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -6199,7 +6199,8 @@ vect_is_simple_use_1 (tree operand, gimple stmt, loop_vec_info loop_vinfo,
 bool
 supportable_widening_operation (enum tree_code code, gimple stmt,
                                 tree vectype_out, tree vectype_in,
-                                tree *decl1, tree *decl2,
+                                tree *decl1 ATTRIBUTE_UNUSED,
+                                tree *decl2 ATTRIBUTE_UNUSED,
                                 enum tree_code *code1, enum tree_code *code2,
                                 int *multi_step_cvt,
                                 VEC (tree, heap) **interm_types)
@@ -6207,7 +6208,6 @@ supportable_widening_operation (enum tree_code code, gimple stmt,
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_info);
   struct loop *vect_loop = NULL;
-  bool ordered_p;
   enum machine_mode vec_mode;
   enum insn_code icode1, icode2;
   optab optab1, optab2;
@@ -6223,56 +6223,60 @@ supportable_widening_operation (enum tree_code code, gimple stmt,
   if (loop_info)
     vect_loop = LOOP_VINFO_LOOP (loop_info);
 
-  /* The result of a vectorized widening operation usually requires two vectors
-     (because the widened results do not fit into one vector). The generated
-     vector results would normally be expected to be generated in the same
-     order as in the original scalar computation, i.e. if 8 results are
-     generated in each vector iteration, they are to be organized as follows:
-     vect1: [res1,res2,res3,res4], vect2: [res5,res6,res7,res8].
-
-     However, in the special case that the result of the widening operation is
-     used in a reduction computation only, the order doesn't matter (because
-     when vectorizing a reduction we change the order of the computation).
-     Some targets can take advantage of this and generate more efficient code.
-     For example, targets like Altivec, that support widen_mult using a sequence
-     of {mult_even,mult_odd} generate the following vectors:
-     vect1: [res1,res3,res5,res7], vect2: [res2,res4,res6,res8].
-
-     When vectorizing outer-loops, we execute the inner-loop sequentially
-     (each vectorized inner-loop iteration contributes to VF outer-loop
-     iterations in parallel). We therefore don't allow to change the order
-     of the computation in the inner-loop during outer-loop vectorization.  */
-
-  if (vect_loop
-      && STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction
-      && !nested_in_vect_loop_p (vect_loop, stmt))
-    ordered_p = false;
-  else
-    ordered_p = true;
-
-  if (!ordered_p
-      && code == WIDEN_MULT_EXPR
-      && targetm.vectorize.builtin_mul_widen_even
-      && targetm.vectorize.builtin_mul_widen_even (vectype)
-      && targetm.vectorize.builtin_mul_widen_odd
-      && targetm.vectorize.builtin_mul_widen_odd (vectype))
-    {
-      if (vect_print_dump_info (REPORT_DETAILS))
-        fprintf (vect_dump, "Unordered widening operation detected.");
-
-      *code1 = *code2 = CALL_EXPR;
-      *decl1 = targetm.vectorize.builtin_mul_widen_even (vectype);
-      *decl2 = targetm.vectorize.builtin_mul_widen_odd (vectype);
-      return true;
-    }
-
   switch (code)
     {
     case WIDEN_MULT_EXPR:
+      /* The result of a vectorized widening operation usually requires
+         two vectors (because the widened results do not fit into one vector).
+         The generated vector results would normally be expected to be
+         generated in the same order as in the original scalar computation,
+         i.e. if 8 results are generated in each vector iteration, they are
+         to be organized as follows:
+                vect1: [res1,res2,res3,res4],
+                vect2: [res5,res6,res7,res8].
+
+         However, in the special case that the result of the widening
+         operation is used in a reduction computation only, the order doesn't
+         matter (because when vectorizing a reduction we change the order of
+         the computation).  Some targets can take advantage of this and
+         generate more efficient code.  For example, targets like Altivec,
+         that support widen_mult using a sequence of {mult_even,mult_odd}
+         generate the following vectors:
+                vect1: [res1,res3,res5,res7],
+                vect2: [res2,res4,res6,res8].
+
+         When vectorizing outer-loops, we execute the inner-loop sequentially
+         (each vectorized inner-loop iteration contributes to VF outer-loop
+         iterations in parallel).  We therefore don't allow to change the
+         order of the computation in the inner-loop during outer-loop
+         vectorization.  */
+      /* TODO: Another case in which order doesn't *really* matter is when we
+         widen and then contract again, e.g. (short)((int)x * y >> 8).
+         Normally, pack_trunc performs an even/odd permute, whereas the
+         repack from an even/odd expansion would be an interleave, which
+         would be significantly simpler for e.g. AVX2.  */
+      /* In any case, in order to avoid duplicating the code below, recurse
+         on VEC_WIDEN_MULT_EVEN_EXPR.  If it succeeds, all the return values
+         are properly set up for the caller.  If we fail, we'll continue with
+         a VEC_WIDEN_MULT_LO/HI_EXPR check.  */
+      if (vect_loop
+          && STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction
+          && !nested_in_vect_loop_p (vect_loop, stmt)
+          && supportable_widening_operation (VEC_WIDEN_MULT_EVEN_EXPR,
+                                             stmt, vectype_out, vectype_in,
+                                             NULL, NULL, code1, code2,
+                                             multi_step_cvt, interm_types))
+        return true;
       c1 = VEC_WIDEN_MULT_LO_EXPR;
       c2 = VEC_WIDEN_MULT_HI_EXPR;
       break;
 
+    case VEC_WIDEN_MULT_EVEN_EXPR:
+      /* Support the recursion induced just above.  */
+      c1 = VEC_WIDEN_MULT_EVEN_EXPR;
+      c2 = VEC_WIDEN_MULT_ODD_EXPR;
+      break;
+
     case WIDEN_LSHIFT_EXPR:
       c1 = VEC_WIDEN_LSHIFT_LO_EXPR;
       c2 = VEC_WIDEN_LSHIFT_HI_EXPR;
@@ -6298,7 +6302,7 @@ supportable_widening_operation (enum tree_code code, gimple stmt,
       gcc_unreachable ();
     }
 
-  if (BYTES_BIG_ENDIAN)
+  if (BYTES_BIG_ENDIAN && c1 != VEC_WIDEN_MULT_EVEN_EXPR)
    {
      enum tree_code ctmp = c1;
      c1 = c2;