From patchwork Wed Sep 24 15:02:11 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alan Lawrence X-Patchwork-Id: 392984 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 9D7A914009C for ; Thu, 25 Sep 2014 01:02:30 +1000 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:references :in-reply-to:content-type; q=dns; s=default; b=M8ARY0zHulIU+2w5j iGxa+9sdrKN1bTd0unlUTiyV6SE8RtngtEDQvT1H4fyCqEmXN2Zcwv77dmsh+i7K +fvqDAZJSnkeOhnJbn5dgV6E66M1RUyDgbvQxM7PowMp2Z19/E3pj4C6qCnWDrgg NhUzjIDd6qv7KAWRop6/ek2X+E= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:references :in-reply-to:content-type; s=default; bh=VyhmNVNwUhSYf3V8t+/m+om tgRg=; b=h9sGUU0sE79V3gurDexTRL0wxWOZ8MZvGjjzy04rDCNk/rXeFh1OKnJ aJo98Tnczl9qD/r3O2Wd8YgSyTbCyh4Bbn4Fiw8ULhHBvsq63sHPP8g7ByUpVvRB huOLtsJYsXKeYTGm8f9FseL8Go3/M1Kz3om/lJ4Ags1xVatQ8//4= Received: (qmail 27173 invoked by alias); 24 Sep 2014 15:02:21 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 27164 invoked by uid 89); 24 Sep 2014 15:02:21 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=AWL, BAYES_00, SPF_PASS autolearn=ham version=3.3.2 X-HELO: service87.mimecast.com Received: from service87.mimecast.com (HELO service87.mimecast.com) (91.220.42.44) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 24 Sep 2014 15:02:18 +0000 Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.21]) by service87.mimecast.com; Wed, 24 Sep 2014 16:02:14 +0100 Received: from [10.1.209.51] ([10.1.255.212]) by cam-owa1.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Wed, 24 Sep 2014 16:02:12 +0100 Message-ID: <5422DCF3.9080206@arm.com> Date: Wed, 24 Sep 2014 16:02:11 +0100 From: Alan Lawrence User-Agent: Thunderbird 2.0.0.24 (X11/20101213) MIME-Version: 1.0 To: Segher Boessenkool CC: "gcc-patches@gcc.gnu.org" , Richard Biener Subject: Re: [PATCH 2/14][Vectorizer] Make REDUC_xxx_EXPR tree codes produce a scalar result References: <541AC4D2.9040901@arm.com> <541AC71C.5010702@arm.com> In-Reply-To: X-MC-Unique: 114092416021407101 X-IsSubscribed: yes So it looks like patches 1-6 (reduc_foo) are relatively close to final, and given these fix PR/61114, I'm gonna try to land these while working on a respin of the second half (vec_shr)...(summary: yes I like the vec_perm idea too, but the devil is in the detail!) However my CompileFarm account is still pending, so to that end, if you were able to test patch 2/14 (attached inc. Richie's s/VIEW_CONVERT_EXPR/NOP_EXPR/) on the CompileFarm PowerPC machine, that'd be great, many thanks indeed. It should apply on its own without patch 1. I'll aim to get an alternative patch 3 back to the list shortly, and follow up with .md updates to the various backends. Cheers, Alan Richard Biener wrote: > On Thu, Sep 18, 2014 at 1:50 PM, Alan Lawrence wrote: >> This fixes PR/61114 by redefining the REDUC_{MIN,MAX,PLUS}_EXPR tree codes. >> >> These are presently documented as producing a vector with the result in >> element 0, and this is inconsistent with their use in tree-vect-loop.c >> (which on bigendian targets pulls the bits out of the wrong end of the >> vector result). This leads to bugs on bigendian targets - see also >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114. >> >> I discounted "fixing" the vectorizer (to read from element 0) and then >> making bigendian targets (whose architectural insn produces the result in >> lane N-1) permute the result vector, as optimization of vectors in RTL seems >> unlikely to remove such a permute and would lead to a performance >> regression. >> >> Instead it seems more natural for the tree code to produce a scalar result >> (producing a vector with the result in lane 0 has already caused confusion, >> e.g. https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01100.html). >> >> However, this patch preserves the meaning of the optab (producing a result >> in lane 0 on little-endian architectures or N-1 on bigendian), thus >> generally avoiding the need to change backends. Thus, expr.c extracts an >> endianness-dependent element from the optab result to give the result >> expected for the tree code. >> >> Previously posted as an RFC >> https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html , now with an extra >> VIEW_CONVERT_EXPR if the types of the reduction/result do not match. > > Huh. Does that ever happen? Please use a NOP_EXPR instead of > a VIEW_CONVERT_EXPR. > > Ok with that change. > > Thanks, > Richard. > >> Testing: >> x86_86-none-linux-gnu: bootstrap, check-gcc, check-g++ >> aarch64-none-linux-gnu: bootstrap >> aarch64-none-elf: check-gcc, check-g++ >> arm-none-eabi: check-gcc >> >> aarch64_be-none-elf: check-gcc, showing >> FAIL->PASS: gcc.dg/vect/no-scevccp-outer-7.c execution test >> FAIL->PASS: gcc.dg/vect/no-scevccp-outer-13.c execution test >> Passes the (previously-failing) reduced testcase on >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114 >> >> Have also assembler/stage-1 tested that testcase on PowerPC, also >> fixed. > >> gcc/ChangeLog: >> >> * expr.c (expand_expr_real_2): For REDUC_{MIN,MAX,PLUS}_EXPR, add >> extract_bit_field around optab result. >> >> * fold-const.c (fold_unary_loc): For REDUC_{MIN,MAX,PLUS}_EXPR, >> produce >> scalar not vector. >> >> * tree-cfg.c (verify_gimple_assign_unary): Check result vs operand >> type >> for REDUC_{MIN,MAX,PLUS}_EXPR. >> >> * tree-vect-loop.c (vect_analyze_loop): Update comment. >> (vect_create_epilog_for_reduction): For direct vector reduction, use >> result of tree code directly without extract_bit_field. >> >> * tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR): Update >> comment. > commit a7b173d5efc6f08589b04fffeec9b3942b6282a0 Author: Alan Lawrence Date: Tue Jul 29 11:46:01 2014 +0100 Make tree codes produce scalar, with NOP_EXPRs. (tree-vect-loop.c mess) diff --git a/gcc/expr.c b/gcc/expr.c index a6233f3..c792028 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -9044,7 +9044,17 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode, { op0 = expand_normal (treeop0); this_optab = optab_for_tree_code (code, type, optab_default); - temp = expand_unop (mode, this_optab, op0, target, unsignedp); + enum machine_mode vec_mode = TYPE_MODE (TREE_TYPE (treeop0)); + temp = expand_unop (vec_mode, this_optab, op0, NULL_RTX, unsignedp); + gcc_assert (temp); + /* The tree code produces a scalar result, but (somewhat by convention) + the optab produces a vector with the result in element 0 if + little-endian, or element N-1 if big-endian. So pull the scalar + result out of that element. */ + int index = BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (vec_mode) - 1 : 0; + int bitsize = GET_MODE_BITSIZE (GET_MODE_INNER (vec_mode)); + temp = extract_bit_field (temp, bitsize, bitsize * index, unsignedp, + target, mode, mode); gcc_assert (temp); return temp; } diff --git a/gcc/fold-const.c b/gcc/fold-const.c index d1b59a1..1773585 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -8246,12 +8246,13 @@ fold_unary_loc (location_t loc, enum tree_code code, tree type, tree op0) case REDUC_MAX_EXPR: case REDUC_PLUS_EXPR: { - unsigned int nelts = TYPE_VECTOR_SUBPARTS (type), i; + unsigned int nelts, i; tree *elts; enum tree_code subcode; if (TREE_CODE (op0) != VECTOR_CST) return NULL_TREE; + nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)); elts = XALLOCAVEC (tree, nelts); if (!vec_cst_ctor_to_array (op0, elts)) @@ -8270,10 +8271,9 @@ fold_unary_loc (location_t loc, enum tree_code code, tree type, tree op0) elts[0] = const_binop (subcode, elts[0], elts[i]); if (elts[0] == NULL_TREE || !CONSTANT_CLASS_P (elts[0])) return NULL_TREE; - elts[i] = build_zero_cst (TREE_TYPE (type)); } - return build_vector (type, elts); + return elts[0]; } default: diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c index e89d76a..6c6ff18 100644 --- a/gcc/tree-cfg.c +++ b/gcc/tree-cfg.c @@ -3539,12 +3539,21 @@ verify_gimple_assign_unary (gimple stmt) return false; } - - case VEC_UNPACK_HI_EXPR: - case VEC_UNPACK_LO_EXPR: case REDUC_MAX_EXPR: case REDUC_MIN_EXPR: case REDUC_PLUS_EXPR: + if (!VECTOR_TYPE_P (rhs1_type) + || !useless_type_conversion_p (lhs_type, TREE_TYPE (rhs1_type))) + { + error ("reduction should convert from vector to element type"); + debug_generic_expr (lhs_type); + debug_generic_expr (rhs1_type); + return true; + } + return false; + + case VEC_UNPACK_HI_EXPR: + case VEC_UNPACK_LO_EXPR: case VEC_UNPACK_FLOAT_HI_EXPR: case VEC_UNPACK_FLOAT_LO_EXPR: /* FIXME. */ diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index fd1166f..8d97e17 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -1892,9 +1892,9 @@ vect_analyze_loop (struct loop *loop) Output: REDUC_CODE - the corresponding tree-code to be used to reduce the - vector of partial results into a single scalar result (which - will also reside in a vector) or ERROR_MARK if the operation is - a supported reduction operation, but does not have such tree-code. + vector of partial results into a single scalar result, or ERROR_MARK + if the operation is a supported reduction operation, but does not have + such a tree-code. Return FALSE if CODE currently cannot be vectorized as reduction. */ @@ -4168,6 +4168,7 @@ vect_create_epilog_for_reduction (vec vect_defs, gimple stmt, if (reduc_code != ERROR_MARK && !slp_reduc) { tree tmp; + tree vec_elem_type; /*** Case 1: Create: v_out2 = reduc_expr */ @@ -4176,14 +4177,26 @@ vect_create_epilog_for_reduction (vec vect_defs, gimple stmt, dump_printf_loc (MSG_NOTE, vect_location, "Reduce using direct vector reduction.\n"); - vec_dest = vect_create_destination_var (scalar_dest, vectype); - tmp = build1 (reduc_code, vectype, new_phi_result); - epilog_stmt = gimple_build_assign (vec_dest, tmp); - new_temp = make_ssa_name (vec_dest, epilog_stmt); + vec_elem_type = TREE_TYPE (TREE_TYPE (new_phi_result)); + if (!useless_type_conversion_p (scalar_type, vec_elem_type)) + { + tree tmp_dest = + vect_create_destination_var (scalar_dest, vec_elem_type); + tmp = build1 (reduc_code, vec_elem_type, new_phi_result); + epilog_stmt = gimple_build_assign (tmp_dest, tmp); + new_temp = make_ssa_name (tmp_dest, epilog_stmt); + gimple_assign_set_lhs (epilog_stmt, new_temp); + gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT); + + tmp = build1 (NOP_EXPR, scalar_type, new_temp); + } + else + tmp = build1 (reduc_code, scalar_type, new_phi_result); + epilog_stmt = gimple_build_assign (new_scalar_dest, tmp); + new_temp = make_ssa_name (new_scalar_dest, epilog_stmt); gimple_assign_set_lhs (epilog_stmt, new_temp); gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT); - - extract_scalar_result = true; + scalar_results.safe_push (new_temp); } else { diff --git a/gcc/tree.def b/gcc/tree.def index bd39e4b..c830e4b 100644 --- a/gcc/tree.def +++ b/gcc/tree.def @@ -1161,10 +1161,9 @@ DEFTREECODE (TRANSACTION_EXPR, "transaction_expr", tcc_expression, 1) result (e.g. summing the elements of the vector, finding the minimum over the vector elements, etc). Operand 0 is a vector. - The expression returns a vector of the same type, with the first - element in the vector holding the result of the reduction of all elements - of the operand. The content of the other elements in the returned vector - is undefined. */ + The expression returns a scalar, with type the same as the elements of the + vector, holding the result of the reduction of all elements of the operand. + */ DEFTREECODE (REDUC_MAX_EXPR, "reduc_max_expr", tcc_unary, 1) DEFTREECODE (REDUC_MIN_EXPR, "reduc_min_expr", tcc_unary, 1) DEFTREECODE (REDUC_PLUS_EXPR, "reduc_plus_expr", tcc_unary, 1)