From patchwork Fri Dec 4 12:18:19 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Biener X-Patchwork-Id: 552670 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 7AA9814029E for ; Fri, 4 Dec 2015 23:18:37 +1100 (AEDT) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=WY8NB6aC; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; q=dns; s=default; b=HwSr3WvaGj1lGDOpLi sUAiiNdfODNOlW9XN2PpYU9Ze7q8M/Ex3jxP+qIj+7W5re8QUkqNTc3/+EM+EWh2 pl455ZCqPxO1eFFeliP247LRcJuM8tUKlSqy1zVve+ws+D7UxwT30RfihDA/Bnup +S4n9bfZIl6TKcTwg9vKn67jQ= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; s=default; bh=5+3xK3Wt12DZZAGPyH8IbhsG lkU=; b=WY8NB6aC2j7TwtlD0vb9RkKHmCcxiLso+2+eJbA5IyC4GF4bv99kT9YZ DZsf/S1ECzlP5gNvWx4xCqy+FBrWusT4hPMfqBT6HJLSQSMLVtpSrvCi0nAGwxCq U3WEoNwFwrivGtbzGaPrjMazdh1a0cGwTcUeNNS259pp1JVCuKU= Received: (qmail 89077 invoked by alias); 4 Dec 2015 12:18:25 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 88968 invoked by uid 89); 4 Dec 2015 12:18:25 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-wm0-f45.google.com Received: from mail-wm0-f45.google.com (HELO mail-wm0-f45.google.com) (74.125.82.45) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Fri, 04 Dec 2015 12:18:22 +0000 Received: by wmec201 with SMTP id c201so71467418wme.0 for ; Fri, 04 Dec 2015 04:18:19 -0800 (PST) MIME-Version: 1.0 X-Received: by 10.28.50.193 with SMTP id y184mr4994032wmy.26.1449231499624; Fri, 04 Dec 2015 04:18:19 -0800 (PST) Received: by 10.28.221.196 with HTTP; Fri, 4 Dec 2015 04:18:19 -0800 (PST) In-Reply-To: References: Date: Fri, 4 Dec 2015 13:18:19 +0100 Message-ID: Subject: Re: [PATCH PR68542] From: Richard Biener To: Yuri Rumyantsev Cc: gcc-patches , Igor Zamyatin , Kirill Yukhin X-IsSubscribed: yes On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev wrote: > Hi All, > > Here is a patch for 481.wrf preformance regression for avx2 which is > sligthly modified mask store optimization. This transformation allows > perform unpredication for semi-hammock containing masked stores, other > words if we have a loop like > for (i=0; i if (c[i]) { > p1[i] += 1; > p2[i] = p3[i] +2; > } > > then it will be transformed to > if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) { > vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165); > vect__12.22_172 = vect__11.19_170 + vect_cst__171; > MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172); > vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165); > vect__19.28_184 = vect__18.25_182 + vect_cst__183; > MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184); > } > i.e. it will put all computations related to masked stores to semi-hammock. > > Bootstrapping and regression testing did not show any new failures. Can you please split out the middle-end support for vector equality compares? @@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1) if (TREE_CODE (op0_type) == VECTOR_TYPE || TREE_CODE (op1_type) == VECTOR_TYPE) { - error ("vector comparison returning a boolean"); - debug_generic_expr (op0_type); - debug_generic_expr (op1_type); - return true; + /* Allow vector comparison returning boolean if operand types + are equal and CODE is EQ/NE. */ + if ((code != EQ_EXPR && code != NE_EXPR) + || !(VECTOR_BOOLEAN_TYPE_P (op0_type) + || VECTOR_INTEGER_TYPE_P (op0_type))) + { + error ("type mismatch for vector comparison returning a boolean"); + debug_generic_expr (op0_type); + debug_generic_expr (op1_type); + return true; + } } } please merge the conditions with a && @@ -13888,6 +13888,25 @@ fold_relational_const (enum tree_code code, tree type, tree op0, tree op1) if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST) { + if (INTEGRAL_TYPE_P (type) + && (TREE_CODE (type) == BOOLEAN_TYPE + || TYPE_PRECISION (type) == 1)) + { + /* Have vector comparison with scalar boolean result. */ + bool result = true; + gcc_assert (code == EQ_EXPR || code == NE_EXPR); + gcc_assert (VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1)); + for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++) + { + tree elem0 = VECTOR_CST_ELT (op0, i); + tree elem1 = VECTOR_CST_ELT (op1, i); + tree tmp = fold_relational_const (code, type, elem0, elem1); + result &= integer_onep (tmp); + if (code == NE_EXPR) + result = !result; + return constant_boolean_node (result, type); ... just assumes it is either EQ_EXPR or NE_EXPR. I believe you want to change the guarding condition to just if (! VECTOR_TYPE_P (type)) and assert the boolean/precision. Please also merge the asserts into one with && I believe the comment should simply say that VRP doesn't track ranges for vector types. In the previous review I suggested you should make sure that RTL expansion ends up using a well-defined optab for these compares. To make sure this happens across targets I suggest you make these comparisons available via the GCC vector extension. Thus allow typedef int v4si __attribute__((vector_size(16))); int foo (v4si a, v4si b) { if (a == b) return 4; } and != and also using floating point vectors. Otherwise it's hard to see the impact of this change. Obvious choices are the eq/ne optabs for FP compares and [u]cmp optabs for integer compares. A half-way implementation like your VRP comment suggests (only ==/!= zero against integer vectors is implemented?!) this doesn't sound good without also limiting the feature this way in the verifier. Btw, the regression with WRF is >50% on AMD Bulldozer (which only has AVX, not AVX2). Thanks, Richard. > ChangeLog: > 2015-11-30 Yuri Rumyantsev > > PR middle-end/68542 > * config/i386/i386.c (ix86_expand_branch): Implement integral vector > comparison with boolean result. > * config/i386/sse.md (define_expand "cbranch4): Add define-expand > for vector comparion with eq/ne only. > * fold-const.c (fold_relational_const): Add handling of vector > comparison with boolean result. > * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow > comparison of vector operands with boolean result for EQ/NE only. > (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison. > (verify_gimple_cond): Likewise. > * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform > combining for non-compatible vector types. > * tree-vect-loop.c (is_valid_sink): New function. > (optimize_mask_stores): Likewise. > * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize > has_mask_store field of vect_info. > * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for > vectorized loops having masked stores. > * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and > correspondent macros. > (optimize_mask_stores): Add prototype. > * tree-vrp.c (register_edge_assert_for): Do not handle NAME with vector > type. > > gcc/testsuite/ChangeLog: > * gcc.target/i386/avx2-vect-mask-store-move1.c: New test. diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c index b82ae3c..73ee3be 100644 --- a/gcc/tree-ssa-forwprop.c +++ b/gcc/tree-ssa-forwprop.c @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum tree_code code, tree type, gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison); + /* Do not perform combining it types are not compatible. */ + if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE + && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0)))) + return NULL_TREE; + again, how does this happen? diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index e67048e..1605520c 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -5760,6 +5760,12 @@ register_edge_assert_for (tree name, edge e, gimple_stmt_iterator si, &comp_code, &val)) return; + /* Use of vector comparison in gcond is very restricted and used to check + that the mask in masked store is zero, so assert for such comparison + is not implemented yet. */ + if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE) + return; + VECTOR_TYPE_P