From patchwork Thu Nov 12 13:58:56 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 543355
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
References: <559F5D7B.6070208@redhat.com> <55B148AB.6010103@redhat.com> <55B28DCB.2080404@redhat.com>
Date: Thu, 12 Nov 2015 14:58:56 +0100
Subject: Re: [PATCH] Simple optimization for MASK_STORE.
From: Richard Biener
To: Yuri Rumyantsev
Cc: Ilya Enkovich, Jeff Law, gcc-patches, Igor Zamyatin

On Wed, Nov 11, 2015 at 2:13 PM, Yuri Rumyantsev wrote:
> Richard,
>
> What we should do to cope with this problem (structure size increasing)?
> Should we return to vector comparison version?

Ok, given this constraint I think the cleanest approach is to allow
integer(!) vector equality(!) compares with scalar result.  This should then
expand via cmp_optab and not via vec_cmp_optab.

On gimple you can then have

 if (mask_vec_1 != {0, 0, .... })
...

Note that a fallback expansion (for optabs.c to try) would be
the suggested view-conversion (aka, subreg) variant using
a same-sized integer mode.

Target maintainers can then choose what is a better fit for their
target (and instruction set as register set constraints may apply).

The patch you posted seems to do this but not restrict the compares
to integer ones (please do that).
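As a rough illustration of the fallback described above (view the whole mask
vector as same-sized integer data and compare it against zero), here is a
minimal, self-contained C sketch.  It is not GCC code: the type mask256 and
the helpers mask_is_zero and masked_store are hypothetical names, and four
64-bit words stand in for a 256-bit integer mode that C does not provide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 256-bit integer mask vector
   (8 lanes of 32 bits each).  */
typedef struct { int32_t lane[8]; } mask256;

/* View the mask as same-sized integer data and test it against zero.
   Four 64-bit words stand in for a single 256-bit integer.  */
static int
mask_is_zero (const mask256 *m)
{
  uint64_t w[4];
  assert (sizeof w == sizeof *m);   /* the view must be same-sized */
  memcpy (w, m, sizeof w);          /* well-defined reinterpretation */
  return (w[0] | w[1] | w[2] | w[3]) == 0;
}

/* Scalar model of a masked store guarded by the zero test: the store
   loop is skipped entirely when no lane is active.  Returns the number
   of lanes written.  */
static int
masked_store (int32_t *dst, const int32_t *src, const mask256 *m)
{
  int written = 0;
  if (mask_is_zero (m))
    return 0;
  for (int i = 0; i < 8; i++)
    if (m->lane[i] != 0)
      {
        dst[i] = src[i];
        written++;
      }
  return written;
}
```

The only point of the sketch is that the test produces a single scalar
boolean for the whole vector, which is what the if (mask_vec_1 != {0, ...})
form on gimple would express.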

       if (TREE_CODE (op0_type) == VECTOR_TYPE
           || TREE_CODE (op1_type) == VECTOR_TYPE)
         {
-          error ("vector comparison returning a boolean");
-          debug_generic_expr (op0_type);
-          debug_generic_expr (op1_type);
-          return true;
+         /* Allow vector comparison returning boolean if operand types
+            are equal and CODE is EQ/NE.  */
+         if ((code != EQ_EXPR && code != NE_EXPR)
+             || TREE_CODE (op0_type) != TREE_CODE (op1_type)
+             || TYPE_VECTOR_SUBPARTS (op0_type)
+                != TYPE_VECTOR_SUBPARTS (op1_type)
+             || GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op0_type)))
+                != GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op1_type))))

These are all checked with the useless_type_conversion_p checks done earlier.

As said I'd like to see a

        || ! VECTOR_INTEGER_TYPE_P (op0_type)

check added so we and targets do not need to worry about using EQ/NE vs. CMP
and worry about signed zeros and friends.

+           {
+             error ("type mismatch for vector comparison returning a boolean");
+             debug_generic_expr (op0_type);
+             debug_generic_expr (op1_type);
+             return true;
+           }

does not expect this (valid) GENERIC tree.

+  if (ENABLE_ZERO_TEST_FOR_MASK_STORE == 0)
+    return;

not sure if I like a param more than a target hook ... :/

+      /* Create vector comparison with boolean result.  */
+      vectype = TREE_TYPE (mask);
+      zero = build_zero_cst (TREE_TYPE (vectype));
+      zero = build_vector_from_val (vectype, zero);

build_zero_cst (vectype);

+      stmt = gimple_build_cond (EQ_EXPR, mask, zero, NULL_TREE, NULL_TREE);

you can omit the NULL_TREE operands.

+      gcc_assert (vdef && TREE_CODE (vdef) == SSA_NAME);

please omit the assert.

+      gimple_set_vdef (last, new_vdef);

do this before you create the PHI.

+      /* Put definition statement of stored value in STORE_BB
+         if possible.  */
+      arg3 = gimple_call_arg (last, 3);
+      if (TREE_CODE (arg3) == SSA_NAME && has_single_use (arg3))
+       {
...

is this really necessary?  It looks incomplete to me anyway.  I'd rather have
a late sink pass if this shows necessary.  Btw,...

+                it is legal.  */
+             if (gimple_bb (def_stmt) == bb
+                 && is_valid_sink (def_stmt, last_store))

with the implementation of is_valid_sink this is effectively

   && (!gimple_vuse (def_stmt)
       || gimple_vuse (def_stmt) == gimple_vdef (last_store))

I still think this "pass" is quite a hack, esp. as it appears as generic
code in a GIMPLE pass.  And esp. as this hack seems to be needed for
Haswell only, not Broadwell or Skylake.

Thanks,
Richard.

> Thanks.
> Yuri.
>
> 2015-11-11 12:18 GMT+03:00 Richard Biener:
>> On Tue, Nov 10, 2015 at 3:56 PM, Ilya Enkovich wrote:
>>> 2015-11-10 17:46 GMT+03:00 Richard Biener:
>>>> On Tue, Nov 10, 2015 at 1:48 PM, Ilya Enkovich wrote:
>>>>> 2015-11-10 15:33 GMT+03:00 Richard Biener:
>>>>>> On Fri, Nov 6, 2015 at 2:28 PM, Yuri Rumyantsev wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> I tried it but 256-bit precision integer type is not yet supported.
>>>>>>
>>>>>> What's the symptom?  The compare cannot be expanded?  Just add a pattern then.
>>>>>> After all we have modes up to XImode.
>>>>>
>>>>> I suppose problem may be in:
>>>>>
>>>>> gcc/config/i386/i386-modes.def:#define MAX_BITSIZE_MODE_ANY_INT (128)
>>>>>
>>>>> which doesn't allow to create constants of bigger size.  Changing it
>>>>> to maximum vector size (512) would mean we increase wide_int structure
>>>>> size significantly.  New patterns are probably also needed.
>>>>
>>>> Yes, new patterns are needed but wide-int should be fine (we only need to
>>>> create a literal zero AFAICS).  The "new pattern" would be
>>>> equality/inequality against zero compares only.
>>>
>>> Currently 256bit integer creation fails because wide_int for max and
>>> min values cannot be created.
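
To make the size concern in the quoted exchange concrete, here is a small C
sketch of the arithmetic.  It assumes a 64-bit HOST_WIDE_INT and that the
element count is derived as
(MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT) / HOST_BITS_PER_WIDE_INT,
which is how I recall wide-int.h defining WIDE_INT_MAX_ELTS; treat that
formula as an assumption and check it against the tree in question.  The
helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-bit HOST_WIDE_INT.  */
#define HOST_BITS 64

/* Assumed derivation of the element count from the largest integer
   mode size in bits (one extra element beyond the mode size).  */
static size_t
max_elts (size_t max_bitsize)
{
  assert (max_bitsize > 0);
  return (max_bitsize + HOST_BITS) / HOST_BITS;
}

/* Bytes of element storage for a given limit.  */
static size_t
storage_bytes (size_t max_bitsize)
{
  return max_elts (max_bitsize) * (HOST_BITS / 8);
}

/* Growth in bytes relative to the i386 limit of 128 bits quoted above.  */
static size_t
growth_bytes (size_t new_bitsize)
{
  return storage_bytes (new_bitsize) - storage_bytes (128);
}
```

Under these assumptions a 128-bit limit gives 3 elements, i.e. 192 bits of
precision, and raising the limit to 256 or 512 bits adds 16 or 48 bytes of
element storage per value.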
>>
>> Hmm, indeed:
>>
>> #1  0x000000000072dab5 in wi::extended_tree<192>::extended_tree (
>>     this=0x7fffffffd950, t=0x7ffff6a000b0)
>>     at /space/rguenther/src/svn/trunk/gcc/tree.h:5125
>> 5125      gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) <= N);
>>
>> but that's not that the constants fail to be created but
>>
>> #5  0x00000000010d8828 in build_nonstandard_integer_type (precision=512,
>>     unsignedp=65) at /space/rguenther/src/svn/trunk/gcc/tree.c:8051
>> 8051      if (tree_fits_uhwi_p (TYPE_MAX_VALUE (itype)))
>> (gdb) l
>> 8046        fixup_unsigned_type (itype);
>> 8047      else
>> 8048        fixup_signed_type (itype);
>> 8049
>> 8050      ret = itype;
>> 8051      if (tree_fits_uhwi_p (TYPE_MAX_VALUE (itype)))
>> 8052        ret = type_hash_canon (tree_to_uhwi (TYPE_MAX_VALUE (itype)), itype);
>>
>> thus the integer type hashing being "interesting".  tree_fits_uhwi_p
>> fails because it does
>>
>> 7289    bool
>> 7290    tree_fits_uhwi_p (const_tree t)
>> 7291    {
>> 7292      return (t != NULL_TREE
>> 7293              && TREE_CODE (t) == INTEGER_CST
>> 7294              && wi::fits_uhwi_p (wi::to_widest (t)));
>> 7295    }
>>
>> and wi::to_widest () fails with doing
>>
>> 5121    template <int N>
>> 5122    inline wi::extended_tree <N>::extended_tree (const_tree t)
>> 5123      : m_t (t)
>> 5124    {
>> 5125      gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) <= N);
>> 5126    }
>>
>> fixing the hashing then runs into type_cache_hasher::equal doing
>> tree_int_cst_equal which again uses to_widest (it should be easier and
>> cheaper to do the compare on the actual tree representation, but well,
>> seems to be just the first of various issues we'd run into).
>>
>> We eventually could fix the assert above (but then need to hope we assert
>> when a computation overflows the narrower precision of widest_int) or use
>> a special really_widest_int (ugh).
>>
>>> It is fixed by increasing MAX_BITSIZE_MODE_ANY_INT, but it increases
>>> WIDE_INT_MAX_ELTS and thus increases wide_int structure.
>>> If we use 512 for MAX_BITSIZE_MODE_ANY_INT then
>>> wide_int structure would grow by 48 bytes (16 bytes if use 256 for
>>> MAX_BITSIZE_MODE_ANY_INT).
>>> Is it OK for such narrow usage?
>>
>> widest_int is used in some long-living structures (which is the reason for
>> MAX_BITSIZE_MODE_ANY_INT in the first place).  So I don't think so.
>>
>> Richard.
>>
>>> Ilya
>>>
>>>>
>>>> Richard.
>>>>
>>>>> Ilya
>>>>>
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Yuri.
>>>>>>>
>>>>>>>

--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -422,6 +422,15 @@ forward_propagate_into_comparison_1 (gimple *stmt,
       enum tree_code def_code = gimple_assign_rhs_code (def_stmt);
       bool invariant_only_p = !single_use0_p;

+      /* Can't combine vector comparison with scalar boolean type of
+         the result and VEC_COND_EXPR having vector type of comparison.  */
+      if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
+          && INTEGRAL_TYPE_P (type)
+          && (TREE_CODE (type) == BOOLEAN_TYPE
+              || TYPE_PRECISION (type) == 1)
+          && def_code == VEC_COND_EXPR)
+        return NULL_TREE;

this hints at larger fallout you paper over here.  So this effectively
means we're trying combining (vec1 != vec2) != 0 for example
and that fails miserably?  If so then the solution is to fix whatever