From patchwork Thu Jun 20 07:09:23 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 1119227
Date: Thu, 20 Jun 2019 09:09:23 +0200
From: Jakub Jelinek
To: gcc-patches@gcc.gnu.org
Subject: [committed] Small inclusive scan SSE2 vectorization improvement
Message-ID: <20190620070923.GR815@tucnak>
Reply-To: Jakub Jelinek

Hi!

This is a small improvement over the previous patch.  The decision to use
whole vector left shift + optional VEC_COND_EXPR doesn't have to be binary
for the whole scan, which contains several permutations.  E.g. SSE2 can't do
the { 0, 4, 5, 6 } permutation other than as a whole vector left shift, but
it can do { 0, 1, 4, 5 }, and especially if the initializer is not 0, that
saves some instructions.

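To make the two selectors above concrete, here is a small standalone model of how the per-step selectors are built and how they combine into an inclusive prefix sum. It is only a sketch, not GCC code: `scan_selector`, `vec_perm` and `inclusive_scan_model` are hypothetical names, the vectors are plain `std::vector`s, and the fill operand is assumed to be all zeros (the whole vector left shift case with a zero initializer) with addition as the scan operation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using vec_t = std::vector<unsigned>;

// Hypothetical helper, not a GCC function: selector for step I of a scan
// over NUNITS elements, where NUNITS == 1 << UNITS_LOG2.  Indices below
// NUNITS pick from the first permutation operand, the rest from the second.
static vec_t
scan_selector (unsigned nunits, unsigned units_log2, unsigned i)
{
  assert (nunits == (1u << units_log2) && i <= units_log2);
  vec_t sel (nunits);
  unsigned j = 0;
  if (i == units_log2)
    // Last step: broadcast the final element.
    for (; j < nunits; ++j)
      sel[j] = nunits - 1;
  else
    {
      // Keep the first 1 << I lanes of the first operand, then take the
      // leading lanes of the second operand (a shift by 1 << I lanes).
      for (; j < (1u << i); ++j)
        sel[j] = j;
      for (unsigned k = 0; j < nunits; ++j, ++k)
        sel[j] = nunits + k;
    }
  return sel;
}

// Model of a two-operand permutation over the concatenation of A and B.
static vec_t
vec_perm (const vec_t &a, const vec_t &b, const vec_t &sel)
{
  vec_t r (sel.size ());
  for (std::size_t j = 0; j < sel.size (); ++j)
    r[j] = sel[j] < a.size () ? a[sel[j]] : b[sel[j] - a.size ()];
  return r;
}

// Assumed model of the scan: at each step add a copy of V shifted by
// 1 << I lanes, with zeros shifted in.
static vec_t
inclusive_scan_model (vec_t v, unsigned units_log2)
{
  const unsigned nunits = v.size ();
  const vec_t zero (nunits, 0);
  for (unsigned i = 0; i < units_log2; ++i)
    {
      vec_t shifted = vec_perm (zero, v, scan_selector (nunits, units_log2, i));
      for (unsigned j = 0; j < nunits; ++j)
        v[j] += shifted[j];
    }
  return v;
}
```

For four elements the selectors come out as { 0, 4, 5, 6 }, { 0, 1, 4, 5 } and { 3, 3, 3, 3 }, and the model turns { 1, 2, 3, 4 } into { 1, 3, 6, 10 }.  Each step has its own selector, which is why the choice between a normal permutation and a whole vector shift can be made per step.
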
The following patch changes the code, so that it remembers what to do for
each of the permutations.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2019-06-20  Jakub Jelinek

	* tree-vect-stmts.c (enum scan_store_kind): New type.
	(scan_store_can_perm_p): Change last argument from int * to
	vec<enum scan_store_kind> *, record precisely which permutations
	need whole vector left shift or that plus VEC_COND_EXPR.
	(vectorizable_scan_store): Adjust caller, use whole vector left shift
	and additional VEC_COND_EXPR only for those iterations that need it.

	Jakub

--- gcc/tree-vect-stmts.c.jj	2019-06-19 11:58:53.161238429 +0200
+++ gcc/tree-vect-stmts.c	2019-06-19 12:40:50.675838267 +0200
@@ -6354,13 +6354,27 @@ scan_operand_equal_p (tree ref1, tree re
 }
 
 
+enum scan_store_kind {
+  /* Normal permutation.  */
+  scan_store_kind_perm,
+
+  /* Whole vector left shift permutation with zero init.  */
+  scan_store_kind_lshift_zero,
+
+  /* Whole vector left shift permutation and VEC_COND_EXPR.  */
+  scan_store_kind_lshift_cond
+};
+
 /* Function check_scan_store.
 
    Verify if we can perform the needed permutations or whole vector shifts.
-   Return -1 on failure, otherwise exact log2 of vectype's nunits.  */
+   Return -1 on failure, otherwise exact log2 of vectype's nunits.
+   USE_WHOLE_VECTOR is a vector of enum scan_store_kind which operation
+   to do at each step.  */
 
 static int
-scan_store_can_perm_p (tree vectype, tree init, int *use_whole_vector_p = NULL)
+scan_store_can_perm_p (tree vectype, tree init,
+                       vec<enum scan_store_kind> *use_whole_vector = NULL)
 {
   enum machine_mode vec_mode = TYPE_MODE (vectype);
   unsigned HOST_WIDE_INT nunits;
@@ -6371,50 +6385,59 @@ scan_store_can_perm_p (tree vectype, tre
     return -1;
 
   int i;
+  enum scan_store_kind whole_vector_shift_kind = scan_store_kind_perm;
   for (i = 0; i <= units_log2; ++i)
     {
       unsigned HOST_WIDE_INT j, k;
+      enum scan_store_kind kind = scan_store_kind_perm;
       vec_perm_builder sel (nunits, nunits, 1);
       sel.quick_grow (nunits);
-      if (i == 0)
+      if (i == units_log2)
         {
           for (j = 0; j < nunits; ++j)
             sel[j] = nunits - 1;
         }
       else
         {
-          for (j = 0; j < (HOST_WIDE_INT_1U << (i - 1)); ++j)
+          for (j = 0; j < (HOST_WIDE_INT_1U << i); ++j)
             sel[j] = j;
           for (k = 0; j < nunits; ++j, ++k)
             sel[j] = nunits + k;
         }
-      vec_perm_indices indices (sel, i == 0 ? 1 : 2, nunits);
+      vec_perm_indices indices (sel, i == units_log2 ? 1 : 2, nunits);
       if (!can_vec_perm_const_p (vec_mode, indices))
-        break;
-    }
-
-  if (i == 0)
-    return -1;
-
-  if (i <= units_log2)
-    {
-      if (optab_handler (vec_shl_optab, vec_mode) == CODE_FOR_nothing)
-        return -1;
-      int kind = 1;
-      /* Whole vector shifts shift in zeros, so if init is all zero constant,
-         there is no need to do anything further.  */
-      if ((TREE_CODE (init) != INTEGER_CST
-           && TREE_CODE (init) != REAL_CST)
-          || !initializer_zerop (init))
         {
-          tree masktype = build_same_sized_truth_vector_type (vectype);
-          if (!expand_vec_cond_expr_p (vectype, masktype, VECTOR_CST))
+          if (i == units_log2)
             return -1;
-          kind = 2;
+
+          if (whole_vector_shift_kind == scan_store_kind_perm)
+            {
+              if (optab_handler (vec_shl_optab, vec_mode) == CODE_FOR_nothing)
+                return -1;
+              whole_vector_shift_kind = scan_store_kind_lshift_zero;
+              /* Whole vector shifts shift in zeros, so if init is all zero
+                 constant, there is no need to do anything further.  */
+              if ((TREE_CODE (init) != INTEGER_CST
+                   && TREE_CODE (init) != REAL_CST)
+                  || !initializer_zerop (init))
+                {
+                  tree masktype = build_same_sized_truth_vector_type (vectype);
+                  if (!expand_vec_cond_expr_p (vectype, masktype, VECTOR_CST))
+                    return -1;
+                  whole_vector_shift_kind = scan_store_kind_lshift_cond;
+                }
+            }
+          kind = whole_vector_shift_kind;
+        }
+      if (use_whole_vector)
+        {
+          if (kind != scan_store_kind_perm && use_whole_vector->is_empty ())
+            use_whole_vector->safe_grow_cleared (i);
+          if (kind != scan_store_kind_perm || !use_whole_vector->is_empty ())
+            use_whole_vector->safe_push (kind);
         }
-      if (use_whole_vector_p)
-        *use_whole_vector_p = kind;
     }
+
   return units_log2;
 }
 
@@ -6726,11 +6749,12 @@ vectorizable_scan_store (stmt_vec_info s
   unsigned HOST_WIDE_INT nunits;
   if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
     gcc_unreachable ();
-  int use_whole_vector_p = 0;
-  int units_log2 = scan_store_can_perm_p (vectype, *init, &use_whole_vector_p);
+  auto_vec use_whole_vector;
+  int units_log2 = scan_store_can_perm_p (vectype, *init, &use_whole_vector);
   gcc_assert (units_log2 > 0);
   auto_vec perms;
   perms.quick_grow (units_log2 + 1);
+  tree zero_vec = NULL_TREE, masktype = NULL_TREE;
   for (int i = 0; i <= units_log2; ++i)
     {
       unsigned HOST_WIDE_INT j, k;
@@ -6739,23 +6763,28 @@ vectorizable_scan_store (stmt_vec_info s
       if (i == units_log2)
         for (j = 0; j < nunits; ++j)
           sel[j] = nunits - 1;
-        else
-          {
-            for (j = 0; j < (HOST_WIDE_INT_1U << i); ++j)
-              sel[j] = j;
-            for (k = 0; j < nunits; ++j, ++k)
-              sel[j] = nunits + k;
-          }
+      else
+        {
+          for (j = 0; j < (HOST_WIDE_INT_1U << i); ++j)
+            sel[j] = j;
+          for (k = 0; j < nunits; ++j, ++k)
+            sel[j] = nunits + k;
+        }
       vec_perm_indices indices (sel, i == units_log2 ? 1 : 2, nunits);
-      if (use_whole_vector_p && i < units_log2)
-        perms[i] = vect_gen_perm_mask_any (vectype, indices);
+      if (!use_whole_vector.is_empty ()
+          && use_whole_vector[i] != scan_store_kind_perm)
+        {
+          if (zero_vec == NULL_TREE)
+            zero_vec = build_zero_cst (vectype);
+          if (masktype == NULL_TREE
+              && use_whole_vector[i] == scan_store_kind_lshift_cond)
+            masktype = build_same_sized_truth_vector_type (vectype);
+          perms[i] = vect_gen_perm_mask_any (vectype, indices);
+        }
       else
         perms[i] = vect_gen_perm_mask_checked (vectype, indices);
     }
 
-  tree zero_vec = use_whole_vector_p ? build_zero_cst (vectype) : NULL_TREE;
-  tree masktype = (use_whole_vector_p == 2
-                   ? build_same_sized_truth_vector_type (vectype) : NULL_TREE);
   stmt_vec_info prev_stmt_info = NULL;
   tree vec_oprnd1 = NULL_TREE;
   tree vec_oprnd2 = NULL_TREE;
@@ -6788,7 +6817,10 @@ vectorizable_scan_store (stmt_vec_info s
           {
             tree new_temp = make_ssa_name (vectype);
             gimple *g = gimple_build_assign (new_temp, VEC_PERM_EXPR,
-                                             zero_vec ? zero_vec : vec_oprnd1, v,
+                                             (zero_vec
+                                              && (use_whole_vector[i]
+                                                  != scan_store_kind_perm))
+                                             ? zero_vec : vec_oprnd1, v,
                                              perms[i]);
             new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
             if (prev_stmt_info == NULL)
@@ -6797,7 +6829,7 @@ vectorizable_scan_store (stmt_vec_info s
               STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
             prev_stmt_info = new_stmt_info;
 
-            if (use_whole_vector_p == 2)
+            if (zero_vec && use_whole_vector[i] == scan_store_kind_lshift_cond)
               {
                 /* Whole vector shift shifted in zero bits, but if *init
                    is not initializer_zerop, we need to replace those elements