From patchwork Thu Sep 14 11:20:00 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 813795 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-462119-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="nvZpbZ8M"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3xtGKf410tz9rvt for ; Thu, 14 Sep 2017 21:20:21 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:mime-version:content-type; q=dns; s= default; b=oO2FUA12Ls1DdK4/BK8+0P47J70I7KFzGSobGWb5vP+dnRVfFPmTi bXSMVQUkV1DWjdc/OoI3JnvlWzBf0MqeEBBlqS3P4aRA9KKJpl6BPdVnKOWvxN6V n4EuSLHokmMBmd8OQmxt8fOfpX1nBX95X61pnxn5CTgricbhf93Xuo= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:mime-version:content-type; s= default; bh=LFhSRUy/8fWMZZDjQ4/ZWWiavh0=; b=nvZpbZ8MMYNsnj5e+6hf nlBaa/pgbEGZCV7ivTNYNEQYenR9VTj+v00vLP1uzOtuqPIJYQRkSsC5ou4OSYSn J6AWCz4gX/mJ7jAGbfHaH5igTdBkUDJRq2lsWzFfqam4Rq5+E6Z83Cpz5gESosax tq8WUB59kljmtqyvYJmTmxg= Received: (qmail 6302 invoked by alias); 14 Sep 2017 11:20:11 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 5077 invoked by uid 89); 14 Sep 2017 11:20:10 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-15.5 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS, RCVD_IN_DNSWL_NONE, SPF_PASS autolearn=ham version=3.3.2 spammy= X-HELO: mail-wr0-f172.google.com Received: from mail-wr0-f172.google.com (HELO mail-wr0-f172.google.com) (209.85.128.172) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 14 Sep 2017 11:20:06 +0000 Received: by mail-wr0-f172.google.com with SMTP id c23so28481wrg.9 for ; Thu, 14 Sep 2017 04:20:05 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:mail-followup-to:subject:date:message-id :user-agent:mime-version; bh=qHxoSBbp4rROdH15dvDegYGwJC3bnLL5loia65l+9Q0=; b=EKuauYUIGt7eQj8c1L6OZMSEV6GpYd9Q8pdDBqe3uhPViAIZ9hSREhjseTkJUAK9Fb GGoqk+2Sjmhsgo3z6SumYoVVr6G9QooqqwO4C+QLo0GgCVTUqoomc8jlFZvcvNRosyrg QHE1Y7Zk7UUTjFv7k2CXGA9mLK+CnL4zyWY3+tvqVYZWnxmdoAtIKTd5jfvJHWc0L2Pd IwiCTJHrvkwKX7/6FIgMn5b0dRM2OblqhbrS9gG9TvhXgPlYeoHqyBQA5JC7PD5Vwl75 OxELEdDRPh5M//wLRALd/AHtGEE4rb4TCKgtpGEvwEDQNrFcEkd5QXdgBiMrkltAo+Fw qPIA== X-Gm-Message-State: AHPjjUj5NzsJ9mLrukNUBJAiTnMmmHIL1I9lnJk52sx7f3QF4VMkAHFC XhBG85rwZjPH0ii6KI39zg== X-Google-Smtp-Source: ADKCNb7CCyYKQN7ArdBkKCH8aGEyVbagR1WUpiN8HHDc+wYJ69rD+1f8t4oz56lI+kojzBUBDKy04A== X-Received: by 10.223.139.21 with SMTP id n21mr16851690wra.32.1505388003148; Thu, 14 Sep 2017 04:20:03 -0700 (PDT) Received: from localhost (92.40.248.70.threembb.co.uk. [92.40.248.70]) by smtp.gmail.com with ESMTPSA id n9sm1789618wmd.12.2017.09.14.04.20.01 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 14 Sep 2017 04:20:02 -0700 (PDT) From: Richard Sandiford To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@linaro.org Subject: Use vec<> for constant permute masks Date: Thu, 14 Sep 2017 12:20:00 +0100 Message-ID: <877ex1y1b3.fsf@linaro.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux) MIME-Version: 1.0 This patch makes can_vec_perm_p & co. take a vec<>, wrapped in new typedefs vec_perm_indices and auto_vec_perm_indices. There are two reasons for doing this for SVE: (1) it means that the number of elements is bundled with the elements themselves, and is obviously constant. (2) it makes it easier to change the "unsigned char" element type to something wider. I'm happy to change the target hooks as a follow-on patch, if this is OK. Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu. OK to install? Richard 2017-09-14 Richard Sandiford Alan Hayward David Sherwood gcc/ * target.h (vec_perm_indices): New typedef. (auto_vec_perm_indices): Likewise. * optabs-query.h: Include target.h (can_vec_perm_p): Take a vec_perm_indices *. * optabs-query.c (can_vec_perm_p): Likewise. (can_mult_highpart_p): Update accordingly. Use auto_vec_perm_indices. * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise. * tree-vect-generic.c (lower_vec_perm): Likewise. * tree-vect-data-refs.c (vect_grouped_store_supported): Likewise. (vect_grouped_load_supported): Likewise. (vect_shift_permute_load_chain): Likewise. (vect_permute_store_chain): Use auto_vec_perm_indices. (vect_permute_load_chain): Likewise. * fold-const.c (fold_vec_perm): Take vec_perm_indices. (fold_ternary_loc): Update accordingly. Use auto_vec_perm_indices. Update uses of can_vec_perm_p. * tree-vect-loop.c (calc_vec_perm_mask_for_shift): Replace the mode with a number of elements. Take a vec_perm_indices *. (vect_create_epilog_for_reduction): Update accordingly. Use auto_vec_perm_indices. (have_whole_vector_shift): Likewise. Update call to can_vec_perm_p. * tree-vect-slp.c (vect_build_slp_tree_1): Likewise. (vect_transform_slp_perm_load): Likewise. (vect_schedule_slp_instance): Use auto_vec_perm_indices. * tree-vectorizer.h (vect_gen_perm_mask_any): Take a vec_perm_indices. (vect_gen_perm_mask_checked): Likewise. * tree-vect-stmts.c (vect_gen_perm_mask_any): Take a vec_perm_indices. (vect_gen_perm_mask_checked): Likewise. (vectorizable_mask_load_store): Use auto_vec_perm_indices. (vectorizable_store): Likewise. (vectorizable_load): Likewise. (perm_mask_for_reverse): Likewise. Update call to can_vec_perm_p. (vectorizable_bswap): Likewise. Index: gcc/target.h =================================================================== --- gcc/target.h 2017-09-11 17:10:58.656085547 +0100 +++ gcc/target.h 2017-09-14 11:25:32.162167193 +0100 @@ -191,6 +191,14 @@ enum vect_cost_model_location { vect_epilogue = 2 }; +/* The type to use for vector permutes with a constant permute vector. + Each entry is an index into the concatenated input vectors. */ +typedef vec vec_perm_indices; + +/* Same, but can be used to construct local permute vectors that are + automatically freed. */ +typedef auto_vec auto_vec_perm_indices; + /* The target structure. This holds all the backend hooks. */ #define DEFHOOKPOD(NAME, DOC, TYPE, INIT) TYPE NAME; #define DEFHOOK(NAME, DOC, TYPE, PARAMS, INIT) TYPE (* NAME) PARAMS; Index: gcc/optabs-query.h =================================================================== --- gcc/optabs-query.h 2017-08-30 12:14:51.272396735 +0100 +++ gcc/optabs-query.h 2017-09-14 11:25:32.162167193 +0100 @@ -21,6 +21,7 @@ the Free Software Foundation; either ver #define GCC_OPTABS_QUERY_H #include "insn-opinit.h" +#include "target.h" /* Return the insn used to implement mode MODE of OP, or CODE_FOR_nothing if the target does not have such an insn. */ @@ -165,7 +166,7 @@ enum insn_code can_extend_p (machine_mod enum insn_code can_float_p (machine_mode, machine_mode, int); enum insn_code can_fix_p (machine_mode, machine_mode, int, bool *); bool can_conditionally_move_p (machine_mode mode); -bool can_vec_perm_p (machine_mode, bool, const unsigned char *); +bool can_vec_perm_p (machine_mode, bool, vec_perm_indices *); enum insn_code widening_optab_handler (optab, machine_mode, machine_mode); /* Find a widening optab even if it doesn't widen as much as we want. */ #define find_widening_optab_handler(A,B,C,D) \ Index: gcc/optabs-query.c =================================================================== --- gcc/optabs-query.c 2017-09-05 20:57:40.745898121 +0100 +++ gcc/optabs-query.c 2017-09-14 11:25:32.162167193 +0100 @@ -353,8 +353,7 @@ can_conditionally_move_p (machine_mode m zeroes; this case is not dealt with here. */ bool -can_vec_perm_p (machine_mode mode, bool variable, - const unsigned char *sel) +can_vec_perm_p (machine_mode mode, bool variable, vec_perm_indices *sel) { machine_mode qimode; @@ -368,7 +367,7 @@ can_vec_perm_p (machine_mode mode, bool if (direct_optab_handler (vec_perm_const_optab, mode) != CODE_FOR_nothing && (sel == NULL || targetm.vectorize.vec_perm_const_ok == NULL - || targetm.vectorize.vec_perm_const_ok (mode, sel))) + || targetm.vectorize.vec_perm_const_ok (mode, &(*sel)[0]))) return true; } @@ -460,7 +459,6 @@ find_widening_optab_handler_and_mode (op can_mult_highpart_p (machine_mode mode, bool uns_p) { optab op; - unsigned char *sel; unsigned i, nunits; op = uns_p ? umul_highpart_optab : smul_highpart_optab; @@ -472,7 +470,6 @@ can_mult_highpart_p (machine_mode mode, return 0; nunits = GET_MODE_NUNITS (mode); - sel = XALLOCAVEC (unsigned char, nunits); op = uns_p ? vec_widen_umult_even_optab : vec_widen_smult_even_optab; if (optab_handler (op, mode) != CODE_FOR_nothing) @@ -480,9 +477,12 @@ can_mult_highpart_p (machine_mode mode, op = uns_p ? vec_widen_umult_odd_optab : vec_widen_smult_odd_optab; if (optab_handler (op, mode) != CODE_FOR_nothing) { + auto_vec_perm_indices sel (nunits); for (i = 0; i < nunits; ++i) - sel[i] = !BYTES_BIG_ENDIAN + (i & ~1) + ((i & 1) ? nunits : 0); - if (can_vec_perm_p (mode, false, sel)) + sel.quick_push (!BYTES_BIG_ENDIAN + + (i & ~1) + + ((i & 1) ? nunits : 0)); + if (can_vec_perm_p (mode, false, &sel)) return 2; } } @@ -493,9 +493,10 @@ can_mult_highpart_p (machine_mode mode, op = uns_p ? vec_widen_umult_lo_optab : vec_widen_smult_lo_optab; if (optab_handler (op, mode) != CODE_FOR_nothing) { + auto_vec_perm_indices sel (nunits); for (i = 0; i < nunits; ++i) - sel[i] = 2 * i + (BYTES_BIG_ENDIAN ? 0 : 1); - if (can_vec_perm_p (mode, false, sel)) + sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1)); + if (can_vec_perm_p (mode, false, &sel)) return 3; } } Index: gcc/tree-ssa-forwprop.c =================================================================== --- gcc/tree-ssa-forwprop.c 2017-09-14 11:24:42.667010577 +0100 +++ gcc/tree-ssa-forwprop.c 2017-09-14 11:25:32.163167193 +0100 @@ -1952,7 +1952,6 @@ simplify_vector_constructor (gimple_stmt unsigned elem_size, nelts, i; enum tree_code code, conv_code; constructor_elt *elt; - unsigned char *sel; bool maybe_ident; gcc_checking_assert (gimple_assign_rhs_code (stmt) == CONSTRUCTOR); @@ -1965,7 +1964,7 @@ simplify_vector_constructor (gimple_stmt elem_type = TREE_TYPE (type); elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type)); - sel = XALLOCAVEC (unsigned char, nelts); + auto_vec_perm_indices sel (nelts); orig = NULL; conv_code = ERROR_MARK; maybe_ident = true; @@ -2023,8 +2022,10 @@ simplify_vector_constructor (gimple_stmt } if (TREE_INT_CST_LOW (TREE_OPERAND (op1, 1)) != elem_size) return false; - sel[i] = TREE_INT_CST_LOW (TREE_OPERAND (op1, 2)) / elem_size; - if (sel[i] != i) maybe_ident = false; + unsigned int elt = TREE_INT_CST_LOW (TREE_OPERAND (op1, 2)) / elem_size; + if (elt != i) + maybe_ident = false; + sel.quick_push (elt); } if (i < nelts) return false; @@ -2053,7 +2054,7 @@ simplify_vector_constructor (gimple_stmt { tree mask_type; - if (!can_vec_perm_p (TYPE_MODE (type), false, sel)) + if (!can_vec_perm_p (TYPE_MODE (type), false, &sel)) return false; mask_type = build_vector_type (build_nonstandard_integer_type (elem_size, 1), Index: gcc/tree-vect-generic.c =================================================================== --- gcc/tree-vect-generic.c 2017-09-14 11:24:42.667010577 +0100 +++ gcc/tree-vect-generic.c 2017-09-14 11:25:32.164167193 +0100 @@ -1300,13 +1300,13 @@ lower_vec_perm (gimple_stmt_iterator *gs if (TREE_CODE (mask) == VECTOR_CST) { - unsigned char *sel_int = XALLOCAVEC (unsigned char, elements); + auto_vec_perm_indices sel_int (elements); for (i = 0; i < elements; ++i) - sel_int[i] = (TREE_INT_CST_LOW (VECTOR_CST_ELT (mask, i)) - & (2 * elements - 1)); + sel_int.quick_push (TREE_INT_CST_LOW (VECTOR_CST_ELT (mask, i)) + & (2 * elements - 1)); - if (can_vec_perm_p (TYPE_MODE (vect_type), false, sel_int)) + if (can_vec_perm_p (TYPE_MODE (vect_type), false, &sel_int)) { gimple_assign_set_rhs3 (stmt, mask); update_stmt (stmt); Index: gcc/tree-vect-data-refs.c =================================================================== --- gcc/tree-vect-data-refs.c 2017-08-30 12:10:14.677681466 +0100 +++ gcc/tree-vect-data-refs.c 2017-09-14 11:25:32.163167193 +0100 @@ -4547,7 +4547,8 @@ vect_grouped_store_supported (tree vecty if (VECTOR_MODE_P (mode)) { unsigned int i, nelt = GET_MODE_NUNITS (mode); - unsigned char *sel = XALLOCAVEC (unsigned char, nelt); + auto_vec_perm_indices sel (nelt); + sel.quick_grow (nelt); if (count == 3) { @@ -4568,7 +4569,7 @@ vect_grouped_store_supported (tree vecty if (3 * i + nelt2 < nelt) sel[3 * i + nelt2] = 0; } - if (!can_vec_perm_p (mode, false, sel)) + if (!can_vec_perm_p (mode, false, &sel)) { if (dump_enabled_p ()) dump_printf (MSG_MISSED_OPTIMIZATION, @@ -4585,7 +4586,7 @@ vect_grouped_store_supported (tree vecty if (3 * i + nelt2 < nelt) sel[3 * i + nelt2] = nelt + j2++; } - if (!can_vec_perm_p (mode, false, sel)) + if (!can_vec_perm_p (mode, false, &sel)) { if (dump_enabled_p ()) dump_printf (MSG_MISSED_OPTIMIZATION, @@ -4605,13 +4606,13 @@ vect_grouped_store_supported (tree vecty sel[i * 2] = i; sel[i * 2 + 1] = i + nelt; } - if (can_vec_perm_p (mode, false, sel)) - { - for (i = 0; i < nelt; i++) - sel[i] += nelt / 2; - if (can_vec_perm_p (mode, false, sel)) - return true; - } + if (can_vec_perm_p (mode, false, &sel)) + { + for (i = 0; i < nelt; i++) + sel[i] += nelt / 2; + if (can_vec_perm_p (mode, false, &sel)) + return true; + } } } @@ -4710,7 +4711,9 @@ vect_permute_store_chain (vec dr_c tree perm3_mask_low, perm3_mask_high; unsigned int i, n, log_length = exact_log2 (length); unsigned int j, nelt = TYPE_VECTOR_SUBPARTS (vectype); - unsigned char *sel = XALLOCAVEC (unsigned char, nelt); + + auto_vec_perm_indices sel (nelt); + sel.quick_grow (nelt); result_chain->quick_grow (length); memcpy (result_chain->address (), dr_chain.address (), @@ -5132,7 +5135,8 @@ vect_grouped_load_supported (tree vectyp if (VECTOR_MODE_P (mode)) { unsigned int i, j, nelt = GET_MODE_NUNITS (mode); - unsigned char *sel = XALLOCAVEC (unsigned char, nelt); + auto_vec_perm_indices sel (nelt); + sel.quick_grow (nelt); if (count == 3) { @@ -5144,7 +5148,7 @@ vect_grouped_load_supported (tree vectyp sel[i] = 3 * i + k; else sel[i] = 0; - if (!can_vec_perm_p (mode, false, sel)) + if (!can_vec_perm_p (mode, false, &sel)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5157,7 +5161,7 @@ vect_grouped_load_supported (tree vectyp sel[i] = i; else sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++); - if (!can_vec_perm_p (mode, false, sel)) + if (!can_vec_perm_p (mode, false, &sel)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5174,11 +5178,11 @@ vect_grouped_load_supported (tree vectyp gcc_assert (pow2p_hwi (count)); for (i = 0; i < nelt; i++) sel[i] = i * 2; - if (can_vec_perm_p (mode, false, sel)) + if (can_vec_perm_p (mode, false, &sel)) { for (i = 0; i < nelt; i++) sel[i] = i * 2 + 1; - if (can_vec_perm_p (mode, false, sel)) + if (can_vec_perm_p (mode, false, &sel)) return true; } } @@ -5292,7 +5296,9 @@ vect_permute_load_chain (vec dr_ch tree vectype = STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt)); unsigned int i, j, log_length = exact_log2 (length); unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype); - unsigned char *sel = XALLOCAVEC (unsigned char, nelt); + + auto_vec_perm_indices sel (nelt); + sel.quick_grow (nelt); result_chain->quick_grow (length); memcpy (result_chain->address (), dr_chain.address (), @@ -5486,10 +5492,12 @@ vect_shift_permute_load_chain (vec tree vectype = STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt)); unsigned int i; unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype); - unsigned char *sel = XALLOCAVEC (unsigned char, nelt); stmt_vec_info stmt_info = vinfo_for_stmt (stmt); loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); + auto_vec_perm_indices sel (nelt); + sel.quick_grow (nelt); + result_chain->quick_grow (length); memcpy (result_chain->address (), dr_chain.address (), length * sizeof (tree)); @@ -5501,7 +5509,7 @@ vect_shift_permute_load_chain (vec sel[i] = i * 2; for (i = 0; i < nelt / 2; ++i) sel[nelt / 2 + i] = i * 2 + 1; - if (!can_vec_perm_p (TYPE_MODE (vectype), false, sel)) + if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5515,7 +5523,7 @@ vect_shift_permute_load_chain (vec sel[i] = i * 2 + 1; for (i = 0; i < nelt / 2; ++i) sel[nelt / 2 + i] = i * 2; - if (!can_vec_perm_p (TYPE_MODE (vectype), false, sel)) + if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5529,7 +5537,7 @@ vect_shift_permute_load_chain (vec For vector length 8 it is {4 5 6 7 8 9 10 11}. */ for (i = 0; i < nelt; i++) sel[i] = nelt / 2 + i; - if (!can_vec_perm_p (TYPE_MODE (vectype), false, sel)) + if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5544,7 +5552,7 @@ vect_shift_permute_load_chain (vec sel[i] = i; for (i = nelt / 2; i < nelt; i++) sel[i] = nelt + i; - if (!can_vec_perm_p (TYPE_MODE (vectype), false, sel)) + if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5607,7 +5615,7 @@ vect_shift_permute_load_chain (vec sel[i] = 3 * k + (l % 3); k++; } - if (!can_vec_perm_p (TYPE_MODE (vectype), false, sel)) + if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5621,7 +5629,7 @@ vect_shift_permute_load_chain (vec For vector length 8 it is {6 7 8 9 10 11 12 13}. */ for (i = 0; i < nelt; i++) sel[i] = 2 * (nelt / 3) + (nelt % 3) + i; - if (!can_vec_perm_p (TYPE_MODE (vectype), false, sel)) + if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5634,7 +5642,7 @@ vect_shift_permute_load_chain (vec For vector length 8 it is {5 6 7 8 9 10 11 12}. */ for (i = 0; i < nelt; i++) sel[i] = 2 * (nelt / 3) + 1 + i; - if (!can_vec_perm_p (TYPE_MODE (vectype), false, sel)) + if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5647,7 +5655,7 @@ vect_shift_permute_load_chain (vec For vector length 8 it is {3 4 5 6 7 8 9 10}. */ for (i = 0; i < nelt; i++) sel[i] = (nelt / 3) + (nelt % 3) / 2 + i; - if (!can_vec_perm_p (TYPE_MODE (vectype), false, sel)) + if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5660,7 +5668,7 @@ vect_shift_permute_load_chain (vec For vector length 8 it is {5 6 7 8 9 10 11 12}. */ for (i = 0; i < nelt; i++) sel[i] = 2 * (nelt / 3) + (nelt % 3) / 2 + i; - if (!can_vec_perm_p (TYPE_MODE (vectype), false, sel)) + if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, Index: gcc/fold-const.c =================================================================== --- gcc/fold-const.c 2017-09-14 11:24:42.666088258 +0100 +++ gcc/fold-const.c 2017-09-14 11:25:32.162167193 +0100 @@ -8786,12 +8786,14 @@ vec_cst_ctor_to_array (tree arg, unsigne NULL_TREE otherwise. */ static tree -fold_vec_perm (tree type, tree arg0, tree arg1, const unsigned char *sel) +fold_vec_perm (tree type, tree arg0, tree arg1, vec_perm_indices sel) { - unsigned int nelts = TYPE_VECTOR_SUBPARTS (type), i; + unsigned int i; bool need_ctor = false; - gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)) == nelts + unsigned int nelts = sel.length (); + gcc_assert (TYPE_VECTOR_SUBPARTS (type) == nelts + && TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)) == nelts && TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)) == nelts); if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type) || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type)) @@ -11312,15 +11314,15 @@ fold_ternary_loc (location_t loc, enum t || TREE_CODE (arg2) == CONSTRUCTOR)) { unsigned int nelts = VECTOR_CST_NELTS (arg0), i; - unsigned char *sel = XALLOCAVEC (unsigned char, nelts); gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type)); + auto_vec_perm_indices sel (nelts); for (i = 0; i < nelts; i++) { tree val = VECTOR_CST_ELT (arg0, i); if (integer_all_onesp (val)) - sel[i] = i; + sel.quick_push (i); else if (integer_zerop (val)) - sel[i] = nelts + i; + sel.quick_push (nelts + i); else /* Currently unreachable. */ return NULL_TREE; } @@ -11643,8 +11645,6 @@ fold_ternary_loc (location_t loc, enum t if (TREE_CODE (arg2) == VECTOR_CST) { unsigned int nelts = VECTOR_CST_NELTS (arg2), i, mask, mask2; - unsigned char *sel = XALLOCAVEC (unsigned char, 2 * nelts); - unsigned char *sel2 = sel + nelts; bool need_mask_canon = false; bool need_mask_canon2 = false; bool all_in_vec0 = true; @@ -11656,6 +11656,8 @@ fold_ternary_loc (location_t loc, enum t mask2 = 2 * nelts - 1; mask = single_arg ? (nelts - 1) : mask2; gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type)); + auto_vec_perm_indices sel (nelts); + auto_vec_perm_indices sel2 (nelts); for (i = 0; i < nelts; i++) { tree val = VECTOR_CST_ELT (arg2, i); @@ -11667,16 +11669,19 @@ fold_ternary_loc (location_t loc, enum t wide_int t = val; need_mask_canon |= wi::gtu_p (t, mask); need_mask_canon2 |= wi::gtu_p (t, mask2); - sel[i] = t.to_uhwi () & mask; - sel2[i] = t.to_uhwi () & mask2; + unsigned int elt = t.to_uhwi () & mask; + unsigned int elt2 = t.to_uhwi () & mask2; - if (sel[i] < nelts) + if (elt < nelts) all_in_vec1 = false; else all_in_vec0 = false; - if ((sel[i] & (nelts-1)) != i) + if ((elt & (nelts - 1)) != i) maybe_identity = false; + + sel.quick_push (elt); + sel2.quick_push (elt2); } if (maybe_identity) @@ -11714,8 +11719,8 @@ fold_ternary_loc (location_t loc, enum t argument permutation while still allowing an equivalent 2-argument version. */ if (need_mask_canon && arg2 == op2 - && !can_vec_perm_p (TYPE_MODE (type), false, sel) - && can_vec_perm_p (TYPE_MODE (type), false, sel2)) + && !can_vec_perm_p (TYPE_MODE (type), false, &sel) + && can_vec_perm_p (TYPE_MODE (type), false, &sel2)) { need_mask_canon = need_mask_canon2; sel = sel2; Index: gcc/tree-vect-loop.c =================================================================== --- gcc/tree-vect-loop.c 2017-09-14 11:24:42.667932896 +0100 +++ gcc/tree-vect-loop.c 2017-09-14 11:25:32.164167193 +0100 @@ -3698,15 +3698,15 @@ vect_estimate_min_profitable_iters (loop } /* Writes into SEL a mask for a vec_perm, equivalent to a vec_shr by OFFSET - vector elements (not bits) for a vector of mode MODE. */ + vector elements (not bits) for a vector with NELT elements. */ static void -calc_vec_perm_mask_for_shift (machine_mode mode, unsigned int offset, - unsigned char *sel) +calc_vec_perm_mask_for_shift (unsigned int offset, unsigned int nelt, + vec_perm_indices *sel) { - unsigned int i, nelt = GET_MODE_NUNITS (mode); + unsigned int i; for (i = 0; i < nelt; i++) - sel[i] = (i + offset) & (2*nelt - 1); + sel->quick_push ((i + offset) & (2 * nelt - 1)); } /* Checks whether the target supports whole-vector shifts for vectors of mode @@ -3722,12 +3722,13 @@ have_whole_vector_shift (machine_mode mo return false; unsigned int i, nelt = GET_MODE_NUNITS (mode); - unsigned char *sel = XALLOCAVEC (unsigned char, nelt); + auto_vec_perm_indices sel (nelt); for (i = nelt/2; i >= 1; i/=2) { - calc_vec_perm_mask_for_shift (mode, i, sel); - if (!can_vec_perm_p (mode, false, sel)) + sel.truncate (0); + calc_vec_perm_mask_for_shift (i, nelt, &sel); + if (!can_vec_perm_p (mode, false, &sel)) return false; } return true; @@ -5059,7 +5060,7 @@ vect_create_epilog_for_reduction (vec= 1; elt_offset /= 2) { - calc_vec_perm_mask_for_shift (mode, elt_offset, sel); - tree mask = vect_gen_perm_mask_any (vectype, sel); + sel.truncate (0); + calc_vec_perm_mask_for_shift (elt_offset, nelements, &sel); + tree mask = vect_gen_perm_mask_any (vectype, sel); epilog_stmt = gimple_build_assign (vec_dest, VEC_PERM_EXPR, new_temp, zero_vec, mask); new_name = make_ssa_name (vec_dest, epilog_stmt); Index: gcc/tree-vect-slp.c =================================================================== --- gcc/tree-vect-slp.c 2017-09-14 11:24:42.667932896 +0100 +++ gcc/tree-vect-slp.c 2017-09-14 11:25:32.165167193 +0100 @@ -873,15 +873,16 @@ vect_build_slp_tree_1 (vec_info *vinfo, if (alt_stmt_code != ERROR_MARK && TREE_CODE_CLASS (alt_stmt_code) != tcc_reference) { - unsigned char *sel - = XALLOCAVEC (unsigned char, TYPE_VECTOR_SUBPARTS (vectype)); - for (i = 0; i < TYPE_VECTOR_SUBPARTS (vectype); ++i) + unsigned int count = TYPE_VECTOR_SUBPARTS (vectype); + auto_vec_perm_indices sel (count); + for (i = 0; i < count; ++i) { - sel[i] = i; + unsigned int elt = i; if (gimple_assign_rhs_code (stmts[i % group_size]) == alt_stmt_code) - sel[i] += TYPE_VECTOR_SUBPARTS (vectype); + elt += count; + sel.quick_push (elt); } - if (!can_vec_perm_p (TYPE_MODE (vectype), false, sel)) + if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel)) { for (i = 0; i < group_size; ++i) if (gimple_assign_rhs_code (stmts[i]) == alt_stmt_code) @@ -3486,7 +3487,6 @@ vect_transform_slp_perm_load (slp_tree n tree vectype = STMT_VINFO_VECTYPE (stmt_info); int group_size = SLP_INSTANCE_GROUP_SIZE (slp_node_instance); int mask_element; - unsigned char *mask; machine_mode mode; if (!STMT_VINFO_GROUPED_ACCESS (stmt_info)) @@ -3502,7 +3502,8 @@ vect_transform_slp_perm_load (slp_tree n (int_mode_for_mode (TYPE_MODE (TREE_TYPE (vectype))).require (), 1); mask_type = get_vectype_for_scalar_type (mask_element_type); nunits = TYPE_VECTOR_SUBPARTS (vectype); - mask = XALLOCAVEC (unsigned char, nunits); + auto_vec_perm_indices mask (nunits); + mask.quick_grow (nunits); /* Initialize the vect stmts of NODE to properly insert the generated stmts later. */ @@ -3577,7 +3578,7 @@ vect_transform_slp_perm_load (slp_tree n if (index == nunits) { if (! noop_p - && ! can_vec_perm_p (mode, false, mask)) + && ! can_vec_perm_p (mode, false, &mask)) { if (dump_enabled_p ()) { @@ -3730,15 +3731,15 @@ vect_schedule_slp_instance (slp_tree nod enum tree_code code0 = gimple_assign_rhs_code (stmt); enum tree_code ocode = ERROR_MARK; gimple *ostmt; - unsigned char *mask = XALLOCAVEC (unsigned char, group_size); + auto_vec_perm_indices mask (group_size); FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, ostmt) if (gimple_assign_rhs_code (ostmt) != code0) { - mask[i] = 1; + mask.quick_push (1); ocode = gimple_assign_rhs_code (ostmt); } else - mask[i] = 0; + mask.quick_push (0); if (ocode != ERROR_MARK) { vec v0; Index: gcc/tree-vectorizer.h =================================================================== --- gcc/tree-vectorizer.h 2017-08-29 20:01:07.143372092 +0100 +++ gcc/tree-vectorizer.h 2017-09-14 11:25:32.166167193 +0100 @@ -1151,8 +1151,8 @@ extern void vect_get_load_cost (struct d extern void vect_get_store_cost (struct data_reference *, int, unsigned int *, stmt_vector_for_cost *); extern bool vect_supportable_shift (enum tree_code, tree); -extern tree vect_gen_perm_mask_any (tree, const unsigned char *); -extern tree vect_gen_perm_mask_checked (tree, const unsigned char *); +extern tree vect_gen_perm_mask_any (tree, vec_perm_indices); +extern tree vect_gen_perm_mask_checked (tree, vec_perm_indices); extern void optimize_mask_stores (struct loop*); /* In tree-vect-data-refs.c. */ Index: gcc/tree-vect-stmts.c =================================================================== --- gcc/tree-vect-stmts.c 2017-09-14 11:24:42.668855214 +0100 +++ gcc/tree-vect-stmts.c 2017-09-14 11:25:32.166167193 +0100 @@ -1706,15 +1706,14 @@ compare_step_with_zero (gimple *stmt) perm_mask_for_reverse (tree vectype) { int i, nunits; - unsigned char *sel; nunits = TYPE_VECTOR_SUBPARTS (vectype); - sel = XALLOCAVEC (unsigned char, nunits); + auto_vec_perm_indices sel (nunits); for (i = 0; i < nunits; ++i) - sel[i] = nunits - 1 - i; + sel.quick_push (nunits - 1 - i); - if (!can_vec_perm_p (TYPE_MODE (vectype), false, sel)) + if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel)) return NULL_TREE; return vect_gen_perm_mask_checked (vectype, sel); } @@ -2171,19 +2170,20 @@ vectorizable_mask_load_store (gimple *st modifier = NONE; else if (nunits == gather_off_nunits / 2) { - unsigned char *sel = XALLOCAVEC (unsigned char, gather_off_nunits); modifier = WIDEN; + auto_vec_perm_indices sel (gather_off_nunits); for (i = 0; i < gather_off_nunits; ++i) - sel[i] = i | nunits; + sel.quick_push (i | nunits); perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel); } else if (nunits == gather_off_nunits * 2) { - unsigned char *sel = XALLOCAVEC (unsigned char, nunits); modifier = NARROW; + auto_vec_perm_indices sel (nunits); + sel.quick_grow (nunits); for (i = 0; i < nunits; ++i) sel[i] = i < gather_off_nunits ? i : i + nunits - gather_off_nunits; @@ -2481,14 +2481,14 @@ vectorizable_bswap (gimple *stmt, gimple return false; unsigned int num_bytes = TYPE_VECTOR_SUBPARTS (char_vectype); - unsigned char *elts = XALLOCAVEC (unsigned char, num_bytes); - unsigned char *elt = elts; unsigned word_bytes = num_bytes / nunits; + + auto_vec_perm_indices elts (num_bytes); for (unsigned i = 0; i < nunits; ++i) for (unsigned j = 0; j < word_bytes; ++j) - *elt++ = (i + 1) * word_bytes - j - 1; + elts.quick_push ((i + 1) * word_bytes - j - 1); - if (! can_vec_perm_p (TYPE_MODE (char_vectype), false, elts)) + if (! can_vec_perm_p (TYPE_MODE (char_vectype), false, &elts)) return false; if (! vec_stmt) @@ -5803,22 +5803,22 @@ vectorizable_store (gimple *stmt, gimple modifier = NONE; else if (nunits == (unsigned int) scatter_off_nunits / 2) { - unsigned char *sel = XALLOCAVEC (unsigned char, scatter_off_nunits); modifier = WIDEN; + auto_vec_perm_indices sel (scatter_off_nunits); for (i = 0; i < (unsigned int) scatter_off_nunits; ++i) - sel[i] = i | nunits; + sel.quick_push (i | nunits); perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel); gcc_assert (perm_mask != NULL_TREE); } else if (nunits == (unsigned int) scatter_off_nunits * 2) { - unsigned char *sel = XALLOCAVEC (unsigned char, nunits); modifier = NARROW; + auto_vec_perm_indices sel (nunits); for (i = 0; i < (unsigned int) nunits; ++i) - sel[i] = i | scatter_off_nunits; + sel.quick_push (i | scatter_off_nunits); perm_mask = vect_gen_perm_mask_checked (vectype, sel); gcc_assert (perm_mask != NULL_TREE); @@ -6503,19 +6503,19 @@ vectorizable_store (gimple *stmt, gimple vect_gen_perm_mask_checked. */ tree -vect_gen_perm_mask_any (tree vectype, const unsigned char *sel) +vect_gen_perm_mask_any (tree vectype, vec_perm_indices sel) { tree mask_elt_type, mask_type, mask_vec; - int i, nunits; - nunits = TYPE_VECTOR_SUBPARTS (vectype); + unsigned int nunits = sel.length (); + gcc_checking_assert (nunits == TYPE_VECTOR_SUBPARTS (vectype)); mask_elt_type = lang_hooks.types.type_for_mode (int_mode_for_mode (TYPE_MODE (TREE_TYPE (vectype))).require (), 1); mask_type = get_vectype_for_scalar_type (mask_elt_type); auto_vec mask_elts (nunits); - for (i = 0; i < nunits; ++i) + for (unsigned int i = 0; i < nunits; ++i) mask_elts.quick_push (build_int_cst (mask_elt_type, sel[i])); mask_vec = build_vector (mask_type, mask_elts); @@ -6526,9 +6526,9 @@ vect_gen_perm_mask_any (tree vectype, co i.e. that the target supports the pattern _for arbitrary input vectors_. */ tree -vect_gen_perm_mask_checked (tree vectype, const unsigned char *sel) +vect_gen_perm_mask_checked (tree vectype, vec_perm_indices sel) { - gcc_assert (can_vec_perm_p (TYPE_MODE (vectype), false, sel)); + gcc_assert (can_vec_perm_p (TYPE_MODE (vectype), false, &sel)); return vect_gen_perm_mask_any (vectype, sel); } @@ -6841,22 +6841,22 @@ vectorizable_load (gimple *stmt, gimple_ modifier = NONE; else if (nunits == gather_off_nunits / 2) { - unsigned char *sel = XALLOCAVEC (unsigned char, gather_off_nunits); modifier = WIDEN; + auto_vec_perm_indices sel (gather_off_nunits); for (i = 0; i < gather_off_nunits; ++i) - sel[i] = i | nunits; + sel.quick_push (i | nunits); perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel); } else if (nunits == gather_off_nunits * 2) { - unsigned char *sel = XALLOCAVEC (unsigned char, nunits); modifier = NARROW; + auto_vec_perm_indices sel (nunits); for (i = 0; i < nunits; ++i) - sel[i] = i < gather_off_nunits - ? i : i + nunits - gather_off_nunits; + sel.quick_push (i < gather_off_nunits + ? i : i + nunits - gather_off_nunits); perm_mask = vect_gen_perm_mask_checked (vectype, sel); ncopies *= 2;