From patchwork Tue Jun 28 13:59:32 2016
X-Patchwork-Submitter: Ilya Enkovich
X-Patchwork-Id: 641574
Date: Tue, 28 Jun 2016 16:59:32 +0300
From: Ilya Enkovich
To: Jeff Law
Cc: gcc-patches
Subject: Re: [PATCH, vec-tails 05/10] Check if loop can be masked
Message-ID: <20160628135932.GA11812@msticlxl57.ims.intel.com>
References: <20160519194208.GF40563@msticlxl57.ims.intel.com>
 <8c811442-df35-986a-d02d-b9c2669876d2@redhat.com>
 <20160623095422.GC30064@msticlxl57.ims.intel.com>
In-Reply-To: <20160623095422.GC30064@msticlxl57.ims.intel.com>

On 23 Jun 12:54, Ilya Enkovich wrote:
>
> Here is an updated version with fewer typos and more comments.
>
> Thanks,
> Ilya
> --

Here is an updated version with a check for trapping statements added
to vect_analyze_stmt.

Thanks,
Ilya

---
gcc/

2016-06-28  Ilya Enkovich

	* tree-vect-loop.c: Include insn-config.h and recog.h.
	(vect_check_required_masks_widening): New.
	(vect_check_required_masks_narrowing): New.
	(vect_get_masking_iv_elems): New.
	(vect_get_masking_iv_type): New.
	(vect_get_extreme_masks): New.
	(vect_check_required_masks): New.
	(vect_analyze_loop_operations): Add vect_check_required_masks
	call to compute LOOP_VINFO_CAN_BE_MASKED.
	(vect_analyze_loop_2): Initialize LOOP_VINFO_CAN_BE_MASKED and
	LOOP_VINFO_NEED_MASKING before starting over.
	(vectorizable_reduction): Compute LOOP_VINFO_CAN_BE_MASKED and
	masking cost.
	* tree-vect-stmts.c (can_mask_load_store): New.
	(vect_model_load_masking_cost): New.
	(vect_model_store_masking_cost): New.
	(vect_model_simple_masking_cost): New.
	(vectorizable_mask_load_store): Compute LOOP_VINFO_CAN_BE_MASKED
	and masking cost.
	(vectorizable_simd_clone_call): Likewise.
	(vectorizable_store): Likewise.
	(vectorizable_load): Likewise.
	(vect_stmt_should_be_masked_for_epilogue): New.
	(vect_add_required_mask_for_stmt): New.
	(vect_analyze_stmt): Compute LOOP_VINFO_CAN_BE_MASKED.
	* tree-vectorizer.h (vect_model_load_masking_cost): New.
	(vect_model_store_masking_cost): New.
	(vect_model_simple_masking_cost): New.


diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 36e60d4..1146de9 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -31,6 +31,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "optabs-tree.h"
+#include "insn-config.h"
+#include "recog.h"		/* FIXME: for insn_data */
 #include "diagnostic-core.h"
 #include "fold-const.h"
 #include "stor-layout.h"
@@ -1603,6 +1605,270 @@ vect_update_vf_for_slp (loop_vec_info loop_vinfo)
		     vectorization_factor);
 }
 
+/* Function vect_check_required_masks_widening.
+
+   Return true if a vector mask of type MASK_TYPE can be widened
+   to a type having REQ_ELEMS elements in a single vector.  */
+
+static bool
+vect_check_required_masks_widening (loop_vec_info loop_vinfo,
+				    tree mask_type, unsigned req_elems)
+{
+  unsigned mask_elems = TYPE_VECTOR_SUBPARTS (mask_type);
+
+  gcc_assert (mask_elems > req_elems);
+
+  /* Don't convert if it requires too many intermediate steps.  */
+  int steps = exact_log2 (mask_elems / req_elems);
+  if (steps > MAX_INTERM_CVT_STEPS + 1)
+    return false;
+
+  /* Check we have conversion support for the given mask mode.  */
+  machine_mode mode = TYPE_MODE (mask_type);
+  insn_code icode = optab_handler (vec_unpacks_lo_optab, mode);
+  if (icode == CODE_FOR_nothing
+      || optab_handler (vec_unpacks_hi_optab, mode) == CODE_FOR_nothing)
+    return false;
+
+  /* Make a recursive call for multi-step conversion.  */
+  if (steps > 1)
+    {
+      mask_elems = mask_elems >> 1;
+      mask_type = build_truth_vector_type (mask_elems, current_vector_size);
+      if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode)
+	return false;
+
+      if (!vect_check_required_masks_widening (loop_vinfo, mask_type,
+					       req_elems))
+	return false;
+    }
+  else
+    {
+      mask_type = build_truth_vector_type (req_elems, current_vector_size);
+      if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode)
+	return false;
+    }
+
+  return true;
+}
+
+/* Function vect_check_required_masks_narrowing.
+
+   Return true if a vector mask of type MASK_TYPE can be narrowed
+   to a type having REQ_ELEMS elements in a single vector.  */
+
+static bool
+vect_check_required_masks_narrowing (loop_vec_info loop_vinfo,
+				     tree mask_type, unsigned req_elems)
+{
+  unsigned mask_elems = TYPE_VECTOR_SUBPARTS (mask_type);
+
+  gcc_assert (req_elems > mask_elems);
+
+  /* Don't convert if it requires too many intermediate steps.  */
+  int steps = exact_log2 (req_elems / mask_elems);
+  if (steps > MAX_INTERM_CVT_STEPS + 1)
+    return false;
+
+  /* Check we have conversion support for the given mask mode.  */
+  machine_mode mode = TYPE_MODE (mask_type);
+  insn_code icode = optab_handler (vec_pack_trunc_optab, mode);
+  if (icode == CODE_FOR_nothing)
+    return false;
+
+  /* Make a recursive call for multi-step conversion.  */
+  if (steps > 1)
+    {
+      mask_elems = mask_elems << 1;
+      mask_type = build_truth_vector_type (mask_elems, current_vector_size);
+      if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode)
+	return false;
+
+      if (!vect_check_required_masks_narrowing (loop_vinfo, mask_type,
+						req_elems))
+	return false;
+    }
+  else
+    {
+      mask_type = build_truth_vector_type (req_elems, current_vector_size);
+      if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode)
+	return false;
+    }
+
+  return true;
+}
+
+/* Function vect_get_masking_iv_elems.
+
+   Return the number of elements in the IV used for loop masking.  */
+static int
+vect_get_masking_iv_elems (loop_vec_info loop_vinfo)
+{
+  tree iv_type = TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo));
+  tree iv_vectype = get_vectype_for_scalar_type (iv_type);
+
+  /* We extend the IV type in case it is not big enough to
+     fill a full vector.  */
+  return MIN ((int) TYPE_VECTOR_SUBPARTS (iv_vectype),
+	      LOOP_VINFO_VECT_FACTOR (loop_vinfo));
+}
+
+/* Function vect_get_masking_iv_type.
+
+   Return the type of the IV used for loop masking.  */
+static tree
+vect_get_masking_iv_type (loop_vec_info loop_vinfo)
+{
+  /* The masking IV is compared to a vector of NITERS and therefore
+     the type of NITERS is used as the base type for the IV.
+     FIXME: this can be improved by using a smaller size when
+     possible for more efficient mask computation.  */
+  tree iv_type = TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo));
+  tree iv_vectype = get_vectype_for_scalar_type (iv_type);
+  unsigned vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+
+  if (TYPE_VECTOR_SUBPARTS (iv_vectype) <= vf)
+    return iv_vectype;
+
+  unsigned elem_size = current_vector_size * BITS_PER_UNIT / vf;
+  iv_type = build_nonstandard_integer_type (elem_size, TYPE_UNSIGNED (iv_type));
+
+  return get_vectype_for_scalar_type (iv_type);
+}
+
+/* Function vect_get_extreme_masks.
+
+   Determine the minimum and maximum number of elements in masks
+   required for masking a loop described by LOOP_VINFO.
+   Computed values are returned in MIN_MASK_ELEMS and
+   MAX_MASK_ELEMS.  */
+
+static void
+vect_get_extreme_masks (loop_vec_info loop_vinfo,
+			unsigned *min_mask_elems,
+			unsigned *max_mask_elems)
+{
+  unsigned required_masks = LOOP_VINFO_REQUIRED_MASKS (loop_vinfo);
+  unsigned elems = 1;
+
+  *min_mask_elems = *max_mask_elems = vect_get_masking_iv_elems (loop_vinfo);
+
+  while (required_masks)
+    {
+      if (required_masks & 1)
+	{
+	  if (elems < *min_mask_elems)
+	    *min_mask_elems = elems;
+	  if (elems > *max_mask_elems)
+	    *max_mask_elems = elems;
+	}
+      elems = elems << 1;
+      required_masks = required_masks >> 1;
+    }
+}
+
+/* Function vect_check_required_masks.
+
+   For the given LOOP_VINFO check that all required masks can be
+   computed and add their computation cost to the loop cost data.  */
+
+static void
+vect_check_required_masks (loop_vec_info loop_vinfo)
+{
+  if (!LOOP_VINFO_REQUIRED_MASKS (loop_vinfo))
+    return;
+
+  /* First check we have a proper comparison to get
+     an initial mask.  */
+  tree iv_vectype = vect_get_masking_iv_type (loop_vinfo);
+  unsigned iv_elems = TYPE_VECTOR_SUBPARTS (iv_vectype);
+
+  tree mask_type = build_same_sized_truth_vector_type (iv_vectype);
+
+  if (!expand_vec_cmp_expr_p (iv_vectype, mask_type))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "cannot be masked: required vector comparison "
+			 "is not supported.\n");
+      LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+      return;
+    }
+
+  int cmp_copies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / iv_elems;
+  /* Add cost of initial IV values creation.  */
+  add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo), cmp_copies,
+		 scalar_to_vec, NULL, 0, vect_masking_prologue);
+  /* Add cost of upper bound and step values creation.  It is the same
+     for all copies.  */
+  add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo), 2,
+		 scalar_to_vec, NULL, 0, vect_masking_prologue);
+  /* Add cost of vector comparisons.  */
+  add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo), cmp_copies,
+		 vector_stmt, NULL, 0, vect_masking_body);
+  /* Add cost of IV increment.  */
+  add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo), cmp_copies,
+		 vector_stmt, NULL, 0, vect_masking_body);
+
+  /* Now check the widest and the narrowest masks.  All intermediate
+     masks are obtained while computing the extreme ones.  */
+  unsigned min_mask_elems = 0;
+  unsigned max_mask_elems = 0;
+
+  vect_get_extreme_masks (loop_vinfo, &min_mask_elems, &max_mask_elems);
+
+  if (min_mask_elems < iv_elems)
+    {
+      /* Check mask widening is available.  */
+      if (!vect_check_required_masks_widening (loop_vinfo, mask_type,
+					       min_mask_elems))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "cannot be masked: required mask widening "
+			     "is not supported.\n");
+	  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+	  return;
+	}
+
+      /* Add widening cost.  In total we need to widen (2^N - 1)
+	 vectors per original vector, where N is the number of
+	 conversion steps.  Each widening requires two extracts.  */
+      int steps = exact_log2 (iv_elems / min_mask_elems);
+      int conversions = cmp_copies * 2 * ((1 << steps) - 1);
+      add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
+		     conversions, vec_promote_demote,
+		     NULL, 0, vect_masking_body);
+    }
+
+  if (max_mask_elems > iv_elems)
+    {
+      if (!vect_check_required_masks_narrowing (loop_vinfo, mask_type,
+						max_mask_elems))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "cannot be masked: required mask narrowing "
+			     "is not supported.\n");
+	  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+	  return;
+	}
+
+      /* Add narrowing cost.  In total we need (2^N - 1) vector
+	 narrowings per resulting vector, where N is the number
+	 of conversion steps.  */
+      int steps = exact_log2 (max_mask_elems / iv_elems);
+      int results = cmp_copies * iv_elems / max_mask_elems;
+      int conversions = results * ((1 << steps) - 1);
+      add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
+		     conversions, vec_promote_demote,
+		     NULL, 0, vect_masking_body);
+    }
+}
+
 /* Function vect_analyze_loop_operations.
 
    Scan the loop stmts and make sure they are all vectorizable.  */
@@ -1761,6 +2027,12 @@ vect_analyze_loop_operations (loop_vec_info loop_vinfo)
	  return false;
	}
 
+  /* If all statements can be masked then we also need to check
+     that the required masks can be computed, and account for
+     their cost.  */
+  if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
+    vect_check_required_masks (loop_vinfo);
+
   return true;
 }
 
@@ -2236,6 +2508,8 @@ again:
   LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = false;
   LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) = false;
   LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = 0;
+  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = true;
+  LOOP_VINFO_NEED_MASKING (loop_vinfo) = false;
 
   goto start_over;
 }
@@ -5428,6 +5702,7 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi,
       outer_loop = loop;
       loop = loop->inner;
       nested_cycle = true;
+      LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
     }
 
   /* 1. Is vectorizable reduction?  */
@@ -5627,6 +5902,18 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi,
 
   gcc_assert (ncopies >= 1);
 
+  if (slp_node || PURE_SLP_STMT (stmt_info) || code == COND_EXPR
+      || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+      || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+	 == INTEGER_INDUC_COND_REDUCTION)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "cannot be masked: unsupported conditional "
+			 "reduction\n");
+      LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+    }
+
   vec_mode = TYPE_MODE (vectype_in);
 
   if (code == COND_EXPR)
@@ -5904,6 +6191,19 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi,
	  return false;
	}
     }
+  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
+    {
+      /* Check that masking of the reduction is supported.  */
+      tree mask_vtype = build_same_sized_truth_vector_type (vectype_out);
+      if (!expand_vec_cond_expr_p (vectype_out, mask_vtype))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "cannot be masked: required vector conditional "
+			     "expression is not supported.\n");
+	  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+	}
+    }
 
   if (!vec_stmt) /* transformation not required.  */
     {
@@ -5912,6 +6212,10 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi,
				  reduc_index))
	return false;
       STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
+
+      if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
+	vect_model_simple_masking_cost (stmt_info, ncopies);
+
       return true;
     }
 
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index d2e16d0..2a09591 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "builtins.h"
 #include "internal-fn.h"
+#include "tree-ssa-loop-ivopts.h"
 
 /* For lang_hooks.types.type_for_mode.  */
 #include "langhooks.h"
@@ -535,6 +536,38 @@ process_use (gimple *stmt, tree use, loop_vec_info loop_vinfo, bool live_p,
   return true;
 }
 
+/* Return true if STMT can be converted to masked form.  */
+
+static bool
+can_mask_load_store (gimple *stmt)
+{
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  tree vectype, mask_vectype;
+  tree lhs, ref;
+
+  if (!stmt_info)
+    return false;
+  lhs = gimple_assign_lhs (stmt);
+  ref = (TREE_CODE (lhs) == SSA_NAME) ? gimple_assign_rhs1 (stmt) : lhs;
+  if (may_be_nonaddressable_p (ref))
+    return false;
+  vectype = STMT_VINFO_VECTYPE (stmt_info);
+  mask_vectype = build_same_sized_truth_vector_type (vectype);
+  if (!can_vec_mask_load_store_p (TYPE_MODE (vectype),
+				  TYPE_MODE (mask_vectype),
+				  gimple_assign_load_p (stmt)))
+    {
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			   "Statement can't be masked.\n");
+	  dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
+	}
+
+      return false;
+    }
+  return true;
+}
 
 /* Function vect_mark_stmts_to_be_vectorized.
 
@@ -1193,6 +1226,56 @@ vect_get_load_cost (struct data_reference *dr, int ncopies,
     }
 }
 
+/* Function vect_model_load_masking_cost.
+
+   Models cost for memory load masking.  */
+
+void
+vect_model_load_masking_cost (stmt_vec_info stmt_info, int ncopies)
+{
+  /* MASK_LOAD case.  */
+  if (gimple_code (stmt_info->stmt) == GIMPLE_CALL)
+    add_stmt_masking_cost (stmt_info->vinfo->target_cost_data,
+			   ncopies, vector_mask_load, stmt_info, false,
+			   vect_masking_body);
+  /* Other loads.  */
+  else
+    add_stmt_masking_cost (stmt_info->vinfo->target_cost_data,
+			   ncopies, vector_load, stmt_info, false,
+			   vect_masking_body);
+}
+
+/* Function vect_model_store_masking_cost.
+
+   Models cost for memory store masking.  */
+
+void
+vect_model_store_masking_cost (stmt_vec_info stmt_info, int ncopies)
+{
+  /* MASK_STORE case.  */
+  if (gimple_code (stmt_info->stmt) == GIMPLE_CALL)
+    add_stmt_masking_cost (stmt_info->vinfo->target_cost_data,
+			   ncopies, vector_mask_store, stmt_info, false,
+			   vect_masking_body);
+  /* Other stores.  */
+  else
+    add_stmt_masking_cost (stmt_info->vinfo->target_cost_data,
+			   ncopies, vector_store, stmt_info, false,
+			   vect_masking_body);
+}
+
+/* Function vect_model_simple_masking_cost.
+
+   Models cost for statement masking.  */
+
+void
+vect_model_simple_masking_cost (stmt_vec_info stmt_info, int ncopies)
+{
+  add_stmt_masking_cost (stmt_info->vinfo->target_cost_data,
+			 ncopies, vector_stmt, stmt_info, false,
+			 vect_masking_body);
+}
+
 /* Insert the new stmt NEW_STMT at *GSI or at the appropriate place in
    the loop preheader for the vectorized stmt STMT.  */
@@ -1798,6 +1881,20 @@ vectorizable_mask_load_store (gimple *stmt, gimple_stmt_iterator *gsi,
	  && !useless_type_conversion_p (vectype, rhs_vectype)))
     return false;
 
+  if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
+    {
+      /* Check that mask conjunction is supported.  */
+      optab tab;
+      tab = optab_for_tree_code (BIT_AND_EXPR, vectype, optab_default);
+      if (!tab || optab_handler (tab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "cannot be masked: unsupported mask operation\n");
+	  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+	}
+    }
+
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = call_vec_info_type;
@@ -1806,6 +1903,15 @@ vectorizable_mask_load_store (gimple *stmt, gimple_stmt_iterator *gsi,
			       NULL, NULL, NULL);
       else
	vect_model_load_cost (stmt_info, ncopies, false, NULL, NULL, NULL);
+
+      if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
+	{
+	  if (is_store)
+	    vect_model_store_masking_cost (stmt_info, ncopies);
+	  else
+	    vect_model_load_masking_cost (stmt_info, ncopies);
+	}
+
       return true;
     }
 
@@ -2802,6 +2908,18 @@ vectorizable_simd_clone_call (gimple *stmt, gimple_stmt_iterator *gsi,
   if (slp_node)
     return false;
 
+  /* Masked clones are not yet supported, but we allow calls
+     which may simply be called with no mask.  */
+  if (!(gimple_call_flags (stmt) & ECF_CONST)
+      || (gimple_call_flags (stmt) & ECF_LOOPING_CONST_OR_PURE))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "cannot be masked: non-const call "
+			 "(masked calls are not supported)\n");
+      LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+    }
+
   /* Process function arguments.  */
   nargs = gimple_call_num_args (stmt);
 
@@ -5340,6 +5458,14 @@ vectorizable_store (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
			     "negative step and reversing not supported.\n");
	  return false;
	}
+      if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
+	{
+	  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "cannot be masked: negative step"
+			     " is not supported.");
+	}
	}
     }
 
@@ -5348,6 +5474,16 @@ vectorizable_store (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
       grouped_store = true;
       first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
       group_size = GROUP_SIZE (vinfo_for_stmt (first_stmt));
+
+      if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "cannot be masked: grouped access"
+			     " is not supported.");
+	  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+	}
+
       if (!slp
	  && !STMT_VINFO_STRIDED_P (stmt_info))
	{
	  if (vect_store_lanes_supported (vectype, group_size))
@@ -5401,6 +5537,44 @@ vectorizable_store (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
			     "scatter index use not simple.");
	  return false;
	}
+      if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "cannot be masked: gather/scatter is"
+			     " not supported.");
+	  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+	}
+
+  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
+      && STMT_VINFO_STRIDED_P (stmt_info))
+    {
+      LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "cannot be masked: strided store is not"
+			 " supported.\n");
+    }
+
+  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
+      && integer_zerop (nested_in_vect_loop_p (loop, stmt)
+			? STMT_VINFO_DR_STEP (stmt_info)
+			: DR_STEP (dr)))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "cannot be masked: invariant store.\n");
+      LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+    }
+
+  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
+      && !can_mask_load_store (stmt))
+    {
+      LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "cannot be masked: unsupported mask store.\n");
+    }
 
   if (!vec_stmt) /* transformation not required.  */
     {
@@ -5410,6 +5584,9 @@ vectorizable_store (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
       if (!PURE_SLP_STMT (stmt_info))
	vect_model_store_cost (stmt_info, ncopies, store_lanes_p, dt,
			       NULL, NULL, NULL);
+      if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
+	vect_model_store_masking_cost (stmt_info, ncopies);
+
       return true;
     }
 
@@ -6315,6 +6492,15 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
       grouped_load = true;
       /* FORNOW */
       gcc_assert (!nested_in_vect_loop && !STMT_VINFO_GATHER_SCATTER_P (stmt_info));
+      /* Not yet supported.  */
+      if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "cannot be masked: grouped access is not"
+			     " supported.");
+	  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+	}
 
       first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
       group_size = GROUP_SIZE (vinfo_for_stmt (first_stmt));
@@ -6368,6 +6554,7 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
	}
 
       LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) = true;
+      LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
     }
 
   if (slp && SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())
@@ -6421,6 +6608,16 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
       gather_decl = vect_check_gather_scatter (stmt, loop_vinfo, &gather_base,
					       &gather_off, &gather_scale);
       gcc_assert (gather_decl);
+      if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "cannot be masked: gather/scatter is not"
+			     " supported.\n");
+	  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+	}
+
+
       if (!vect_is_simple_use (gather_off, vinfo, &def_stmt, &gather_dt,
			       &gather_off_vectype))
	{
@@ -6432,6 +6629,15 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
     }
   else if (STMT_VINFO_STRIDED_P (stmt_info))
     {
+      if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
+	{
+	  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "cannot be masked: strided load is not"
+			     " supported.\n");
+	}
+
       if (grouped_load
	  && slp
	  && (group_size > nunits
@@ -6483,9 +6689,35 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
			     "\n");
	  return false;
	}
+      if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "cannot be masked: negative step "
+			     "for masking.\n");
+	  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+	}
	}
     }
 
+  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
+      && integer_zerop (nested_in_vect_loop
+			? STMT_VINFO_DR_STEP (stmt_info)
+			: DR_STEP (dr)))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "allow invariant load for masked loop.\n");
+    }
+  else if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
+	   && !can_mask_load_store (stmt))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "cannot be masked: unsupported masked load.\n");
+      LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+    }
+
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = load_vec_info_type;
@@ -6493,6 +6725,9 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
       if (!PURE_SLP_STMT (stmt_info))
	vect_model_load_cost (stmt_info, ncopies, load_lanes_p,
			      NULL, NULL, NULL);
+      if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
+	vect_model_load_masking_cost (stmt_info, ncopies);
+
       return true;
     }
 
@@ -7889,6 +8124,43 @@ vectorizable_comparison (gimple *stmt, gimple_stmt_iterator *gsi,
   return true;
 }
 
+/* Return true if the vector version of STMT should be masked
+   in a vectorized loop epilogue (assuming it uses the same VF
+   as the main loop).  */
+
+static bool
+vect_stmt_should_be_masked_for_epilogue (gimple *stmt)
+{
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+
+  /* We should mask all statements accessing memory.  */
+  if (STMT_VINFO_DATA_REF (stmt_info))
+    return true;
+
+  /* We should also mask all reductions.  */
+  if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def
+      || STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
+    return true;
+
+  return false;
+}
+
+/* Add a mask required to mask STMT to LOOP_VINFO_REQUIRED_MASKS.  */
+
+static void
+vect_add_required_mask_for_stmt (gimple *stmt)
+{
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  unsigned HOST_WIDE_INT nelems = TYPE_VECTOR_SUBPARTS (vectype);
+  int bit_no = exact_log2 (nelems);
+
+  gcc_assert (bit_no >= 0);
+
+  LOOP_VINFO_REQUIRED_MASKS (loop_vinfo) |= (1 << bit_no);
+}
+
 /* Make sure the statement is vectorizable.  */
 
 bool
@@ -7896,6 +8168,7 @@ vect_analyze_stmt (gimple *stmt, bool *need_to_vectorize, slp_tree node)
 {
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   enum vect_relevant relevance = STMT_VINFO_RELEVANT (stmt_info);
   bool ok;
   tree scalar_type, vectype;
@@ -8062,6 +8335,10 @@ vect_analyze_stmt (gimple *stmt, bool *need_to_vectorize, slp_tree node)
       STMT_VINFO_VECTYPE (stmt_info) = vectype;
    }
 
+  /* Masking is not supported for SLP yet.  */
+  if (loop_vinfo && node)
+    LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+
   if (STMT_VINFO_RELEVANT_P (stmt_info))
     {
       gcc_assert (!VECTOR_MODE_P (TYPE_MODE (gimple_expr_type (stmt))));
@@ -8121,6 +8398,26 @@ vect_analyze_stmt (gimple *stmt, bool *need_to_vectorize, slp_tree node)
       return false;
     }
 
+  if (loop_vinfo
+      && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
+    {
+      /* Currently we have real masking for loads and stores only.
+	 We can't mask a loop containing other statements that
+	 may trap.  */
+      if (gimple_could_trap_p_1 (stmt, false, false))
+	{
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "cannot be masked: unsupported trapping stmt: ");
+	      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
+	    }
+	  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+	}
+      else if (vect_stmt_should_be_masked_for_epilogue (stmt))
+	vect_add_required_mask_for_stmt (stmt);
+    }
+
   if (bb_vinfo)
     return true;
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 8a61690..4d13c41 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1033,6 +1033,9 @@ extern void vect_model_store_cost (stmt_vec_info, int, bool,
 extern void vect_model_load_cost (stmt_vec_info, int, bool, slp_tree,
				  stmt_vector_for_cost *,
				  stmt_vector_for_cost *);
+extern void vect_model_load_masking_cost (stmt_vec_info, int);
+extern void vect_model_store_masking_cost (stmt_vec_info, int);
+extern void vect_model_simple_masking_cost (stmt_vec_info, int);
 extern unsigned record_stmt_cost (stmt_vector_for_cost *, int,
				  enum vect_cost_for_stmt, stmt_vec_info,
				  int, enum vect_cost_model_location);
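
For readers who have not followed the vec-tails series: the analysis in this
patch only records whether every statement in the loop can be masked; the
point of the masks is that the final, partial vector iteration can run with
inactive lanes disabled instead of falling back to a scalar epilogue.  Below
is a rough, scalar-level model of that behaviour.  It is not GCC code, and VF
is just an assumed vectorization factor; the comparison of the masking IV
against the iteration count is the vector comparison whose availability
vect_check_required_masks verifies, and the guarded load and reduction update
are the kinds of statements vect_stmt_should_be_masked_for_epilogue flags.

#include <stdio.h>

#define VF 4	/* assumed vectorization factor */

/* Scalar model of a masked sum reduction: each chunk of VF lanes is
   processed, but a lane only contributes if its index is below N.  */
static int
masked_sum (const int *a, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i += VF)
    for (int lane = 0; lane < VF; lane++)
      {
	int mask = (i + lane) < n;	/* mask = IV < niters */
	if (mask)
	  sum += a[i + lane];		/* masked load + masked reduction */
      }
  return sum;
}

int
main (void)
{
  int a[7] = { 1, 2, 3, 4, 5, 6, 7 };
  /* The last chunk has only three active lanes; the result is 28.  */
  printf ("%d\n", masked_sum (a, 7));
  return 0;
}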
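
A side note on the cost model: the widening/narrowing counts in
vect_check_required_masks are easy to misread, so here is a small standalone
program (again not GCC code; names and constants are made up for
illustration) that mirrors the arithmetic of the comments above.  For
example, widening one 16-element IV mask down to 4-element masks takes 2
steps and 3 widenings, i.e. 6 unpack operations, while packing four
4-element IV masks into one 16-element mask takes 3 pack operations.

#include <assert.h>
#include <stdio.h>

static int
log2u (unsigned x)
{
  int n = 0;
  while (x > 1)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Estimated number of vec_promote_demote operations needed to turn
   CMP_COPIES masks of IV_ELEMS elements into masks of REQ_ELEMS elements,
   following the (2^N - 1) formulas from the patch comments.  */
static int
mask_conversion_count (int cmp_copies, unsigned iv_elems, unsigned req_elems)
{
  if (req_elems < iv_elems)
    {
      /* Widening: (2^N - 1) vectors to widen per original mask,
	 two extracts (unpack lo/hi) for each of them.  */
      int steps = log2u (iv_elems / req_elems);
      return cmp_copies * 2 * ((1 << steps) - 1);
    }
  else if (req_elems > iv_elems)
    {
      /* Narrowing: (2^N - 1) packs per resulting mask.  */
      int steps = log2u (req_elems / iv_elems);
      int results = cmp_copies * iv_elems / req_elems;
      return results * ((1 << steps) - 1);
    }
  return 0;
}

int
main (void)
{
  assert (mask_conversion_count (1, 16, 4) == 6);  /* 2 steps, 3 widenings.  */
  assert (mask_conversion_count (4, 4, 16) == 3);  /* 4 masks packed into 1.  */
  printf ("ok\n");
  return 0;
}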