From patchwork Tue Jul 10 08:22:50 2012
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 170081
From: Richard Henderson
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 6/7] Use VEC_WIDEN_MULT_EVEN/ODD_EXPR in supportable_widening_operation
Date: Tue, 10 Jul 2012 10:22:50 +0200
Message-Id: <1341908571-30346-7-git-send-email-rth@redhat.com>
In-Reply-To: <1341908571-30346-1-git-send-email-rth@redhat.com>
References: <1341908571-30346-1-git-send-email-rth@redhat.com>

	* tree-vect-stmts.c (supportable_widening_operation): Expand
	WIDEN_MULT_EXPR via VEC_WIDEN_MULT_EVEN/ODD_EXPR if possible.
---
 gcc/ChangeLog         |  3 ++
 gcc/tree-vect-stmts.c | 96 +++++++++++++++++++++++++------------------------
 2 files changed, 53 insertions(+), 46 deletions(-)

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 9caf1c6..fe6a997 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -6199,7 +6199,8 @@ vect_is_simple_use_1 (tree operand, gimple stmt, loop_vec_info loop_vinfo,
 bool
 supportable_widening_operation (enum tree_code code, gimple stmt,
                                 tree vectype_out, tree vectype_in,
-                                tree *decl1, tree *decl2,
+                                tree *decl1 ATTRIBUTE_UNUSED,
+                                tree *decl2 ATTRIBUTE_UNUSED,
                                 enum tree_code *code1, enum tree_code *code2,
                                 int *multi_step_cvt,
                                 VEC (tree, heap) **interm_types)
@@ -6207,7 +6208,6 @@ supportable_widening_operation (enum tree_code code, gimple stmt,
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_info);
   struct loop *vect_loop = NULL;
-  bool ordered_p;
   enum machine_mode vec_mode;
   enum insn_code icode1, icode2;
   optab optab1, optab2;
@@ -6223,56 +6223,60 @@ supportable_widening_operation (enum tree_code code, gimple stmt,
   if (loop_info)
     vect_loop = LOOP_VINFO_LOOP (loop_info);
 
-  /* The result of a vectorized widening operation usually requires two vectors
-     (because the widened results do not fit into one vector). The generated
-     vector results would normally be expected to be generated in the same
-     order as in the original scalar computation, i.e. if 8 results are
-     generated in each vector iteration, they are to be organized as follows:
-     vect1: [res1,res2,res3,res4], vect2: [res5,res6,res7,res8].
-
-     However, in the special case that the result of the widening operation is
-     used in a reduction computation only, the order doesn't matter (because
-     when vectorizing a reduction we change the order of the computation).
-     Some targets can take advantage of this and generate more efficient code.
-     For example, targets like Altivec, that support widen_mult using a sequence
-     of {mult_even,mult_odd} generate the following vectors:
-     vect1: [res1,res3,res5,res7], vect2: [res2,res4,res6,res8].
-
-     When vectorizing outer-loops, we execute the inner-loop sequentially
-     (each vectorized inner-loop iteration contributes to VF outer-loop
-     iterations in parallel). We therefore don't allow to change the order
-     of the computation in the inner-loop during outer-loop vectorization.  */
-
-  if (vect_loop
-      && STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction
-      && !nested_in_vect_loop_p (vect_loop, stmt))
-    ordered_p = false;
-  else
-    ordered_p = true;
-
-  if (!ordered_p
-      && code == WIDEN_MULT_EXPR
-      && targetm.vectorize.builtin_mul_widen_even
-      && targetm.vectorize.builtin_mul_widen_even (vectype)
-      && targetm.vectorize.builtin_mul_widen_odd
-      && targetm.vectorize.builtin_mul_widen_odd (vectype))
-    {
-      if (vect_print_dump_info (REPORT_DETAILS))
-        fprintf (vect_dump, "Unordered widening operation detected.");
-
-      *code1 = *code2 = CALL_EXPR;
-      *decl1 = targetm.vectorize.builtin_mul_widen_even (vectype);
-      *decl2 = targetm.vectorize.builtin_mul_widen_odd (vectype);
-      return true;
-    }
-
   switch (code)
     {
     case WIDEN_MULT_EXPR:
+      /* The result of a vectorized widening operation usually requires
+         two vectors (because the widened results do not fit into one vector).
+         The generated vector results would normally be expected to be
+         generated in the same order as in the original scalar computation,
+         i.e. if 8 results are generated in each vector iteration, they are
+         to be organized as follows:
+                vect1: [res1,res2,res3,res4],
+                vect2: [res5,res6,res7,res8].
+
+         However, in the special case that the result of the widening
+         operation is used in a reduction computation only, the order doesn't
+         matter (because when vectorizing a reduction we change the order of
+         the computation).  Some targets can take advantage of this and
+         generate more efficient code.  For example, targets like Altivec,
+         that support widen_mult using a sequence of {mult_even,mult_odd}
+         generate the following vectors:
+                vect1: [res1,res3,res5,res7],
+                vect2: [res2,res4,res6,res8].
+
+         When vectorizing outer-loops, we execute the inner-loop sequentially
+         (each vectorized inner-loop iteration contributes to VF outer-loop
+         iterations in parallel).  We therefore don't allow to change the
+         order of the computation in the inner-loop during outer-loop
+         vectorization.  */
+      /* TODO: Another case in which order doesn't *really* matter is when we
+         widen and then contract again, e.g. (short)((int)x * y >> 8).
+         Normally, pack_trunc performs an even/odd permute, whereas the
+         repack from an even/odd expansion would be an interleave, which
+         would be significantly simpler for e.g. AVX2.  */
+      /* In any case, in order to avoid duplicating the code below, recurse
+         on VEC_WIDEN_MULT_EVEN_EXPR.  If it succeeds, all the return values
+         are properly set up for the caller.  If we fail, we'll continue with
+         a VEC_WIDEN_MULT_LO/HI_EXPR check.  */
+      if (vect_loop
+          && STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction
+          && !nested_in_vect_loop_p (vect_loop, stmt)
+          && supportable_widening_operation (VEC_WIDEN_MULT_EVEN_EXPR,
+                                             stmt, vectype_out, vectype_in,
+                                             NULL, NULL, code1, code2,
+                                             multi_step_cvt, interm_types))
+        return true;
       c1 = VEC_WIDEN_MULT_LO_EXPR;
       c2 = VEC_WIDEN_MULT_HI_EXPR;
       break;
 
+    case VEC_WIDEN_MULT_EVEN_EXPR:
+      /* Support the recursion induced just above.  */
+      c1 = VEC_WIDEN_MULT_EVEN_EXPR;
+      c2 = VEC_WIDEN_MULT_ODD_EXPR;
+      break;
+
     case WIDEN_LSHIFT_EXPR:
       c1 = VEC_WIDEN_LSHIFT_LO_EXPR;
       c2 = VEC_WIDEN_LSHIFT_HI_EXPR;
@@ -6298,7 +6302,7 @@ supportable_widening_operation (enum tree_code code, gimple stmt,
       gcc_unreachable ();
     }
 
-  if (BYTES_BIG_ENDIAN)
+  if (BYTES_BIG_ENDIAN && c1 != VEC_WIDEN_MULT_EVEN_EXPR)
    {
      enum tree_code ctmp = c1;
      c1 = c2;