From patchwork Fri Jun 12 12:25:06 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jakub Jelinek <jakub@redhat.com>
X-Patchwork-Id: 483535
Return-Path: 
 <gcc-patches-return-400396-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 8A5BC1402B4
	for <incoming@patchwork.ozlabs.org>;
	Fri, 12 Jun 2015 22:25:23 +1000 (AEST)
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key;
	unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org
	header.b=qN2RiAVu; dkim-atps=neutral
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:date
	:from:to:cc:subject:message-id:reply-to:mime-version
	:content-type; q=dns; s=default; b=VJlUKdPa8l63gwmDAdHkwFZ1zUYh9
	4NTN2dut5d7Wmdc601MLdyWN2DGCfu8pI+DzQdifhHz8/xngT3dbyOI8Q8BGOc0S
	7aCCmKA822KxlMhTsxc6XzMpyOZpEBM8l4o3lqHWyUpYthlAto/EJIAMz9DOBger
	VtMbe5mS34gScM=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:date
	:from:to:cc:subject:message-id:reply-to:mime-version
	:content-type; s=default; bh=z8d3O3/y/6uzklfjOD4Cx/q96ng=; b=qN2
	RiAVusj2RkcowQBClEJdTmxWUJvhiZHbLexenHnitF+Xr5VY6NsPrXe9Kcd3sxJi
	t5jW9+5XClXYQQ9MTJP4R+iCtRNDTV7wTUzGFSuAzGT2s1ItH5qmfg7DssqMlFXR
	NHwTlAG74eh0QgCTFuBHSnMxR5z8t4wgy7O3hnOI=
Received: (qmail 98953 invoked by alias); 12 Jun 2015 12:25:15 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 98944 invoked by uid 89); 12 Jun 2015 12:25:14 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.0 required=5.0 tests=AWL, BAYES_50,
	KAM_LAZY_DOMAIN_SECURITY, SPF_HELO_PASS,
	T_RP_MATCHES_RCVD autolearn=no version=3.3.2
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by
	sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with
	(AES256-GCM-SHA384 encrypted) ESMTPS;
	Fri, 12 Jun 2015 12:25:13 +0000
Received: from int-mx14.intmail.prod.int.phx2.redhat.com
	(int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27])	by
	mx1.redhat.com (Postfix) with ESMTPS id 20B0C37C805;
	Fri, 12 Jun 2015 12:25:12 +0000 (UTC)
Received: from tucnak.zalov.cz (ovpn-116-82.ams2.redhat.com
	[10.36.116.82])	by int-mx14.intmail.prod.int.phx2.redhat.com
	(8.14.4/8.14.4) with ESMTP id t5CCP9x7025919	(version=TLSv1/SSLv3
	cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
	Fri, 12 Jun 2015 08:25:11 -0400
Received: from tucnak.zalov.cz (localhost [127.0.0.1])	by tucnak.zalov.cz
	(8.14.9/8.14.9) with ESMTP id t5CCP8Jj012032;
	Fri, 12 Jun 2015 14:25:08 +0200
Received: (from jakub@localhost)	by tucnak.zalov.cz (8.14.9/8.14.9/Submit)
	id t5CCP6dg012031; Fri, 12 Jun 2015 14:25:06 +0200
Date: Fri, 12 Jun 2015 14:25:06 +0200
From: Jakub Jelinek <jakub@redhat.com>
To: gcc-patches@gcc.gnu.org
Cc: "Verbin, Ilya" <ilya.verbin@intel.com>, "Yukhin,
	Kirill" <kirill.yukhin@intel.com>
Subject: [gomp4.1] Parsing of schedule(simd:...)
Message-ID: <20150612122506.GQ10247@tucnak.redhat.com>
Reply-To: Jakub Jelinek <jakub@redhat.com>
MIME-Version: 1.0
Content-Disposition: inline
User-Agent: Mutt/1.5.23 (2014-03-12)
X-IsSubscribed: yes

Hi!

I've committed following patch to add C/C++ parsing of simd: schedule
clause modifier, and a very rough implementation of it for
schedule with chunk and dynamic schedule kinds.
No idea what to do about runtime schedule, because there we don't pass
a chunk size to the library routine.
And for nochunk static it will need more work (well, for chunk static
likely too).  Best would be to arrange for the vectorizer to be able to
communicate its decisions back into the schedule static decisions
- the spec allows the first iteration to have even more than chunk_size
rounded up to a multiple of (estimated) vectorization factor, so best would
be if we e.g. decide to peel the loop for alignment etc. to schedule those
iterations in the first thread and then full portion of chunk_size rounded
up to vf, then second up to (last - 1)th thread doing anything always
run exactly chunk_size rounded up to vf iterations and last iteration doing
what is left.  Any help with that would be appreciated.

Also, not sure if we shouldn't replace here omp_max_vf with the
OMP_CLAUSE_SIMDLEN value if specified, that is the desired vectorization
factor, so perhaps it is enough to use that.  Also, omp_max_vf might be too
high, it assumes the loop might contain some QImode types that would need
vectorization, while if it is e.g. fully SImode+, the guess will be 4x
higher than needed.  Perhaps walk the loop and collect narrowest type used
in there?

2015-06-12  Jakub Jelinek  <jakub@redhat.com>

	* tree.h (OMP_CLAUSE_SCHEDULE_SIMD): Define.
	* omp-low.c (struct omp_for_data): Add simd_schedule field.
	(extract_omp_for_data): Initialize it.
	(omp_adjust_chunk_size): New function.
	(get_ws_args_for, expand_omp_for_generic,
	expand_omp_for_static_chunk): Use it.
	* tree-pretty-print.c (dump_omp_clause): Print simd: modifier
	on OMP_CLAUSE_SCHEDULE.
c-family/
	* c-omp.c (c_omp_split_clauses): Clear OMP_CLAUSE_SCHEDULE_SIMD
	when not combined with simd construct.
c/
	* c-parser.c (c_parser_omp_clause_schedule): Parse optional
	simd: modifier in schedule clause.
cp/
	* parser.c (cp_parser_omp_clause_schedule): Parse optional
	simd: modifier in schedule clause.
testsuite/
	* c-c++-common/gomp/schedule-simd-1.c: New test.


	Jakub

--- gcc/tree.h.jj	2015-06-11 14:36:37.000000000 +0200
+++ gcc/tree.h	2015-06-11 18:22:28.413686564 +0200
@@ -1526,6 +1526,10 @@ extern void protected_set_expr_location
 #define OMP_CLAUSE_SCHEDULE_KIND(NODE) \
   (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_SCHEDULE)->omp_clause.subcode.schedule_kind)
 
+/* True if a SCHEDULE clause has the simd modifier on it.  */
+#define OMP_CLAUSE_SCHEDULE_SIMD(NODE) \
+  (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_SCHEDULE)->base.public_flag)
+
 #define OMP_CLAUSE_DEFAULT_KIND(NODE) \
   (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_DEFAULT)->omp_clause.subcode.default_kind)
 
--- gcc/omp-low.c.jj	2015-06-11 11:35:02.000000000 +0200
+++ gcc/omp-low.c	2015-06-12 12:23:06.857019167 +0200
@@ -251,7 +251,7 @@ struct omp_for_data
   gomp_for *for_stmt;
   tree pre, iter_type;
   int collapse;
-  bool have_nowait, have_ordered;
+  bool have_nowait, have_ordered, simd_schedule;
   enum omp_clause_schedule_kind sched_kind;
   struct omp_for_data_loop *loops;
 };
@@ -514,6 +514,7 @@ extract_omp_for_data (gomp_for *for_stmt
   fd->have_ordered = false;
   fd->sched_kind = OMP_CLAUSE_SCHEDULE_STATIC;
   fd->chunk_size = NULL_TREE;
+  fd->simd_schedule = false;
   if (gimple_omp_for_kind (fd->for_stmt) == GF_OMP_FOR_KIND_CILKFOR)
     fd->sched_kind = OMP_CLAUSE_SCHEDULE_CILKFOR;
   collapse_iter = NULL;
@@ -532,6 +533,7 @@ extract_omp_for_data (gomp_for *for_stmt
 	gcc_assert (!distribute && !taskloop);
 	fd->sched_kind = OMP_CLAUSE_SCHEDULE_KIND (t);
 	fd->chunk_size = OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (t);
+	fd->simd_schedule = OMP_CLAUSE_SCHEDULE_SIMD (t);
 	break;
       case OMP_CLAUSE_DIST_SCHEDULE:
 	gcc_assert (distribute);
@@ -870,6 +872,29 @@ workshare_safe_to_combine_p (basic_block
 }
 
 
+static int omp_max_vf (void);
+
+/* Adjust CHUNK_SIZE from SCHEDULE clause, depending on simd modifier
+   presence (SIMD_SCHEDULE).  */
+
+static tree
+omp_adjust_chunk_size (tree chunk_size, bool simd_schedule)
+{
+  if (!simd_schedule)
+    return chunk_size;
+
+  int vf = omp_max_vf ();
+  if (vf == 1)
+    return chunk_size;
+
+  tree type = TREE_TYPE (chunk_size);
+  chunk_size = fold_build2 (PLUS_EXPR, type, chunk_size,
+			    build_int_cst (type, vf - 1));
+  return fold_build2 (BIT_AND_EXPR, type, chunk_size,
+		      build_int_cst (type, -vf));
+}
+
+
 /* Collect additional arguments needed to emit a combined
    parallel+workshare call.  WS_STMT is the workshare directive being
    expanded.  */
@@ -917,6 +942,7 @@ get_ws_args_for (gimple par_stmt, gimple
       if (fd.chunk_size)
 	{
 	  t = fold_convert_loc (loc, long_integer_type_node, fd.chunk_size);
+	  t = omp_adjust_chunk_size (t, fd.simd_schedule);
 	  ws_args->quick_push (t);
 	}
 
@@ -7019,6 +7045,7 @@ expand_omp_for_generic (struct omp_regio
 	  if (fd->chunk_size)
 	    {
 	      t = fold_convert (fd->iter_type, fd->chunk_size);
+	      t = omp_adjust_chunk_size (t, fd->simd_schedule);
 	      t = build_call_expr (builtin_decl_explicit (start_fn),
 				   6, t0, t1, t2, t, t3, t4);
 	    }
@@ -7044,6 +7071,7 @@ expand_omp_for_generic (struct omp_regio
 	    {
 	      tree bfn_decl = builtin_decl_explicit (start_fn);
 	      t = fold_convert (fd->iter_type, fd->chunk_size);
+	      t = omp_adjust_chunk_size (t, fd->simd_schedule);
 	      t = build_call_expr (bfn_decl, 7, t5, t0, t1, t2, t, t3, t4);
 	    }
 	  else
@@ -7830,9 +7858,11 @@ expand_omp_for_static_chunk (struct omp_
 				 true, NULL_TREE, true, GSI_SAME_STMT);
   step = force_gimple_operand_gsi (&gsi, fold_convert (itype, step),
 				   true, NULL_TREE, true, GSI_SAME_STMT);
-  fd->chunk_size
-    = force_gimple_operand_gsi (&gsi, fold_convert (itype, fd->chunk_size),
-				true, NULL_TREE, true, GSI_SAME_STMT);
+  tree chunk_size = fold_convert (itype, fd->chunk_size);
+  chunk_size = omp_adjust_chunk_size (chunk_size, fd->simd_schedule);
+  chunk_size
+    = force_gimple_operand_gsi (&gsi, chunk_size, true, NULL_TREE, true,
+				GSI_SAME_STMT);
 
   t = build_int_cst (itype, (fd->loop.cond_code == LT_EXPR ? -1 : 1));
   t = fold_build2 (PLUS_EXPR, itype, step, t);
@@ -7866,7 +7896,7 @@ expand_omp_for_static_chunk (struct omp_
     = gimple_build_assign (trip_init, build_int_cst (itype, 0));
   gsi_insert_before (&gsi, assign_stmt, GSI_SAME_STMT);
 
-  t = fold_build2 (MULT_EXPR, itype, threadid, fd->chunk_size);
+  t = fold_build2 (MULT_EXPR, itype, threadid, chunk_size);
   t = fold_build2 (MULT_EXPR, itype, t, step);
   if (POINTER_TYPE_P (type))
     t = fold_build_pointer_plus (n1, t);
@@ -7883,11 +7913,11 @@ expand_omp_for_static_chunk (struct omp_
 
   t = fold_build2 (MULT_EXPR, itype, trip_main, nthreads);
   t = fold_build2 (PLUS_EXPR, itype, t, threadid);
-  t = fold_build2 (MULT_EXPR, itype, t, fd->chunk_size);
+  t = fold_build2 (MULT_EXPR, itype, t, chunk_size);
   s0 = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
 				 false, GSI_CONTINUE_LINKING);
 
-  t = fold_build2 (PLUS_EXPR, itype, s0, fd->chunk_size);
+  t = fold_build2 (PLUS_EXPR, itype, s0, chunk_size);
   t = fold_build2 (MIN_EXPR, itype, t, n);
   e0 = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
 				 false, GSI_CONTINUE_LINKING);
--- gcc/tree-pretty-print.c.jj	2015-06-11 14:43:37.000000000 +0200
+++ gcc/tree-pretty-print.c	2015-06-11 18:25:40.975760680 +0200
@@ -439,6 +439,8 @@ dump_omp_clause (pretty_printer *pp, tre
 
     case OMP_CLAUSE_SCHEDULE:
       pp_string (pp, "schedule(");
+      if (OMP_CLAUSE_SCHEDULE_SIMD (clause))
+	pp_string (pp, "simd:");
       switch (OMP_CLAUSE_SCHEDULE_KIND (clause))
 	{
 	case OMP_CLAUSE_SCHEDULE_STATIC:
--- gcc/c-family/c-omp.c.jj	2015-06-08 10:50:52.000000000 +0200
+++ gcc/c-family/c-omp.c	2015-06-11 20:07:49.845720479 +0200
@@ -766,10 +766,14 @@ c_omp_split_clauses (location_t loc, enu
 	  s = C_OMP_CLAUSE_SPLIT_PARALLEL;
 	  break;
 	case OMP_CLAUSE_ORDERED:
-	case OMP_CLAUSE_SCHEDULE:
 	case OMP_CLAUSE_NOWAIT:
 	  s = C_OMP_CLAUSE_SPLIT_FOR;
 	  break;
+	case OMP_CLAUSE_SCHEDULE:
+	  s = C_OMP_CLAUSE_SPLIT_FOR;
+	  if (code != OMP_SIMD)
+	    OMP_CLAUSE_SCHEDULE_SIMD (clauses) = 0;
+	  break;
 	case OMP_CLAUSE_SAFELEN:
 	case OMP_CLAUSE_SIMDLEN:
 	case OMP_CLAUSE_LINEAR:
--- gcc/c/c-parser.c.jj	2015-06-11 17:00:21.000000000 +0200
+++ gcc/c/c-parser.c	2015-06-11 18:41:48.136095564 +0200
@@ -11112,7 +11112,13 @@ c_parser_omp_clause_reduction (c_parser
 
    schedule-kind:
      static | dynamic | guided | runtime | auto
-*/
+
+   OpenMP 4.1:
+   schedule ( schedule-modifier : schedule-kind )
+   schedule ( schedule-modifier : schedule-kind , expression )
+
+   schedule-modifier:
+     simd  */
 
 static tree
 c_parser_omp_clause_schedule (c_parser *parser, tree list)
@@ -11127,6 +11133,19 @@ c_parser_omp_clause_schedule (c_parser *
 
   if (c_parser_next_token_is (parser, CPP_NAME))
     {
+      tree kind = c_parser_peek_token (parser)->value;
+      const char *p = IDENTIFIER_POINTER (kind);
+      if (strcmp ("simd", p) == 0
+	  && c_parser_peek_2nd_token (parser)->type == CPP_COLON)
+	{
+	  OMP_CLAUSE_SCHEDULE_SIMD (c) = 1;
+	  c_parser_consume_token (parser);
+	  c_parser_consume_token (parser);
+	}
+    }
+
+  if (c_parser_next_token_is (parser, CPP_NAME))
+    {
       tree kind = c_parser_peek_token (parser)->value;
       const char *p = IDENTIFIER_POINTER (kind);
 
--- gcc/cp/parser.c.jj	2015-06-11 16:59:24.000000000 +0200
+++ gcc/cp/parser.c	2015-06-11 18:42:54.267093129 +0200
@@ -28707,7 +28707,14 @@ cp_parser_omp_clause_reduction (cp_parse
    schedule ( schedule-kind , expression )
 
    schedule-kind:
-     static | dynamic | guided | runtime | auto  */
+     static | dynamic | guided | runtime | auto
+
+   OpenMP 4.1:
+   schedule ( schedule-modifier : schedule-kind )
+   schedule ( schedule-modifier : schedule-kind , expression )
+
+   schedule-modifier:
+     simd  */
 
 static tree
 cp_parser_omp_clause_schedule (cp_parser *parser, tree list, location_t location)
@@ -28721,6 +28728,19 @@ cp_parser_omp_clause_schedule (cp_parser
 
   if (cp_lexer_next_token_is (parser->lexer, CPP_NAME))
     {
+      tree id = cp_lexer_peek_token (parser->lexer)->u.value;
+      const char *p = IDENTIFIER_POINTER (id);
+      if (strcmp ("simd", p) == 0
+	  && cp_lexer_nth_token_is (parser->lexer, 2, CPP_COLON))
+	{
+	  OMP_CLAUSE_SCHEDULE_SIMD (c) = 1;
+	  cp_lexer_consume_token (parser->lexer);
+	  cp_lexer_consume_token (parser->lexer);
+	}
+    }
+
+  if (cp_lexer_next_token_is (parser->lexer, CPP_NAME))
+    {
       tree id = cp_lexer_peek_token (parser->lexer)->u.value;
       const char *p = IDENTIFIER_POINTER (id);
 
--- gcc/testsuite/c-c++-common/gomp/schedule-simd-1.c.jj	2015-06-12 12:49:39.030398681 +0200
+++ gcc/testsuite/c-c++-common/gomp/schedule-simd-1.c	2015-06-12 12:49:25.000000000 +0200
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-fopenmp -O2" } */
+/* { dg-additional-options "-mavx512f" { target { x86_64-*-* i?86-*-* } } } */
+
+#define N 1024
+int a[N], b[N], c[N];
+
+void
+f1 (void)
+{
+  int i;
+  #pragma omp parallel for simd schedule (simd:static)
+  for (i = 0; i < N; i++)
+    a[i] = b[i] + c[i];
+}
+
+void
+f2 (void)
+{
+  int i;
+  #pragma omp parallel for simd schedule (simd: static, 7)
+  for (i = 0; i < N; i++)
+    a[i] = b[i] + c[i];
+}
+
+void
+f3 (void)
+{
+  int i;
+  #pragma omp parallel for simd schedule (simd : dynamic, 7)
+  for (i = 0; i < N; i++)
+    a[i] = b[i] + c[i];
+}
+
+void
+f4 (void)
+{
+  int i;
+  #pragma omp parallel for simd schedule ( simd:runtime)
+  for (i = 0; i < N; i++)
+    a[i] = b[i] + c[i];
+}
+
+void
+f5 (void)
+{
+  int i;
+  #pragma omp parallel for simd schedule (simd:auto)
+  for (i = 0; i < N; i++)
+    a[i] = b[i] + c[i];
+}