From patchwork Mon Feb 15 18:44:12 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Monakov <amonakov@ispras.ru>
X-Patchwork-Id: 583065
From: Alexander Monakov <amonakov@ispras.ru>
To: gcc-patches@gcc.gnu.org
Subject: [gomp-nvptx 5/5] libgomp plugin: manage soft-stack storage
Date: Mon, 15 Feb 2016 21:44:12 +0300
Message-Id: <1455561852-9237-6-git-send-email-amonakov@ispras.ru>
In-Reply-To: <1455561852-9237-1-git-send-email-amonakov@ispras.ru>
References: <1455561852-9237-1-git-send-email-amonakov@ispras.ru>

This patch implements the libgomp plugin part of the transition to
host-allocated soft stacks.  For now only a simple scheme with
allocation/deallocation per launch is implemented; a followup change is
planned to cache and reuse allocations when appropriate.
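
As a rough illustration of the per-launch sizing: the diff below requests
one stack per warp, that is teams * threads stacks of nvptx_stacks_size ()
bytes each.  The helper here is hypothetical and not part of the patch
(the name stacks_total_bytes and the overflow check are assumptions added
for the sketch); it only shows the arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not in the patch: total bytes needed for NUM
   stacks of SIZE bytes each.  Returns 0 if NUM is not positive or the
   product would overflow size_t.  */
static size_t
stacks_total_bytes (size_t size, int num)
{
  if (num <= 0)
    return 0;
  if (size > SIZE_MAX / (size_t) num)
    return 0;
  return size * (size_t) num;
}
```

With the 128 KiB stack size used in the patch and, say, 4 teams of 8 warps,
this comes to 4 MiB of device memory per launch.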

The call to cuLaunchKernel is changed to pass the kernel entry function
arguments as a single buffer with an explicit size (via the 'extra'
parameter).  This lets the driver detect a mismatch, but only when the
total size of the passed arguments differs from what the kernel expects.
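
To make the limits of that check concrete, here is a small host-only
sketch.  It does not call the CUDA driver; launch_size_matches is a
hypothetical stand-in for a size-only comparison, and struct entry_params
is an assumed layout mirroring the three arguments the diff passes
(tgt_vars, stacks, stack_size).  It also assumes size_t and void * have
the same size, as on common 32-bit and 64-bit hosts.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout of the kernel entry parameters, for illustration.  */
struct entry_params
{
  void *vars;
  void *stacks;
  size_t stack_size;
};

/* Hypothetical stand-in for a size-only check: it sees byte counts,
   not the types or order of the individual arguments.  */
static int
launch_size_matches (size_t expected, size_t passed)
{
  return expected == passed;
}

/* Size of an argument buffer built as an array of NARGS pointers,
   like the fn_args array in the patch.  */
static size_t
packed_args_size (int nargs)
{
  return (size_t) nargs * sizeof (void *);
}
```

Passing only tgt_vars (one pointer) would be caught, since the sizes
differ.  Passing three pointer-sized values in the wrong order would not,
because the total size is unchanged.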

	* plugin/plugin-nvptx.c (nvptx_stacks_size): New.
	(nvptx_stacks_alloc): New.
	(nvptx_stacks_free): New.
	(GOMP_OFFLOAD_run): Allocate soft-stacks storage from the host using
	the above new functions.  Use kernel launch interface that allows
	checking for mismatched total size of entry function arguments.

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index cb6a3ac..adf57b1 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1892,6 +1892,37 @@ nvptx_adjust_launch_bounds (struct targ_fn_descriptor *fn,
     *teams_p = max_blocks;
 }
 
+/* Return the size of per-warp stacks (see gcc -msoft-stack) to use for OpenMP
+   target regions.  */
+
+static size_t
+nvptx_stacks_size (void)
+{
+  return 128 * 1024;
+}
+
+/* Return contiguous storage for NUM stacks, each SIZE bytes.  */
+
+static void *
+nvptx_stacks_alloc (size_t size, int num)
+{
+  CUdeviceptr stacks;
+  CUresult r = cuMemAlloc (&stacks, size * num);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemAlloc error: %s", cuda_error (r));
+  return (void *) stacks;
+}
+
+/* Release storage previously allocated by nvptx_stacks_alloc.  */
+
+static void
+nvptx_stacks_free (void *p, int num)
+{
+  CUresult r = cuMemFree ((CUdeviceptr) p);
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuMemFree error: %s", cuda_error (r));
+}
+
 void
 GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
 {
@@ -1899,7 +1930,6 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
   CUresult r;
   struct ptx_device *ptx_dev = ptx_devices[ord];
   const char *maybe_abort_msg = "(perhaps abort was called)";
-  void *fn_args = &tgt_vars;
   int teams = 0, threads = 0;
 
   if (!args)
@@ -1922,10 +1952,19 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
     }
   nvptx_adjust_launch_bounds (tgt_fn, ptx_dev, &teams, &threads);
 
+  size_t stack_size = nvptx_stacks_size ();
+  void *stacks = nvptx_stacks_alloc (stack_size, teams * threads);
+  void *fn_args[] = {tgt_vars, stacks, (void *) stack_size};
+  size_t fn_args_size = sizeof fn_args;
+  void *config[] = {
+    CU_LAUNCH_PARAM_BUFFER_POINTER, fn_args,
+    CU_LAUNCH_PARAM_BUFFER_SIZE, &fn_args_size,
+    CU_LAUNCH_PARAM_END
+  };
   r = cuLaunchKernel (function,
 		      teams, 1, 1,
 		      32, threads, 1,
-		      0, ptx_dev->null_stream->stream, &fn_args, 0);
+		      0, ptx_dev->null_stream->stream, NULL, config);
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuLaunchKernel error: %s", cuda_error (r));
 
@@ -1935,6 +1974,7 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
 		       maybe_abort_msg);
   else if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuCtxSynchronize error: %s", cuda_error (r));
+  nvptx_stacks_free (stacks, teams * threads);
 }
 
 void