get:
Show a patch.

patch:
Update a patch.

put:
Update a patch.

GET /api/1.2/patches/2233011/?format=api
HTTP 200 OK
Allow: GET, PUT, PATCH, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept

{
    "id": 2233011,
    "url": "http://patchwork.ozlabs.org/api/1.2/patches/2233011/?format=api",
    "web_url": "http://patchwork.ozlabs.org/project/gcc/patch/20260505134929.3522938-3-aarsenovic@baylibre.com/",
    "project": {
        "id": 17,
        "url": "http://patchwork.ozlabs.org/api/1.2/projects/17/?format=api",
        "name": "GNU Compiler Collection",
        "link_name": "gcc",
        "list_id": "gcc-patches.gcc.gnu.org",
        "list_email": "gcc-patches@gcc.gnu.org",
        "web_url": null,
        "scm_url": null,
        "webscm_url": null,
        "list_archive_url": "",
        "list_archive_url_format": "",
        "commit_url_format": ""
    },
    "msgid": "<20260505134929.3522938-3-aarsenovic@baylibre.com>",
    "list_archive_url": null,
    "date": "2026-05-05T13:14:58",
    "name": "[2/4] libgomp: let plugins handle allocating the target variable table",
    "commit_ref": null,
    "pull_url": null,
    "state": "new",
    "archived": false,
    "hash": "866f67aadbb55fce4a3c079fa27fc79e4f735a6a",
    "submitter": {
        "id": 92125,
        "url": "http://patchwork.ozlabs.org/api/1.2/people/92125/?format=api",
        "name": "Arsen Arsenović",
        "email": "aarsenovic@baylibre.com"
    },
    "delegate": null,
    "mbox": "http://patchwork.ozlabs.org/project/gcc/patch/20260505134929.3522938-3-aarsenovic@baylibre.com/mbox/",
    "series": [
        {
            "id": 502845,
            "url": "http://patchwork.ozlabs.org/api/1.2/series/502845/?format=api",
            "web_url": "http://patchwork.ozlabs.org/project/gcc/list/?series=502845",
            "date": "2026-05-05T13:14:59",
            "name": "GCN: Target offload overhead improvements, batch 2",
            "version": 1,
            "mbox": "http://patchwork.ozlabs.org/series/502845/mbox/"
        }
    ],
    "comments": "http://patchwork.ozlabs.org/api/patches/2233011/comments/",
    "check": "pending",
    "checks": "http://patchwork.ozlabs.org/api/patches/2233011/checks/",
    "tags": {},
    "related": [],
    "headers": {
        "Return-Path": "<gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org>",
        "X-Original-To": [
            "incoming@patchwork.ozlabs.org",
            "gcc-patches@gcc.gnu.org"
        ],
        "Delivered-To": [
            "patchwork-incoming@legolas.ozlabs.org",
            "gcc-patches@gcc.gnu.org"
        ],
        "Authentication-Results": [
            "legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=baylibre-com.20251104.gappssmtp.com\n header.i=@baylibre-com.20251104.gappssmtp.com header.a=rsa-sha256\n header.s=20251104 header.b=qFyfAeMr;\n\tdkim-atps=neutral",
            "legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org\n (client-ip=38.145.34.32; helo=vm01.sourceware.org;\n envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org;\n receiver=patchwork.ozlabs.org)",
            "sourceware.org;\n\tdkim=pass (2048-bit key,\n unprotected) header.d=baylibre-com.20251104.gappssmtp.com\n header.i=@baylibre-com.20251104.gappssmtp.com header.a=rsa-sha256\n header.s=20251104 header.b=qFyfAeMr",
            "sourceware.org;\n dmarc=none (p=none dis=none) header.from=baylibre.com",
            "sourceware.org; spf=pass smtp.mailfrom=baylibre.com",
            "server2.sourceware.org;\n arc=none smtp.remote-ip=2a00:1450:4864:20::529"
        ],
        "Received": [
            "from vm01.sourceware.org (vm01.sourceware.org [38.145.34.32])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519 server-signature ECDSA (secp384r1) server-digest SHA384)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4g90KK243lz1yJV\n\tfor <incoming@patchwork.ozlabs.org>; Tue, 05 May 2026 23:51:01 +1000 (AEST)",
            "from vm01.sourceware.org (localhost [127.0.0.1])\n\tby sourceware.org (Postfix) with ESMTP id 3F4034BA9034\n\tfor <incoming@patchwork.ozlabs.org>; Tue,  5 May 2026 13:50:59 +0000 (GMT)",
            "from mail-ed1-x529.google.com (mail-ed1-x529.google.com\n [IPv6:2a00:1450:4864:20::529])\n by sourceware.org (Postfix) with ESMTPS id A87744BA79B3\n for <gcc-patches@gcc.gnu.org>; Tue,  5 May 2026 13:50:11 +0000 (GMT)",
            "by mail-ed1-x529.google.com with SMTP id\n 4fb4d7f45d1cf-67c3cb1433cso4446736a12.0\n for <gcc-patches@gcc.gnu.org>; Tue, 05 May 2026 06:50:11 -0700 (PDT)",
            "from localhost ([146.70.193.12])\n by smtp.googlemail.com with ESMTPSA id\n a640c23a62f3a-bbe6d977b7esm518831066b.48.2026.05.05.06.50.08\n (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);\n Tue, 05 May 2026 06:50:09 -0700 (PDT)"
        ],
        "DKIM-Filter": [
            "OpenDKIM Filter v2.11.0 sourceware.org 3F4034BA9034",
            "OpenDKIM Filter v2.11.0 sourceware.org A87744BA79B3"
        ],
        "DMARC-Filter": "OpenDMARC Filter v1.4.2 sourceware.org A87744BA79B3",
        "ARC-Filter": "OpenARC Filter v1.0.0 sourceware.org A87744BA79B3",
        "ARC-Seal": "i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1777989012; cv=none;\n b=t+97Btyw+90VtXfKFOHkekBgqOwW7TGnoZiIiGV5iHIDtJzspVD5z+bi6PeUWSxiOLO4T4y3ADWSm2O4jv63M27RwDEZJkMUdno1S3gzQ+NRLUcIV1opxoXP4HuekgUgOfPMbujPJGee/ap0hLiriXsyZTmETvMtbfTUulwIXZo=",
        "ARC-Message-Signature": "i=1; a=rsa-sha256; d=sourceware.org; s=key;\n t=1777989012; c=relaxed/simple;\n bh=3Ab9h7pFg6Co7E/U04lj6zHzOZqAlH1fJI5xhVHWJyI=;\n h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version;\n b=i/RZqkliBzJgOdmLzGk+9mW2TondzMU6YUZEWmdahFSDejgyNwnIDGLzRP3ZztiKUel1kW306S/jBNu3FpkY5FplptLZxjXPoIT33mYHtuDCVTY/89JmxRGv5qm3oFnpJusccEtTmrcaQx6sPx5aokgztMqavQJiTx1sHOixT4Q=",
        "ARC-Authentication-Results": "i=1; server2.sourceware.org",
        "DKIM-Signature": "v=1; a=rsa-sha256; c=relaxed/relaxed;\n d=baylibre-com.20251104.gappssmtp.com; s=20251104; t=1777989010;\n x=1778593810;\n darn=gcc.gnu.org;\n h=content-transfer-encoding:mime-version:references:in-reply-to\n :message-id:date:subject:cc:to:from:from:to:cc:subject:date\n :message-id:reply-to;\n bh=B/D0bYkyrihvJLauNMHgDf/hVaF5BbTGxxq8NEoJypI=;\n b=qFyfAeMr1vUMM+t0pzo7kNdp8eG5Z9nXlfibhplylgMK631coJeq89VMHuF1Z/3uiW\n 9DWgWk6ec3FnBZBe7TUJsw3fI8R97ypO4kzmOCOeONsgSNrlkyOLecb9adNyRrHPoqwP\n lnIJ5bZmg2nTbBHXwHcPhCFMg64Glh0Mh/9r3a3lJ56ZxfTwUINQ37FmpgZ0yPy1O0yd\n Lc3EvjEmyP664m/o26g7iupc2l5ulS2/73QmpKwb+gkATw9zw2V8TlVOyZPLxnQqK2P5\n XJVTHHUJ0CrsIzB43YnqCLz6BKGy2VBdE+SqVNhO1MA6VKkm9KBHVMxF9ReDZwxQmJwM\n HMiQ==",
        "X-Google-DKIM-Signature": "v=1; a=rsa-sha256; c=relaxed/relaxed;\n d=1e100.net; s=20251104; t=1777989010; x=1778593810;\n h=content-transfer-encoding:mime-version:references:in-reply-to\n :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from\n :to:cc:subject:date:message-id:reply-to;\n bh=B/D0bYkyrihvJLauNMHgDf/hVaF5BbTGxxq8NEoJypI=;\n b=MPGtbuIIyR6PN390jPYuzgg/kQf6EO8XCpXBOSl6YJE3tQcMrD913I8zdakz9+Jy7R\n xhnbO1m4fxRHI0kJLJGxxTYxU4V9ciBtL5T45WMdFBGC6jdS0czYB/zDcBHkX2iBr7s7\n P+gijxa4IgM8H5EX9DdJaYMe2CKj9Of4YV31dCJNU1a0qWacITLGlQadv7eZmtTCQEBT\n RcslEhm+49BVlSwnr0lkFmRWplIWrLLSxF+Ay7YTTsaYkfplLCg/IK4pVJcLevRBVqNJ\n CtEdLeVVkwSiDkQRbAh11DI9Nfr7pMi5vhXbh9cYkS7Slq1AMFdXya31jE15U6hJg4h8\n Bt6Q==",
        "X-Gm-Message-State": "AOJu0YxTqZCiQRNpYyzuGP5RQrrKkDA43rpBfgn+8o/0fqDDYKTtROnl\n YjRO/U2ZZpmbvooEC/Fm7SOhFP3VvSn14hqrB29oEOzzAFKtgt23gGRFYp4E3peqGpudWTTHXku\n NFRJkMo4=",
        "X-Gm-Gg": "AeBDievHxqR1Z3wwf8OOqV3GeaNSLTwszZpBfOjFwxtPmPpbWVNGVHv5E8/yeqkC9hx\n bgO0khp2+8U9zSxpVLJAqA574dmF/vCSZ65T22MP8TMU4YJmH5FulfWjGYM6fbjfpaon4FKFPOT\n GYQIJDVMYhwldAMYSvSEehR4Sw2mYxqMWhqVAFu/KS847st33zRjd5cJJE7+HYJ+1sTPmGcHW+W\n XWax512WEoKeIOvKmfPBubByWqRD9rV7at49R6a+fK1b7Xh2pRp6umRbp/QHzbyxVL5Rp1RTmud\n yJZzWtqC2vY44qPAPi5JEorPLiKvdJALU/KK1/4KdwunSqA0RA4PbnevqMw55Yk2dD+cS4dWdPI\n 9/xJWZ255KOnTcYwmWJ/kvyiBcqbYX2XRJkTIp41oNb1B0glT2QNtSwar5IFFDt09GqXADW5UPW\n VyXj9GkQOevwsY+d2GUBBYtahNIY6GKKWWAm4=",
        "X-Received": "by 2002:a17:907:970a:b0:bac:7f7d:2bf8 with SMTP id\n a640c23a62f3a-bbff72cece5mr754669766b.0.1777989009678;\n Tue, 05 May 2026 06:50:09 -0700 (PDT)",
        "From": "=?utf-8?q?Arsen_Arsenovi=C4=87?= <aarsenovic@baylibre.com>",
        "To": "gcc-patches@gcc.gnu.org",
        "Cc": "=?utf-8?q?Arsen_Arsenovi=C4=87?= <aarsenovic@baylibre.com>",
        "Subject": "[PATCH 2/4] libgomp: let plugins handle allocating the target\n variable table",
        "Date": "Tue,  5 May 2026 15:14:58 +0200",
        "Message-ID": "<20260505134929.3522938-3-aarsenovic@baylibre.com>",
        "X-Mailer": "git-send-email 2.54.0",
        "In-Reply-To": "<20260505134929.3522938-1-aarsenovic@baylibre.com>",
        "References": "<20260505134929.3522938-1-aarsenovic@baylibre.com>",
        "MIME-Version": "1.0",
        "Content-Type": "text/plain; charset=UTF-8",
        "Content-Transfer-Encoding": "8bit",
        "X-BeenThere": "gcc-patches@gcc.gnu.org",
        "X-Mailman-Version": "2.1.30",
        "Precedence": "list",
        "List-Id": "Gcc-patches mailing list <gcc-patches.gcc.gnu.org>",
        "List-Unsubscribe": "<https://gcc.gnu.org/mailman/options/gcc-patches>,\n <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>",
        "List-Archive": "<https://gcc.gnu.org/pipermail/gcc-patches/>",
        "List-Post": "<mailto:gcc-patches@gcc.gnu.org>",
        "List-Help": "<mailto:gcc-patches-request@gcc.gnu.org?subject=help>",
        "List-Subscribe": "<https://gcc.gnu.org/mailman/listinfo/gcc-patches>,\n <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>",
        "Errors-To": "gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org"
    },
    "content": "In my examination of BabelStream results on AMD GCN, I've found that,\nfor each BabelStream kernel execution, we spend significant time in\nallocating and initializing memory in gomp_map_vars (~55µs, whereas the\nactual BabelStream code executes in ~746µs, meaning we increase the time\nBabelStream measures by 7% just on that).\n\nUpon further examination, I've found that the only reason gomp_map_vars\ndecides to allocate and map any memory in the first place is because it\nis constructing the table of pointers to variables on the target, which\nI've taken to calling the \"target variable table\".  Given that the GCN\nplugin already must perform some memory allocation before starting up a\nkernel, namely to allocate kernel arguments, it would be beneficial if\nwe could merge this allocation with the kernel arguments allocation.\n\nIn addition, since the kernel arguments live in host memory, populating\nthem can be performed using string functions, without any need to call\nfor expensive host2dev copies.\n\nThis patch introduces an opaque type for \"offload sessions\".  This type\nis defined by each plugin and allows it to store data related to a\nsingle offload job.  The sessions are allocated and managed by libgomp,\nand initialized and utilized by the plugin.  Their lifetime starts with\na call to GOMP_OFFLOAD_session_start, and ends with\nGOMP_OFFLOAD_{openacc_{async_,}exec,{async_,}run}.\n\nThe patch then uses this framework to make management of the target\nvariable table more flexible: the plugin may elect to implement\nGOMP_OFFLOAD_session_allocate_target_var_table, which allows the plugin\nto attempt to allocate the target variable table in host memory.\n\nIf it fails, or if the plugin does not provide this function, libgomp\nwill perform this allocation as it does today - in target memory - and\ntell the session about it using\nGOMP_OFFLOAD_session_set_target_var_table.\n\nIn the case of AMD GCN, upon a call to\nGOMP_OFFLOAD_session_allocate_target_var_table, the plugin will\nimmediately allocate kernel arguments with enough space for the target\nvariable table, no matter what size the plugin asks for[1], and return\nthat pointer to libgomp.\n\nThis results in the runtime of gomp_map_vars effectively disappearing\nfrom traces.\n\n[1] It may be beneficial to limit this, to some fixed amount, to make it\n    so that the future allocation cache has a higher cache hit rate.  It\n    may also depend on whether hsa_memory_allocate for kernel arguments\n    takes runtime proportional to the number of bytes it needs to\n    allocate.\n\ninclude/ChangeLog:\n\n\t* gomp-constants.h (GOMP_VERSION): Bump.  Signature of\n\tGOMP_OFFLOAD_run et al changed.\n\nlibgomp/ChangeLog:\n\n\t* libgomp-plugin.h (GOMP_OFFLOAD_run, GOMP_OFFLOAD_exec)\n\t(GOMP_OFFLOAD_async_run, GOMP_OFFLOAD_openacc_async_exec): Pass\n\tsession in place of target variable table and devices.\n\t(struct gomp_offload_session): New.\n\t(GOMP_OFFLOAD_session_size): New\n\t(GOMP_OFFLOAD_check_session_struct): New.\n\t(GOMP_OFFLOAD_session_boilerplate): New.\n\t(GOMP_OFFLOAD_session_start): New.\n\t(GOMP_OFFLOAD_session_allocate_target_var_table): New.\n\t(GOMP_OFFLOAD_session_set_target_var_table): New.\n\t* libgomp.h (struct gomp_target_task): Add offload_session\n\tfield.\n\t(struct gomp_device_descr): Add offload session management\n\tfunctions.\n\t(gomp_offload_session_new): New.\n\t(goacc_map_vars): Add SESSION to signature\n\t* oacc-host.c (struct gomp_offload_session): Define, for host\n\toffload fallback case.\n\t(host_session_size): New.  Implements GOMP_OFFLOAD_session_size.\n\t(host_session_start): New.  Implements\n\tGOMP_OFFLOAD_session_start.\n\t(host_session_set_target_var_table): New.  Implements\n\tGOMP_OFFLOAD_session_set_target_var_table.\n\t(host_run): Adjust to match GOMP_OFFLOAD_run.\n\t(host_openacc_exec): Adjust to match GOMP_OFFLOAD_openacc_exec.\n\t(host_openacc_async_exec): Adjust to match\n\tGOMP_OFFLOAD_openacc_async_exec.\n\t* oacc-mem.c (acc_map_data): Adjust call to goacc_map_vars.\n\t(goacc_enter_datum): Ditto.\n\t(goacc_enter_data_internal): Ditto.\n\t* oacc-parallel.c (GOACC_parallel_keyed): Allocate and pass\n\toffload session.\n\t(GOACC_data_start): Adjust call to goacc_map_vars.\n\t* plugin/plugin-gcn.c (struct kernel_dispatch): Remove\n\tkernarg_cache_node.\n\t(struct kernargs): Add a flexible array member for the target\n\tvariable table.\n\t(struct kernel_launch): Store an offload session rather than\n\ttarget var. table pointer.\n\t(print_kernel_dispatch): Receive kernargs as parameter.\n\t(struct gomp_offload_session): Define.\n\t(init_session): New.\n\t(GOMP_OFFLOAD_session_start): Implement, using init_session.\n\t(release_session): New.\n\t(alloc_kernargs_on_agent): Rename to...\n\t(allocate_session_kernargs): ... this, store result in\n\tpassed-in SESSION, and allocate extra room for target variable\n\ttable (rounding it up to nearest multiple of 64 pointers).\n\t(GOMP_OFFLOAD_session_allocate_target_var_table): Implement\n\tusing the previous function.\n\t(GOMP_OFFLOAD_session_set_target_var_table): Ditto.\n\t(create_kernel_dispatch): Remove kernarg allocation, instead\n\treceiving it as an argument.\n\t(release_kernel_dispatch): Receive kernargs as an argument,\n\tdon't release them.\n\t(run_kernel): Adjust to use sessions.\n\t(destroy_module): Ditto.\n\t(GOMP_OFFLOAD_load_image): Ditto.\n\t(execute_queue_entry): Adjust to match changed struct\n\tkernel_launch.\n\t(queue_push_launch): Ditto.\n\t(gcn_exec): Receive and pass along session.\n\t(GOMP_OFFLOAD_run): Ditto.\n\t(GOMP_OFFLOAD_async_run): Ditto.\n\t(GOMP_OFFLOAD_openacc_exec): Ditto.\n\t(GOMP_OFFLOAD_openacc_async_exec): Ditto.\n\t* plugin/plugin-nvptx.c (struct gomp_offload_session): Define.\n\t(GOMP_OFFLOAD_session_start): Implement.\n\t(GOMP_OFFLOAD_session_set_target_var_table): Implement.\n\t(GOMP_OFFLOAD_openacc_exec): Adjust to receive session.\n\t(GOMP_OFFLOAD_openacc_async_exec): Ditto.\n\t(GOMP_OFFLOAD_run): Ditto.\n\t* target.c (gomp_get_tvt_size): Extract helper from...\n\t(gomp_map_vars_internal): ... here.  Receive SESSION, iff doing\n\ttarget offload.  Use a target variable table on the host\n\tallocated by GOMP_OFFLOAD_session_allocate_target_var_table if\n\tpossible, or call GOMP_OFFLOAD_session_set_target_var_table with\n\tan allocated device pointer otherwise.\n\t(gomp_map_vars): Update to pass along session.\n\t(goacc_map_vars): Ditto.\n\t(GOMP_target): Allocate and pass along session.\n\t(GOMP_target_ext): Ditto.\n\t(gomp_target_data_fallback): Adjust call to gomp_map_vars.\n\t(GOMP_target_data): Ditto.\n\t(GOMP_target_data_ext): Ditto.\n\t(GOMP_target_enter_exit_data): Ditto.\n\t(gomp_target_task_fn): Start and pass along session, the storage\n\tfor which is allocated by gomp_create_target_task.\n\t(DLSYM2): Rename from DLSYM, adding a new parameter for the\n\tvariable to populate, akin to DLSYM_OPT.\n\t(DLSYM): Delegate to DLSYM2.\n\t(gomp_load_plugin_for_device): Populate session-related fields.\n\t* task.c (gomp_create_target_task): Allocate enough storage for\n\tan offload session.\n\t* testsuite/libgomp.c-c++-common/gcn-kernel-launch-no-tvt-alloc.c: New test.\n\t* testsuite/libgomp.c-c++-common/gcn-kernel-launch-tvt-alloc.c: New test.\n---\n include/gomp-constants.h                      |   2 +-\n libgomp/libgomp-plugin.h                      |  81 +++++-\n libgomp/libgomp.h                             |  27 +-\n libgomp/oacc-host.c                           |  63 ++++-\n libgomp/oacc-mem.c                            |   8 +-\n libgomp/oacc-parallel.c                       |  24 +-\n libgomp/plugin/plugin-gcn.c                   | 254 ++++++++++++------\n libgomp/plugin/plugin-nvptx.c                 |  45 +++-\n libgomp/target.c                              | 191 ++++++++-----\n libgomp/task.c                                |  33 ++-\n .../gcn-kernel-launch-no-tvt-alloc.c          |  51 ++++\n .../gcn-kernel-launch-tvt-alloc.c             |  16 ++\n 12 files changed, 604 insertions(+), 191 deletions(-)\n create mode 100644 libgomp/testsuite/libgomp.c-c++-common/gcn-kernel-launch-no-tvt-alloc.c\n create mode 100644 libgomp/testsuite/libgomp.c-c++-common/gcn-kernel-launch-tvt-alloc.c",
    "diff": "diff --git a/include/gomp-constants.h b/include/gomp-constants.h\nindex 0a0761043f96..8304ae839fd1 100644\n--- a/include/gomp-constants.h\n+++ b/include/gomp-constants.h\n@@ -332,7 +332,7 @@ enum gomp_map_kind\n /* Versions of libgomp and device-specific plugins.  GOMP_VERSION\n    should be incremented whenever an ABI-incompatible change is introduced\n    to the plugin interface defined in libgomp/libgomp.h.  */\n-#define GOMP_VERSION\t3\n+#define GOMP_VERSION\t4\n #define GOMP_VERSION_NVIDIA_PTX 1\n #define GOMP_VERSION_GCN 3\n \ndiff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h\nindex bb4d577b66d7..66351e826d96 100644\n--- a/libgomp/libgomp-plugin.h\n+++ b/libgomp/libgomp-plugin.h\n@@ -188,11 +188,76 @@ extern int GOMP_OFFLOAD_memcpy3d (int, int, size_t, size_t, size_t, void *,\n \t\t\t\t  size_t);\n extern bool GOMP_OFFLOAD_memset (int, void *, int, size_t);\n extern bool GOMP_OFFLOAD_can_run (void *);\n-extern void GOMP_OFFLOAD_run (int, void *, void *, void **);\n-extern void GOMP_OFFLOAD_async_run (int, void *, void *, void **, void *);\n \n-extern void GOMP_OFFLOAD_openacc_exec (void (*) (void *), size_t, void **,\n-\t\t\t\t       void **, unsigned *, void *);\n+/* An opaque type, encapsulating the state required to launch a single\n+   'target' region.  This type is expected to have alignment no greater than\n+   the alignment 'malloc' and 'alloca' can provide.\n+\n+   The lifetime of the memory reserved for an offload session is managed by\n+   libgomp.  It will ensure that it is deallocated only after a kernel is done\n+   executing.\n+\n+   Per offload session, exactly one of GOMP_OFFLOAD{,_async}_run or\n+   GOMP_OFFLOAD_openacc{,_async}_exec will be called.  This is also the last\n+   operation performed on a session.  */\n+struct gomp_offload_session;\n+\n+/* Validate that a 'struct gomp_offload_session' declaration is acceptable.  */\n+#define GOMP_OFFLOAD_check_session_struct()\t\t\t\t\\\n+  _Static_assert (_Alignof (struct gomp_offload_session) < __BIGGEST_ALIGNMENT__, \\\n+\t\t \"gomp_offload_session requires too high alignment\")\n+\n+/* Return size of a gomp_offload_session instance.  libgomp takes care of\n+   allocating and deallocating enough memory to store the session.  */\n+[[gnu::const]] extern size_t GOMP_OFFLOAD_session_size (void);\n+\n+/* Check that the 'struct gomp_offload_struct' declaration is acceptable, and\n+   implement GOMP_OFFLOAD_session_size.  */\n+#define GOMP_OFFLOAD_session_boilerplate()\t\t\\\n+  GOMP_OFFLOAD_check_session_struct ();\t\t\t\\\n+  [[gnu::const]] size_t\t\t\t\t\t\\\n+  GOMP_OFFLOAD_session_size (void)\t\t\t\\\n+  { return sizeof (struct gomp_offload_session); }\n+\n+/* Initialize SESSION for executing a kernel on DEVICE.  */\n+extern void GOMP_OFFLOAD_session_start (struct gomp_offload_session *session,\n+\t\t\t\t\tint device);\n+\n+/* Attempt to allocate a target variable table in host memory for SESSION.\n+   This table must be of at least table_size bytes and aligned to\n+   __BIGGEST_ALIGNMENT__.\n+\n+   This function will be called at most once per SESSION.\n+\n+   If this function returns NULL, or if libgomp never calls it,\n+   GOMP_OFFLOAD_session_set_target_var_table will be called instead, with\n+   memory allocated by libgomp for the purpose.\n+\n+   If this function is omitted, libgomp will behave as if it always returns\n+   NULL.  */\n+extern void **GOMP_OFFLOAD_session_allocate_target_var_table\n+  (struct gomp_offload_session *session, size_t table_size);\n+\n+/* Set TABLE, a device pointer, as the pointer to the target variable table.\n+   It may be NULL, in which case there's no target variable table.\n+\n+   Called iff GOMP_OFFLOAD_session_allocate_target_var_table did not succeed\n+   or was not called.  */\n+extern void GOMP_OFFLOAD_session_set_target_var_table\n+  (struct gomp_offload_session *session, void **table);\n+\n+extern void GOMP_OFFLOAD_run (struct gomp_offload_session *session,\n+\t\t\t      void *fn_ptr,\n+\t\t\t      void **args);\n+extern void GOMP_OFFLOAD_async_run (struct gomp_offload_session *session,\n+\t\t\t\t    void *tgt_fn,\n+\t\t\t\t    void **args,\n+\t\t\t\t    void *async_data);\n+\n+extern void GOMP_OFFLOAD_openacc_exec (struct gomp_offload_session *session,\n+\t\t\t\t       void (*tgt_fn) (void *),\n+\t\t\t\t       size_t mapnum, void **hostaddrs,\n+\t\t\t\t       unsigned *dims, void *targ_mem_desc);\n extern void *GOMP_OFFLOAD_openacc_create_thread_data (int);\n extern void GOMP_OFFLOAD_openacc_destroy_thread_data (void *);\n extern struct goacc_asyncqueue *GOMP_OFFLOAD_openacc_async_construct (int);\n@@ -203,9 +268,11 @@ extern bool GOMP_OFFLOAD_openacc_async_serialize (struct goacc_asyncqueue *,\n \t\t\t\t\t\t  struct goacc_asyncqueue *);\n extern void GOMP_OFFLOAD_openacc_async_queue_callback (struct goacc_asyncqueue *,\n \t\t\t\t\t\t       void (*)(void *), void *);\n-extern void GOMP_OFFLOAD_openacc_async_exec (void (*) (void *), size_t, void **,\n-\t\t\t\t\t     void **, unsigned *, void *,\n-\t\t\t\t\t     struct goacc_asyncqueue *);\n+extern void GOMP_OFFLOAD_openacc_async_exec (struct gomp_offload_session *session,\n+\t\t\t\t\t     void (*fn_ptr) (void *),\n+\t\t\t\t\t     size_t mapnum, void **hostaddrs,\n+\t\t\t\t\t     unsigned *dims, void *targ_mem_desc,\n+\t\t\t\t\t     struct goacc_asyncqueue *aq);\n extern bool GOMP_OFFLOAD_openacc_async_dev2host (int, void *, const void *, size_t,\n \t\t\t\t\t\t struct goacc_asyncqueue *);\n extern bool GOMP_OFFLOAD_openacc_async_host2dev (int, void *, const void *, size_t,\ndiff --git a/libgomp/libgomp.h b/libgomp/libgomp.h\nindex c51bd680713f..2b0327ebf557 100644\n--- a/libgomp/libgomp.h\n+++ b/libgomp/libgomp.h\n@@ -771,6 +771,9 @@ struct gomp_target_task\n   struct gomp_task *task;\n   struct gomp_team *team;\n   /* Device-specific target arguments.  */\n+\n+  /* Pointer to the offload session for this task.  */\n+  struct gomp_offload_session *offload_session;\n   void **args;\n   void *hostaddrs[];\n };\n@@ -1465,6 +1468,17 @@ struct gomp_device_descr\n   __typeof (GOMP_OFFLOAD_memcpy2d) *memcpy2d_func;\n   __typeof (GOMP_OFFLOAD_memcpy3d) *memcpy3d_func;\n   __typeof (GOMP_OFFLOAD_memset) *memset_func;\n+  struct {\n+    /* Cached below as 'size'.  */\n+    __typeof (GOMP_OFFLOAD_session_size) *size_func;\n+    __typeof (GOMP_OFFLOAD_session_start) *start_func;\n+    __typeof (GOMP_OFFLOAD_session_allocate_target_var_table) *alloc_tvt_func;\n+    __typeof (GOMP_OFFLOAD_session_set_target_var_table) *set_tvt_func;\n+\n+    /* Size of a single gomp_offload_session object, as specified by\n+       GOMP_OFFLOAD_session_size.  */\n+    size_t size;\n+  } session;\n   __typeof (GOMP_OFFLOAD_can_run) *can_run_func;\n   __typeof (GOMP_OFFLOAD_run) *run_func;\n   __typeof (GOMP_OFFLOAD_async_run) *async_run_func;\n@@ -1491,6 +1505,16 @@ struct gomp_device_descr\n   acc_dispatch_t openacc;\n };\n \n+/* Allocate an offload session using for gomp_device_descr DEV using ALLOC, and\n+   initialize it.  Provided as a macro, so that 'alloca' can be used as\n+   ALLOC. */\n+#define gomp_offload_session_new(devicep, alloc)\t\t\\\n+  ({\t\t\t\t\t\t\t\t\\\n+    void *session = alloc (devicep->session.size);\t\\\n+    devicep->session.start_func (session, devicep->target_id);\t\\\n+    session;\t\t\t\t\t\t\t\\\n+  })\n+\n /* Kind of the pragma, for which gomp_map_vars () is called.  */\n enum gomp_map_vars_kind\n {\n@@ -1524,7 +1548,8 @@ extern struct target_mem_desc *goacc_map_vars (struct gomp_device_descr *,\n \t\t\t\t\t       struct goacc_asyncqueue *,\n \t\t\t\t\t       size_t, void **, void **,\n \t\t\t\t\t       size_t *, void *, bool,\n-\t\t\t\t\t       enum gomp_map_vars_kind);\n+\t\t\t\t\t       enum gomp_map_vars_kind,\n+\t\t\t\t\t       struct gomp_offload_session *);\n extern void goacc_unmap_vars (struct target_mem_desc *, bool,\n \t\t\t      struct goacc_asyncqueue *);\n extern void gomp_init_device (struct gomp_device_descr *);\ndiff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c\nindex 028a5c943b7e..cdfd6822afe7 100644\n--- a/libgomp/oacc-host.c\n+++ b/libgomp/oacc-host.c\n@@ -30,8 +30,24 @@\n #include \"oacc-int.h\"\n #include \"gomp-constants.h\"\n \n+#include <assert.h>\n #include <stdbool.h>\n #include <stddef.h>\n+#include <string.h>\n+\n+/* Defined under a name other than gomp_offload_session to make debugging with\n+   GDB easier.  If this struct was called gomp_offload_session, GDB would\n+   frequently ignore the plugin-specific definition.  */\n+struct host_offload_session\n+{\n+  void *vars;\n+};\n+_Static_assert (_Alignof (struct host_offload_session) < __BIGGEST_ALIGNMENT__,\n+\t\t\"gomp_offload_session requires too high alignment\");\n+\n+static size_t\n+host_session_size (void)\n+{ return sizeof (struct host_offload_session); }\n \n static struct gomp_device_descr host_dispatch;\n \n@@ -128,19 +144,41 @@ host_host2dev (int n __attribute__ ((unused)),\n }\n \n static void\n-host_run (int n __attribute__ ((unused)), void *fn_ptr, void *vars,\n-\t  void **args __attribute__((unused)))\n+host_session_start (struct gomp_offload_session *osession, int dev)\n {\n-  void (*fn)(void *) = (void (*)(void *)) fn_ptr;\n-\n-  fn (vars);\n+  (void) dev;\n+  struct host_offload_session *session;\n+  memcpy (&session, &osession, sizeof (session));\n+  *session = (struct host_offload_session) {\n+    .vars = NULL,\n+  };\n }\n \n static void\n-host_openacc_exec (void (*fn) (void *),\n+host_session_set_target_var_table (struct gomp_offload_session *osession,\n+\t\t\t\t   void **table)\n+{\n+  struct host_offload_session *session;\n+  memcpy (&session, &osession, sizeof (session));\n+  assert (!session->vars);\n+  session->vars = table;\n+}\n+\n+static void\n+host_run (struct gomp_offload_session *osession, void *fn_ptr, void **args)\n+{\n+  struct host_offload_session *session;\n+  memcpy (&session, &osession, sizeof (session));\n+  void (*fn)(void *) = (void (*)(void *)) fn_ptr;\n+\n+  fn (session->vars);\n+}\n+\n+static void\n+host_openacc_exec (struct gomp_offload_session *session __attribute__((unused)),\n+\t\t   void (*fn) (void *),\n \t\t   size_t mapnum __attribute__ ((unused)),\n \t\t   void **hostaddrs,\n-\t\t   void **devaddrs __attribute__ ((unused)),\n \t\t   unsigned *dims __attribute__ ((unused)),\n \t\t   void *targ_mem_desc __attribute__ ((unused)))\n {\n@@ -148,10 +186,10 @@ host_openacc_exec (void (*fn) (void *),\n }\n \n static void\n-host_openacc_async_exec (void (*fn) (void *),\n+host_openacc_async_exec (struct gomp_offload_session *session __attribute__((unused)),\n+\t\t\t void (*fn) (void *),\n \t\t\t size_t mapnum __attribute__ ((unused)),\n \t\t\t void **hostaddrs,\n-\t\t\t void **devaddrs __attribute__ ((unused)),\n \t\t\t unsigned *dims __attribute__ ((unused)),\n \t\t\t void *targ_mem_desc __attribute__ ((unused)),\n \t\t\t struct goacc_asyncqueue *aq __attribute__ ((unused)))\n@@ -288,6 +326,13 @@ static struct gomp_device_descr host_dispatch =\n     .memcpy3d_func = NULL,\n     .run_func = host_run,\n \n+    .session = {\n+      .size_func = host_session_size,\n+      .start_func = host_session_start,\n+      .set_tvt_func = host_session_set_target_var_table,\n+      .size = sizeof (struct host_offload_session),\n+    },\n+\n     .mem_map = { NULL },\n     .mem_map_rev = { NULL },\n     /* .lock initialized in goacc_host_init.  */\ndiff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c\nindex 738281f5701c..5601daf13957 100644\n--- a/libgomp/oacc-mem.c\n+++ b/libgomp/oacc-mem.c\n@@ -421,7 +421,7 @@ acc_map_data (void *h, void *d, size_t s)\n \n       struct target_mem_desc *tgt\n \t= goacc_map_vars (acc_dev, NULL, mapnum, &hostaddrs, &devaddrs, &sizes,\n-\t\t\t  &kinds, true, GOMP_MAP_VARS_ENTER_DATA);\n+\t\t\t  &kinds, true, GOMP_MAP_VARS_ENTER_DATA, NULL);\n       assert (tgt);\n       assert (tgt->list_count == 1);\n       splay_tree_key n = tgt->list[0].key;\n@@ -586,7 +586,7 @@ goacc_enter_datum (void **hostaddrs, size_t *sizes, void *kinds, int async)\n \n       struct target_mem_desc *tgt\n \t= goacc_map_vars (acc_dev, aq, mapnum, hostaddrs, NULL, sizes,\n-\t\t\t  kinds, true, GOMP_MAP_VARS_ENTER_DATA);\n+\t\t\t  kinds, true, GOMP_MAP_VARS_ENTER_DATA, NULL);\n       assert (tgt);\n       assert (tgt->list_count == 1);\n       n = tgt->list[0].key;\n@@ -1225,7 +1225,7 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,\n \t      struct target_mem_desc *tgt_ __attribute__((unused))\n \t\t= goacc_map_vars (acc_dev, aq, groupnum, &hostaddrs[i], NULL,\n \t\t\t\t  &sizes[i], &kinds[i], true,\n-\t\t\t\t  GOMP_MAP_VARS_ENTER_DATA);\n+\t\t\t\t  GOMP_MAP_VARS_ENTER_DATA, NULL);\n \t      assert (tgt_ == NULL);\n \t      gomp_mutex_lock (&acc_dev->lock);\n \n@@ -1276,7 +1276,7 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,\n \t  struct target_mem_desc *tgt\n \t    = goacc_map_vars (acc_dev, aq, groupnum, &hostaddrs[i], NULL,\n \t\t\t      &sizes[i], &kinds[i], true,\n-\t\t\t      GOMP_MAP_VARS_ENTER_DATA);\n+\t\t\t      GOMP_MAP_VARS_ENTER_DATA, NULL);\n \t  assert (tgt);\n \n \t  gomp_mutex_lock (&acc_dev->lock);\ndiff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c\nindex 9f48c8b7f644..04ff26f28e59 100644\n--- a/libgomp/oacc-parallel.c\n+++ b/libgomp/oacc-parallel.c\n@@ -291,9 +291,14 @@ GOACC_parallel_keyed (int flags_m, void (*fn) (void *),\n \n   goacc_aq aq = get_goacc_asyncqueue (async);\n \n+  /* Prepare an offload session.  */\n+  struct gomp_offload_session *session\n+    = (aq ? gomp_offload_session_new (acc_dev, gomp_malloc)\n+       : gomp_offload_session_new (acc_dev, alloca));\n+\n   struct target_mem_desc *tgt\n     = goacc_map_vars (acc_dev, aq, mapnum, hostaddrs, NULL, sizes, kinds, true,\n-\t\t      GOMP_MAP_VARS_TARGET);\n+\t\t      GOMP_MAP_VARS_TARGET, session);\n \n   if (profiling_p)\n     {\n@@ -304,13 +309,11 @@ GOACC_parallel_keyed (int flags_m, void (*fn) (void *),\n \t\t\t\t&api_info);\n     }\n \n-  void **devaddrs = (void **) tgt->tgt_start;\n   if (aq == NULL)\n-    acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs, dims,\n-\t\t\t\ttgt);\n+    acc_dev->openacc.exec_func (session, tgt_fn, mapnum, hostaddrs, dims, tgt);\n   else\n-    acc_dev->openacc.async.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs,\n-\t\t\t\t      dims, tgt, aq);\n+    acc_dev->openacc.async.exec_func (session, tgt_fn, mapnum, hostaddrs, dims,\n+\t\t\t\t      tgt, aq);\n \n   if (profiling_p)\n     {\n@@ -324,6 +327,11 @@ GOACC_parallel_keyed (int flags_m, void (*fn) (void *),\n   /* If running synchronously (aq == NULL), this will unmap immediately.  */\n   goacc_unmap_vars (tgt, true, aq);\n \n+  if (aq)\n+      /* We need to clean up the above-allocated session later if executing\n+\t asynchronously. */\n+      acc_dev->openacc.async.queue_callback_func (aq, free, session);\n+\n   if (profiling_p)\n     {\n       prof_info.event_type = acc_ev_exit_data_end;\n@@ -454,7 +462,7 @@ GOACC_data_start (int flags_m, size_t mapnum,\n     {\n       prof_info.device_type = acc_device_host;\n       api_info.device_type = prof_info.device_type;\n-      tgt = goacc_map_vars (NULL, NULL, 0, NULL, NULL, NULL, NULL, true, 0);\n+      tgt = goacc_map_vars (NULL, NULL, 0, NULL, NULL, NULL, NULL, true, 0, NULL);\n       tgt->prev = thr->mapped_data;\n       thr->mapped_data = tgt;\n \n@@ -463,7 +471,7 @@ GOACC_data_start (int flags_m, size_t mapnum,\n \n   gomp_debug (0, \"  %s: prepare mappings\\n\", __FUNCTION__);\n   tgt = goacc_map_vars (acc_dev, NULL, mapnum, hostaddrs, NULL, sizes, kinds,\n-\t\t\ttrue, 0);\n+\t\t\ttrue, 0, NULL);\n   gomp_debug (0, \"  %s: mappings prepared\\n\", __FUNCTION__);\n   tgt->prev = thr->mapped_data;\n   thr->mapped_data = tgt;\ndiff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c\nindex 32f573f1b7f6..99ba65e14243 100644\n--- a/libgomp/plugin/plugin-gcn.c\n+++ b/libgomp/plugin/plugin-gcn.c\n@@ -282,9 +282,6 @@ struct kernel_dispatch\n   struct agent_info *agent;\n   /* Pointer to a command queue associated with a kernel dispatch agent.  */\n   void *queue;\n-  /* Pointer to a memory space used for kernel arguments passing, wrapped in a\n-     node from the agent kernel argument cache.  */\n-  struct alloc_cache_node *kernarg_cache_node;\n   /* Kernel object.  */\n   uint64_t object;\n   /* Synchronization signal used for dispatch synchronization.  */\n@@ -305,14 +302,18 @@ struct kernargs {\n \n   /* Output data.  */\n   struct output output_data;\n+\n+  /* Target variable table.  Size determined by gomp_map_vars.  See\n+     GOMP_OFFLOAD_session_allocate_target_var_table.  */\n+  _Alignas (__BIGGEST_ALIGNMENT__) void *target_variable_table[];\n };\n \n /* A queue entry for a future asynchronous launch.  */\n \n struct kernel_launch\n {\n+  struct gomp_offload_session *session;\n   struct kernel_info *kernel;\n-  void *vars;\n   struct GOMP_kernel_launch_attributes kla;\n };\n \n@@ -1085,11 +1086,10 @@ dump_executable_symbols (hsa_executable_t executable)\n \n /* Dump kernel DISPATCH data structure and indent it by INDENT spaces.  */\n \n-static void\n-print_kernel_dispatch (struct kernel_dispatch *dispatch, unsigned indent)\n+static inline void\n+print_kernel_dispatch (struct kernel_dispatch *dispatch, unsigned indent,\n+\t\t       struct kernargs *kernargs)\n {\n-  struct kernargs *kernargs = dispatch->kernarg_cache_node->allocation;\n-\n   fprintf (stderr, \"%*sthis: %p\\n\", indent, \"\", dispatch);\n   fprintf (stderr, \"%*squeue: %p\\n\", indent, \"\", dispatch->queue);\n   fprintf (stderr, \"%*skernarg_address: %p\\n\", indent, \"\", kernargs);\n@@ -1824,6 +1824,118 @@ max_isa_vgprs (int isa)\n \n /* }}}  */\n /* {{{ Run  */\n+struct gomp_offload_session\n+{\n+  /* Pointer to a memory space used for kernel arguments passing, wrapped in a\n+     node from the agent kernel argument cache.  */\n+  struct alloc_cache_node *kernarg_cache_node;\n+  /* Pointer to the actual target variable table.  */\n+  void **target_var_table;\n+  /* Device executing the kernel for this offload session.  */\n+  struct agent_info *agent;\n+};\n+GOMP_OFFLOAD_session_boilerplate ();\n+\n+/* Prepare SESSION for use by AGENT.  */\n+void\n+init_session (struct gomp_offload_session *session, struct agent_info *agent)\n+{\n+  assert (agent);\n+  *session = (struct gomp_offload_session) {\n+    .kernarg_cache_node = NULL,\n+    .target_var_table = NULL,\n+    .agent = agent,\n+  };\n+}\n+\n+void\n+GOMP_OFFLOAD_session_start (struct gomp_offload_session *session, int device)\n+{\n+  GCN_DEBUG (\"Starting session %p\\n\", session);\n+  assert ((((uintptr_t) session) % __BIGGEST_ALIGNMENT__) == 0);\n+  init_session (session, get_agent_info (device));\n+}\n+\n+/* Release resources held by SESSION (but not SESSION itself).  */\n+void\n+release_session (struct gomp_offload_session *session)\n+{\n+  release_alloc_cache_node (session->kernarg_cache_node);\n+}\n+\n+/* Get new kernargs for SESSION such that it can store TABLE_SIZE char units of\n+   target variable table, reusing cached kernargs allocations, if possible.  */\n+\n+static inline struct kernargs *\n+allocate_session_kernargs (struct gomp_offload_session *session,\n+\t\t\t   size_t table_size)\n+{\n+  GCN_DEBUG (\"Session %p asked for allocation of kernargs+%zu...\\n\", session, table_size);\n+  struct agent_info *agent = session->agent;\n+  assert (!session->kernarg_cache_node);\n+\n+  /* To increase chance of cache hit, round up size of the target variable\n+     table to a multiple of (64*sizeof(void*)), and ensure that this size is\n+     nonzero.  */\n+  if (!table_size)\n+    table_size++;\n+\n+  {\n+    constexpr size_t rounding_factor = 64 * sizeof (void*);\n+    table_size += rounding_factor - 1;\n+    table_size = (table_size / rounding_factor) * table_size;\n+  }\n+  size_t kernargs_size = sizeof (struct kernargs) + table_size;\n+\n+  session->kernarg_cache_node = (alloc_cache_try_find\n+\t\t\t\t (&agent->kernarg_cache,\n+\t\t\t\t  kernargs_size));\n+\n+  if (!session->kernarg_cache_node)\n+    {\n+      /* Cache miss.  */\n+      void *ka_addr;\n+      hsa_status_t status = hsa_fns.hsa_memory_allocate_fn\n+\t(agent->kernarg_region, sizeof (struct kernargs), &ka_addr);\n+      if (status != HSA_STATUS_SUCCESS)\n+\thsa_fatal (\"Could not allocate memory for GCN kernel arguments\", status);\n+\n+      session->kernarg_cache_node = (alloc_cache_add_taken_node\n+\t\t\t\t     (&agent->kernarg_cache, ka_addr,\n+\t\t\t\t      kernargs_size));\n+      if (!session->kernarg_cache_node)\n+\tGOMP_PLUGIN_fatal (\"Could not allocate cache node for kernel arguments\");\n+    }\n+\n+  return session->kernarg_cache_node->allocation;\n+}\n+\n+void **\n+GOMP_OFFLOAD_session_allocate_target_var_table (struct gomp_offload_session *session,\n+\t\t\t\t\t\tsize_t table_size)\n+{\n+  GCN_DEBUG (\"Session %p asked to allocate\\n\", session);\n+  /* libgomp wants us to handle the TVT.  */\n+  assert (!session->target_var_table);\n+\n+  if (secure_getenv (\"GCN_INHIBIT_KERNARGS_TVT_MERGE\"))\n+    /* ... but the user does not.  Used for testing.  */\n+    return NULL;\n+\n+  struct kernargs *kernargs = allocate_session_kernargs (session, table_size);\n+  return session->target_var_table = &kernargs->target_variable_table[0];\n+}\n+\n+void\n+GOMP_OFFLOAD_session_set_target_var_table (struct gomp_offload_session *session,\n+\t\t\t\t\t   void **table)\n+{\n+  GCN_DEBUG (\"Session %p will use TVT %p...\\n\", session, table);\n+  assert (!session->target_var_table);\n+  /* libgomp wants to handle the TVT.  */\n+  allocate_session_kernargs (session, 0);\n+  session->target_var_table = table;\n+}\n \n /* Create or reuse a team arena and stack space.\n  \n@@ -2010,40 +2122,12 @@ alloc_by_agent (struct agent_info *agent, size_t size)\n   return ptr;\n }\n \n-/* Get a cached kernargs from AGENT, returning an existing one if any are\n-   available.  Returns an alloc_cache_node whose value is this allocation.  */\n-\n-static struct alloc_cache_node *\n-alloc_kernargs_on_agent (struct agent_info *agent, size_t size)\n-{\n-  struct alloc_cache_node *ka_node = (alloc_cache_try_find\n-\t\t\t\t      (&agent->kernarg_cache, size));\n-\n-  /* The cache was empty.  */\n-  if (!ka_node)\n-    {\n-      void *ka_addr;\n-      hsa_status_t status = hsa_fns.hsa_memory_allocate_fn\n-\t(agent->kernarg_region, sizeof (struct kernargs), &ka_addr);\n-      if (status != HSA_STATUS_SUCCESS)\n-\thsa_fatal (\"Could not allocate memory for GCN kernel arguments\", status);\n-\n-      ka_node = alloc_cache_add_taken_node (&agent->kernarg_cache,\n-\t\t\t\t\t    ka_addr,\n-\t\t\t\t\t    size);\n-      if (!ka_node)\n-\tGOMP_PLUGIN_fatal (\"Could not allocate cache node for kernel arguments\");\n-    }\n-\n-  return ka_node;\n-}\n-\n /* Create kernel dispatch data structure for given KERNEL, along with\n    the necessary device signals and memory allocations.  */\n \n static struct kernel_dispatch *\n create_kernel_dispatch (struct kernel_info *kernel, int num_teams,\n-\t\t\tint num_threads)\n+\t\t\tint num_threads, struct kernargs *kernargs)\n {\n   struct agent_info *agent = kernel->agent;\n   struct kernel_dispatch *shadow\n@@ -2088,11 +2172,6 @@ create_kernel_dispatch (struct kernel_info *kernel, int num_teams,\n       return NULL;\n     }\n \n-  /* Get an allocation, if possible from the cache.  */\n-  shadow->kernarg_cache_node = (alloc_kernargs_on_agent\n-\t\t\t\t(agent, sizeof (struct kernargs)));\n-  struct kernargs *kernargs = shadow->kernarg_cache_node->allocation;\n-\n   /* Zero-initialize the output_data (minimum needed).  */\n   kernargs->abi.out_ptr = (int64_t)&kernargs->output_data;\n   kernargs->output_data.next_output = 0;\n@@ -2185,19 +2264,17 @@ console_output (struct kernel_info *kernel, struct kernargs *kernargs,\n /* Release data structure created for a kernel dispatch in SHADOW argument,\n    and clean up the signal and memory allocations.  */\n \n-static void\n-release_kernel_dispatch (struct kernel_dispatch *shadow)\n+static inline void\n+release_kernel_dispatch (struct kernel_dispatch *shadow,\n+\t\t\t struct kernargs *kernargs)\n {\n   GCN_DEBUG (\"Released kernel dispatch: %p\\n\", shadow);\n \n-  struct kernargs *kernargs = shadow->kernarg_cache_node->allocation;\n   void *addr = (void *)kernargs->abi.arena_ptr;\n   if (!addr)\n     addr = (void *)kernargs->abi.stack_ptr;\n   release_ephemeral_memories (shadow->agent, addr);\n \n-  release_alloc_cache_node (shadow->kernarg_cache_node);\n-\n   hsa_signal_t s;\n   s.handle = shadow->signal;\n   hsa_fns.hsa_signal_destroy_fn (s);\n@@ -2295,15 +2372,16 @@ init_kernel (struct kernel_info *kernel)\n \t\t       \"mutex\");\n }\n \n-/* Run KERNEL on its agent, pass VARS to it as arguments and take\n-   launch attributes from KLA.\n+/* Run KERNEL on its agent as part of SESSION and take launch attributes from\n+   KLA.\n    \n    MODULE_LOCKED indicates that the caller already holds the lock and\n    run_kernel need not lock it again.\n    If AQ is NULL then agent->sync_queue will be used.  */\n \n static void\n-run_kernel (struct kernel_info *kernel, void *vars,\n+run_kernel (struct gomp_offload_session *session,\n+\t    struct kernel_info *kernel,\n \t    struct GOMP_kernel_launch_attributes *kla,\n \t    struct goacc_asyncqueue *aq, bool module_locked)\n {\n@@ -2389,6 +2467,9 @@ run_kernel (struct kernel_info *kernel, void *vars,\n \t\t\t\t\t     packet->grid_size_x,\n \t\t\t\t\t     kla->wdims[0]);\n \n+  struct kernargs *kernargs = session->kernarg_cache_node->allocation;\n+  packet->kernarg_address = kernargs;\n+\n   if (kla->ndim >= 2)\n     {\n       packet->grid_size_y = kla->gdims[1];\n@@ -2426,27 +2507,25 @@ run_kernel (struct kernel_info *kernel, void *vars,\n \n   struct kernel_dispatch *shadow\n     = create_kernel_dispatch (kernel, packet->grid_size_x,\n-\t\t\t      packet->grid_size_z);\n+\t\t\t      packet->grid_size_z, kernargs);\n   shadow->queue = command_q;\n \n   if (debug)\n     {\n       fprintf (stderr, \"\\nKernel has following dependencies:\\n\");\n-      print_kernel_dispatch (shadow, 2);\n+      print_kernel_dispatch (shadow, 2, kernargs);\n     }\n \n   packet->private_segment_size = shadow->private_segment_size;\n   packet->group_segment_size = shadow->group_segment_size;\n   packet->kernel_object = shadow->object;\n-  struct kernargs *kernargs = (packet->kernarg_address\n-\t\t\t       = shadow->kernarg_cache_node->allocation);\n   hsa_signal_t s;\n   s.handle = shadow->signal;\n   packet->completion_signal = s;\n   hsa_fns.hsa_signal_store_relaxed_fn (s, 1);\n-  memcpy (kernargs, &vars, sizeof (vars));\n \n-  GCN_DEBUG (\"Copying kernel runtime pointer to kernarg_address\\n\");\n+  GCN_DEBUG (\"Copying kernel runtime pointer %p to kernarg_address\\n\", session->target_var_table);\n+  memcpy (kernargs, &session->target_var_table, sizeof (session->target_var_table));\n \n   uint16_t header;\n   header = HSA_PACKET_TYPE_KERNEL_DISPATCH << HSA_PACKET_HEADER_TYPE;\n@@ -2476,7 +2555,8 @@ run_kernel (struct kernel_info *kernel, void *vars,\n \n   unsigned int return_value = (unsigned int)kernargs->output_data.return_value;\n \n-  release_kernel_dispatch (shadow);\n+  release_kernel_dispatch (shadow, kernargs);\n+  release_session (session);\n \n   if (!module_locked && pthread_rwlock_unlock (&agent->module_rwlock))\n     GOMP_PLUGIN_fatal (\"Unable to unlock a GCN agent rwlock\");\n@@ -2788,7 +2868,10 @@ destroy_module (struct module_info *module, bool locked)\n   if (module->fini_array_func)\n     {\n       init_kernel (module->fini_array_func);\n-      run_kernel (module->fini_array_func, NULL, &kla, NULL, locked);\n+      struct gomp_offload_session session;\n+      init_session (&session, module->fini_array_func->agent);\n+      GOMP_OFFLOAD_session_set_target_var_table (&session, NULL);\n+      run_kernel (&session, module->fini_array_func, &kla, NULL, locked);\n     }\n   module->constructors_run_p = false;\n \n@@ -2820,8 +2903,8 @@ execute_queue_entry (struct goacc_asyncqueue *aq, int index)\n       if (DEBUG_QUEUES)\n \tGCN_DEBUG (\"Async thread %d:%d: Executing launch entry (%d)\\n\",\n \t\t   aq->agent->device_id, aq->id, index);\n-      run_kernel (entry->u.launch.kernel,\n-\t\t  entry->u.launch.vars,\n+      run_kernel (entry->u.launch.session,\n+\t\t  entry->u.launch.kernel,\n \t\t  &entry->u.launch.kla, aq, false);\n       if (DEBUG_QUEUES)\n \tGCN_DEBUG (\"Async thread %d:%d: Executing launch entry (%d) done\\n\",\n@@ -2983,8 +3066,10 @@ wait_for_queue_nonfull (struct goacc_asyncqueue *aq)\n    kernel to run.  */\n \n static void\n-queue_push_launch (struct goacc_asyncqueue *aq, struct kernel_info *kernel,\n-\t\t   void *vars, struct GOMP_kernel_launch_attributes *kla)\n+queue_push_launch (struct goacc_asyncqueue *aq,\n+\t\t   struct gomp_offload_session *session,\n+\t\t   struct kernel_info *kernel,\n+\t\t   struct GOMP_kernel_launch_attributes *kla)\n {\n   assert (aq->agent == kernel->agent);\n \n@@ -2999,8 +3084,8 @@ queue_push_launch (struct goacc_asyncqueue *aq, struct kernel_info *kernel,\n \t       aq->id, queue_last);\n \n   aq->queue[queue_last].type = KERNEL_LAUNCH;\n+  aq->queue[queue_last].u.launch.session = session;\n   aq->queue[queue_last].u.launch.kernel = kernel;\n-  aq->queue[queue_last].u.launch.vars = vars;\n   aq->queue[queue_last].u.launch.kla = *kla;\n \n   aq->queue_n++;\n@@ -3401,8 +3486,8 @@ managed_heap_create (struct agent_info *agent, size_t size)\n /* Execute an OpenACC kernel, synchronously or asynchronously.  */\n \n static void\n-gcn_exec (struct kernel_info *kernel,\n-\t  void **devaddrs, unsigned *dims, void *targ_mem_desc, bool async,\n+gcn_exec (struct kernel_info *kernel, struct gomp_offload_session *session,\n+\t  unsigned *dims, void *targ_mem_desc, bool async,\n \t  struct goacc_asyncqueue *aq)\n {\n   if (!GOMP_OFFLOAD_can_run (kernel))\n@@ -3522,9 +3607,9 @@ gcn_exec (struct kernel_info *kernel,\n     }\n \n   if (!async)\n-    run_kernel (kernel, devaddrs, &kla, NULL, false);\n+    run_kernel (session, kernel, &kla, NULL, false);\n   else\n-    queue_push_launch (aq, kernel, devaddrs, &kla);\n+    queue_push_launch (aq, session, kernel, &kla);\n \n   if (profiling_dispatch_p)\n     {\n@@ -4096,7 +4181,10 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,\n   if (module->init_array_func)\n     {\n       init_kernel (module->init_array_func);\n-      run_kernel (module->init_array_func, NULL, &kla, NULL, false);\n+      struct gomp_offload_session session;\n+      init_session (&session, agent);\n+      GOMP_OFFLOAD_session_set_target_var_table (&session, NULL);\n+      run_kernel (&session, module->init_array_func, &kla, NULL, false);\n     }\n   module->constructors_run_p = true;\n \n@@ -5232,9 +5320,9 @@ GOMP_OFFLOAD_get_interop_type_desc (struct interop_obj_t *obj,\n    specified device.  */\n \n void\n-GOMP_OFFLOAD_run (int device, void *fn_ptr, void *vars, void **args)\n+GOMP_OFFLOAD_run (struct gomp_offload_session *session, void *fn_ptr, void **args)\n {\n-  struct agent_info *agent = get_agent_info (device);\n+  struct agent_info *agent = session->agent;\n   struct kernel_info *kernel = (struct kernel_info *) fn_ptr;\n   struct GOMP_kernel_launch_attributes def;\n   struct GOMP_kernel_launch_attributes *kla;\n@@ -5248,7 +5336,7 @@ GOMP_OFFLOAD_run (int device, void *fn_ptr, void *vars, void **args)\n       GCN_WARNING (\"Will not run GCN kernel because the grid size is zero\\n\");\n       return;\n     }\n-  run_kernel (kernel, vars, kla, NULL, false);\n+  run_kernel (session, kernel, kla, NULL, false);\n }\n \n /* Run an asynchronous OpenMP kernel on DEVICE.  This is similar to\n@@ -5256,11 +5344,13 @@ GOMP_OFFLOAD_run (int device, void *fn_ptr, void *vars, void **args)\n    GOMP_PLUGIN_target_task_completion when it has finished.  */\n \n void\n-GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars,\n-\t\t\tvoid **args, void *async_data)\n+GOMP_OFFLOAD_async_run (struct gomp_offload_session *session,\n+\t\t\tvoid *tgt_fn,\n+\t\t\tvoid **args,\n+\t\t\tvoid *async_data)\n {\n   GCN_DEBUG (\"GOMP_OFFLOAD_async_run invoked\\n\");\n-  struct agent_info *agent = get_agent_info (device);\n+  struct agent_info *agent = session->agent;\n   struct kernel_info *kernel = (struct kernel_info *) tgt_fn;\n   struct GOMP_kernel_launch_attributes def;\n   struct GOMP_kernel_launch_attributes *kla;\n@@ -5278,7 +5368,7 @@ GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars,\n   maybe_init_omp_async (agent);\n   if (!agent->omp_async_queue)\n     GOMP_PLUGIN_fatal (\"Asynchronous queue initialization failed\");\n-  queue_push_launch (agent->omp_async_queue, kernel, tgt_vars, kla);\n+  queue_push_launch (agent->omp_async_queue, session, kernel, kla);\n   queue_push_callback (agent->omp_async_queue,\n \t\t       GOMP_PLUGIN_target_task_completion, async_data);\n }\n@@ -5422,30 +5512,30 @@ GOMP_OFFLOAD_is_accessible_ptr (int device, const void *ptr, size_t size)\n    already-loaded KERNEL.  */\n \n void\n-GOMP_OFFLOAD_openacc_exec (void (*fn_ptr) (void *),\n+GOMP_OFFLOAD_openacc_exec (struct gomp_offload_session *session,\n+\t\t\t   void (*fn_ptr) (void *),\n \t\t\t   size_t mapnum __attribute__((unused)),\n \t\t\t   void **hostaddrs __attribute__((unused)),\n-\t\t\t   void **devaddrs, unsigned *dims,\n-\t\t\t   void *targ_mem_desc)\n+\t\t\t   unsigned *dims, void *targ_mem_desc)\n {\n   struct kernel_info *kernel = (struct kernel_info *) fn_ptr;\n \n-  gcn_exec (kernel, devaddrs, dims, targ_mem_desc, false, NULL);\n+  gcn_exec (kernel, session, dims, targ_mem_desc, false, NULL);\n }\n \n /* Run an asynchronous OpenACC kernel on the specified queue.  */\n \n void\n-GOMP_OFFLOAD_openacc_async_exec (void (*fn_ptr) (void *),\n+GOMP_OFFLOAD_openacc_async_exec (struct gomp_offload_session *session,\n+\t\t\t\t void (*fn_ptr) (void *),\n \t\t\t\t size_t mapnum __attribute__((unused)),\n \t\t\t\t void **hostaddrs __attribute__((unused)),\n-\t\t\t\t void **devaddrs,\n \t\t\t\t unsigned *dims, void *targ_mem_desc,\n \t\t\t\t struct goacc_asyncqueue *aq)\n {\n   struct kernel_info *kernel = (struct kernel_info *) fn_ptr;\n \n-  gcn_exec (kernel, devaddrs, dims, targ_mem_desc, true, aq);\n+  gcn_exec (kernel, session, dims, targ_mem_desc, true, aq);\n }\n \n /* Create a new asynchronous thread and queue for running future kernels.  */\ndiff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c\nindex a540e9d4cce8..a0100ec3f97a 100644\n--- a/libgomp/plugin/plugin-nvptx.c\n+++ b/libgomp/plugin/plugin-nvptx.c\n@@ -828,6 +828,33 @@ link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,\n   return true;\n }\n \n+/* The NVPTX plugin can't make much use of this abstraction, so it has the bare\n+   minimum possible.  */\n+struct gomp_offload_session\n+{\n+  int device;\n+  void **target_var_table;\n+};\n+GOMP_OFFLOAD_session_boilerplate();\n+\n+void\n+GOMP_OFFLOAD_session_start (struct gomp_offload_session *session, int device)\n+{\n+  assert ((((uintptr_t) session) % __BIGGEST_ALIGNMENT__) == 0);\n+  *session = (struct gomp_offload_session) {\n+    .device = device,\n+    .target_var_table = NULL,\n+  };\n+}\n+\n+void\n+GOMP_OFFLOAD_session_set_target_var_table (struct gomp_offload_session *session,\n+\t\t\t\t\t   void **table)\n+{\n+  assert (!session->target_var_table);\n+  session->target_var_table = table;\n+}\n+\n static void\n nvptx_exec (void (*fn), unsigned *dims, void *targ_mem_desc,\n \t    CUdeviceptr dp, CUstream stream)\n@@ -1991,15 +2018,15 @@ GOMP_OFFLOAD_page_locked_host_free (void *ptr)\n }\n \n void\n-GOMP_OFFLOAD_openacc_exec (void (*fn) (void *),\n+GOMP_OFFLOAD_openacc_exec (struct gomp_offload_session *session,\n+\t\t\t   void (*fn) (void *),\n \t\t\t   size_t mapnum  __attribute__((unused)),\n \t\t\t   void **hostaddrs __attribute__((unused)),\n-\t\t\t   void **devaddrs,\n \t\t\t   unsigned *dims, void *targ_mem_desc)\n {\n   GOMP_PLUGIN_debug (0, \"nvptx %s\\n\", __FUNCTION__);\n \n-  CUdeviceptr dp = (CUdeviceptr) devaddrs;\n+  CUdeviceptr dp = (CUdeviceptr) session->target_var_table;\n   nvptx_exec (fn, dims, targ_mem_desc, dp, NULL);\n \n   CUresult r = CUDA_CALL_NOCHECK (cuStreamSynchronize, NULL);\n@@ -2012,16 +2039,16 @@ GOMP_OFFLOAD_openacc_exec (void (*fn) (void *),\n }\n \n void\n-GOMP_OFFLOAD_openacc_async_exec (void (*fn) (void *),\n+GOMP_OFFLOAD_openacc_async_exec (struct gomp_offload_session *session,\n+\t\t\t\t void (*fn) (void *),\n \t\t\t\t size_t mapnum __attribute__((unused)),\n \t\t\t\t void **hostaddrs __attribute__((unused)),\n-\t\t\t\t void **devaddrs,\n \t\t\t\t unsigned *dims, void *targ_mem_desc,\n \t\t\t\t struct goacc_asyncqueue *aq)\n {\n   GOMP_PLUGIN_debug (0, \"nvptx %s\\n\", __FUNCTION__);\n \n-  CUdeviceptr dp = (CUdeviceptr) devaddrs;\n+  CUdeviceptr dp = (CUdeviceptr) session->target_var_table;\n   nvptx_exec (fn, dims, targ_mem_desc, dp, aq->cuda_stream);\n }\n \n@@ -2957,7 +2984,7 @@ GOMP_OFFLOAD_get_interop_type_desc (struct interop_obj_t *obj,\n }\n \n void\n-GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)\n+GOMP_OFFLOAD_run (struct gomp_offload_session *session, void *tgt_fn, void **args)\n {\n   struct targ_fn_descriptor *tgt_fn_desc\n     = (struct targ_fn_descriptor *) tgt_fn;\n@@ -2965,7 +2992,7 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)\n   const struct targ_fn_launch *launch = tgt_fn_desc->launch;\n   const char *fn_name = launch->fn;\n   CUresult r;\n-  struct ptx_device *ptx_dev = ptx_devices[ord];\n+  struct ptx_device *ptx_dev = ptx_devices[session->device];\n   const char *maybe_abort_msg = \"(perhaps abort was called)\";\n   int teams = 0, threads = 0;\n \n@@ -3003,7 +3030,7 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)\n \n   pthread_mutex_lock (&ptx_dev->omp_stacks.lock);\n   void *stacks = nvptx_stacks_acquire (ptx_dev, stack_size, teams * threads);\n-  void *fn_args[] = {tgt_vars, stacks, (void *) stack_size};\n+  void *fn_args[] = {session->target_var_table, stacks, (void *) stack_size};\n   size_t fn_args_size = sizeof fn_args;\n   void *config[] = {\n     CU_LAUNCH_PARAM_BUFFER_POINTER, fn_args,\ndiff --git a/libgomp/target.c b/libgomp/target.c\nindex d562b0493eac..8f1612b74824 100644\n--- a/libgomp/target.c\n+++ b/libgomp/target.c\n@@ -1173,14 +1173,27 @@ gomp_present_fatal (void *addr, size_t size, struct gomp_device_descr *devicep)\n #endif\n }\n \n+/* Get size of region required for target variable table for MAPNUM\n+   mappings.  */\n+\n+static inline size_t\n+gomp_get_tvt_size (size_t mapnum)\n+{\n+  return mapnum * sizeof (void *);\n+}\n+\n static inline __attribute__((always_inline)) struct target_mem_desc *\n gomp_map_vars_internal (struct gomp_device_descr *devicep,\n \t\t\tstruct goacc_asyncqueue *aq, size_t mapnum,\n \t\t\tvoid **hostaddrs, void **devaddrs, size_t *sizes,\n \t\t\tvoid *kinds, bool short_mapkind,\n \t\t\thtab_t *refcount_set,\n-\t\t\tenum gomp_map_vars_kind pragma_kind)\n+\t\t\tenum gomp_map_vars_kind pragma_kind,\n+\t\t\tstruct gomp_offload_session *session)\n {\n+  bool target_p = pragma_kind & GOMP_MAP_VARS_TARGET;\n+  assert (/* SESSION must be present iff doing target offload.  */\n+\t  !!session == target_p);\n   size_t i, tgt_align, tgt_size, not_found_cnt = 0;\n   bool has_firstprivate = false;\n   bool has_always_ptrset = false;\n@@ -1206,32 +1219,12 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,\n     {\n       tgt->tgt_start = 0;\n       tgt->tgt_end = 0;\n+      if (session)\n+\t/* We promise to always call either set_tvt or alloc_tvt.  */\n+\tdevicep->session.set_tvt_func (session, NULL);\n       return tgt;\n     }\n \n-  tgt_align = sizeof (void *);\n-  tgt_size = 0;\n-  cbuf.chunks = NULL;\n-  cbuf.chunk_cnt = -1;\n-  cbuf.use_cnt = 0;\n-  cbuf.buf = NULL;\n-  if (mapnum > 1 || (pragma_kind & GOMP_MAP_VARS_TARGET))\n-    {\n-      size_t chunks_size = (mapnum + 1) * sizeof (struct gomp_coalesce_chunk);\n-      cbuf.chunks = (struct gomp_coalesce_chunk *) gomp_alloca (chunks_size);\n-      cbuf.chunk_cnt = 0;\n-    }\n-  if (pragma_kind & GOMP_MAP_VARS_TARGET)\n-    {\n-      size_t align = 4 * sizeof (void *);\n-      tgt_align = align;\n-      tgt_size = mapnum * sizeof (void *);\n-      cbuf.chunk_cnt = 1;\n-      cbuf.use_cnt = 1 + (mapnum > 1);\n-      cbuf.chunks[0].start = 0;\n-      cbuf.chunks[0].end = tgt_size;\n-    }\n-\n   gomp_mutex_lock (&devicep->lock);\n   if (devicep->state == GOMP_DEVICE_FINALIZED)\n     {\n@@ -1240,6 +1233,48 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,\n       return NULL;\n     }\n \n+  /* Target variable table in host memory.  If we're doing target offloading,\n+     we'll let the plugin attempt to allocate it..  */\n+  void **host_tvt = NULL;\n+  if (target_p && devicep->session.alloc_tvt_func)\n+    {\n+      host_tvt = (devicep->session.alloc_tvt_func\n+\t\t  (session, gomp_get_tvt_size (mapnum)));\n+      assert ((((uintptr_t)host_tvt) % __BIGGEST_ALIGNMENT__) == 0);\n+    }\n+\n+  /* True if we need to allocate the target var table, i.e. when doing\n+     offloading and when we fail to allocate it above.  */\n+  bool need_tvt_alloc = !host_tvt && target_p;\n+\n+  /* Initialize size tracking variables.  */\n+  tgt_align = sizeof (void *);\n+  tgt_size = 0;\n+\n+  /* Prepare coalesce buffer.  */\n+  cbuf.chunks = NULL;\n+  cbuf.chunk_cnt = -1;\n+  cbuf.use_cnt = 0;\n+  cbuf.buf = NULL;\n+  if (mapnum > 1 || !host_tvt)\n+    {\n+      size_t chunks_size = (mapnum + 1) * sizeof (struct gomp_coalesce_chunk);\n+      cbuf.chunks = (struct gomp_coalesce_chunk *) gomp_alloca (chunks_size);\n+      cbuf.chunk_cnt = 0;\n+    }\n+\n+  if (need_tvt_alloc)\n+    {\n+      /* Prepare for allocating the target variable table.  */\n+      size_t align = 4 * sizeof (void *);\n+      tgt_align = align;\n+      tgt_size = gomp_get_tvt_size (mapnum);\n+      cbuf.chunk_cnt = 1;\n+      cbuf.use_cnt = 1 + (mapnum > 1);\n+      cbuf.chunks[0].start = 0;\n+      cbuf.chunks[0].end = tgt_size;\n+    }\n+\n   for (i = 0; i < mapnum; i++)\n     {\n       int kind = get_kind (short_mapkind, kinds, i);\n@@ -1496,7 +1531,7 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,\n       tgt->tgt_start = (uintptr_t) tgt->to_free;\n       tgt->tgt_end = tgt->tgt_start + sizes[0];\n     }\n-  else if (not_found_cnt || (pragma_kind & GOMP_MAP_VARS_TARGET))\n+  else if (not_found_cnt || need_tvt_alloc || has_firstprivate)\n     {\n       /* Allocate tgt_align aligned tgt_size block of memory.  */\n       /* FIXME: Perhaps change interface to allocate properly aligned\n@@ -1534,8 +1569,8 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,\n     }\n \n   tgt_size = 0;\n-  if (pragma_kind & GOMP_MAP_VARS_TARGET)\n-    tgt_size = mapnum * sizeof (void *);\n+  if (need_tvt_alloc)\n+    tgt_size = gomp_get_tvt_size (mapnum);\n \n   tgt->array = NULL;\n   if (not_found_cnt || has_firstprivate || has_always_ptrset)\n@@ -2068,7 +2103,7 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,\n \t  }\n     }\n \n-  if (pragma_kind & GOMP_MAP_VARS_TARGET)\n+  if (target_p)\n     {\n       /* The target variables table is constructed with maps using iterators\n \t unexpanded. Now that the iterator maps are expanded, we will need to\n@@ -2080,12 +2115,24 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,\n \tif (!iterator_count || iterator_count[i] <= 1)\n \t  {\n \t    cur_node.tgt_offset = gomp_map_val (tgt, hostaddrs, i);\n-\t    gomp_copy_host2dev (devicep, aq,\n-\t\t\t\t(void *) (tgt->tgt_start + map_num * sizeof (void *)),\n-\t\t\t\t(void *) &cur_node.tgt_offset, sizeof (void *),\n-\t\t\t\ttrue, cbufp);\n+\t    if (host_tvt)\n+\t      /* In this case, it's on the host.  */\n+\t      memcpy (&host_tvt[map_num], &cur_node.tgt_offset,\n+\t\t      sizeof (void *));\n+\t    else\n+\t      /* Otherwise, the table is on the device.  */\n+\t      gomp_copy_host2dev (devicep, aq,\n+\t\t\t\t  (void *) (tgt->tgt_start + map_num * sizeof (void *)),\n+\t\t\t\t  (void *) &cur_node.tgt_offset, sizeof (void *),\n+\t\t\t\t  true, cbufp);\n \t    map_num++;\n \t  }\n+\n+      if (!host_tvt)\n+\t/* The call to GOMP_OFFLOAD_session_allocate_target_var_table failed,\n+\t   so we must inform the session about the target var table we\n+\t   allocated.  */\n+\tdevicep->session.set_tvt_func (session, (void **) tgt->tgt_start);\n     }\n \n   if (cbufp)\n@@ -2133,7 +2180,8 @@ static struct target_mem_desc *\n gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,\n \t       void **hostaddrs, void **devaddrs, size_t *sizes, void *kinds,\n \t       bool short_mapkind, htab_t *refcount_set,\n-\t       enum gomp_map_vars_kind pragma_kind)\n+\t       enum gomp_map_vars_kind pragma_kind,\n+\t       struct gomp_offload_session *session)\n {\n   /* This management of a local refcount_set is for convenience of callers\n      who do not share a refcount_set over multiple map/unmap uses.  */\n@@ -2147,7 +2195,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,\n   struct target_mem_desc *tgt;\n   tgt = gomp_map_vars_internal (devicep, NULL, mapnum, hostaddrs, devaddrs,\n \t\t\t\tsizes, kinds, short_mapkind, refcount_set,\n-\t\t\t\tpragma_kind);\n+\t\t\t\tpragma_kind, session);\n   if (local_refcount_set)\n     htab_free (local_refcount_set);\n \n@@ -2159,11 +2207,13 @@ goacc_map_vars (struct gomp_device_descr *devicep,\n \t\tstruct goacc_asyncqueue *aq, size_t mapnum,\n \t\tvoid **hostaddrs, void **devaddrs, size_t *sizes,\n \t\tvoid *kinds, bool short_mapkind,\n-\t\tenum gomp_map_vars_kind pragma_kind)\n+\t\tenum gomp_map_vars_kind pragma_kind,\n+\t\tstruct gomp_offload_session *session)\n {\n   return gomp_map_vars_internal (devicep, aq, mapnum, hostaddrs, devaddrs,\n \t\t\t\t sizes, kinds, short_mapkind, NULL,\n-\t\t\t\t GOMP_MAP_VARS_OPENACC | pragma_kind);\n+\t\t\t\t GOMP_MAP_VARS_OPENACC | pragma_kind,\n+\t\t\t\t session);\n }\n \n static void\n@@ -3200,12 +3250,14 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,\n       || !(fn_addr = gomp_get_target_fn_addr (devicep, fn)))\n     return gomp_target_fallback (fn, hostaddrs, devicep, NULL);\n \n+  struct gomp_offload_session *session = (gomp_offload_session_new\n+\t\t\t\t\t  (devicep, alloca));\n+\n   htab_t refcount_set = htab_create (mapnum);\n   struct target_mem_desc *tgt_vars\n     = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,\n-\t\t     &refcount_set, GOMP_MAP_VARS_TARGET);\n-  devicep->run_func (devicep->target_id, fn_addr, (void *) tgt_vars->tgt_start,\n-\t\t     NULL);\n+\t\t     &refcount_set, GOMP_MAP_VARS_TARGET, session);\n+  devicep->run_func (session, fn_addr, NULL);\n   htab_clear (refcount_set);\n   gomp_unmap_vars (tgt_vars, true, &refcount_set);\n   htab_free (refcount_set);\n@@ -3524,6 +3576,9 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,\n   struct target_mem_desc *tgt_vars;\n   htab_t refcount_set = NULL;\n \n+  struct gomp_offload_session *session = (gomp_offload_session_new\n+\t\t\t\t\t  (devicep, alloca));\n+\n   if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)\n     {\n       if (!fpc_done)\n@@ -3538,16 +3593,16 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,\n \t    }\n \t}\n       tgt_vars = NULL;\n+      devicep->session.set_tvt_func (session, hostaddrs);\n     }\n   else\n     {\n       refcount_set = htab_create (mapnum);\n       tgt_vars = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds,\n-\t\t\t\ttrue, &refcount_set, GOMP_MAP_VARS_TARGET);\n+\t\t\t\ttrue, &refcount_set, GOMP_MAP_VARS_TARGET,\n+\t\t\t\tsession);\n     }\n-  devicep->run_func (devicep->target_id, fn_addr,\n-\t\t     tgt_vars ? (void *) tgt_vars->tgt_start : hostaddrs,\n-\t\t     new_args);\n+  devicep->run_func (session, fn_addr, new_args);\n   if (tgt_vars)\n     {\n       htab_clear (refcount_set);\n@@ -4146,7 +4201,7 @@ gomp_target_data_fallback (struct gomp_device_descr *devicep)\n          would get out of sync.  */\n       struct target_mem_desc *tgt\n \t= gomp_map_vars (NULL, 0, NULL, NULL, NULL, NULL, false,\n-\t\t\t NULL, GOMP_MAP_VARS_DATA);\n+\t\t\t NULL, GOMP_MAP_VARS_DATA, NULL);\n       tgt->prev = icv->target_data;\n       icv->target_data = tgt;\n     }\n@@ -4165,7 +4220,7 @@ GOMP_target_data (int device, const void *unused, size_t mapnum,\n \n   struct target_mem_desc *tgt\n     = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,\n-\t\t     NULL, GOMP_MAP_VARS_DATA);\n+\t\t     NULL, GOMP_MAP_VARS_DATA, NULL);\n   struct gomp_task_icv *icv = gomp_icv (true);\n   tgt->prev = icv->target_data;\n   icv->target_data = tgt;\n@@ -4184,7 +4239,7 @@ GOMP_target_data_ext (int device, size_t mapnum, void **hostaddrs,\n \n   struct target_mem_desc *tgt\n     = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, true,\n-\t\t     NULL, GOMP_MAP_VARS_DATA);\n+\t\t     NULL, GOMP_MAP_VARS_DATA, NULL);\n   struct gomp_task_icv *icv = gomp_icv (true);\n   tgt->prev = icv->target_data;\n   icv->target_data = tgt;\n@@ -4509,7 +4564,7 @@ GOMP_target_enter_exit_data (int device, size_t mapnum, void **hostaddrs,\n \t{\n \t  gomp_map_vars (devicep, sizes[i] + 1, &hostaddrs[i], NULL, &sizes[i],\n \t\t\t &kinds[i], true, &refcount_set,\n-\t\t\t GOMP_MAP_VARS_ENTER_DATA);\n+\t\t\t GOMP_MAP_VARS_ENTER_DATA, NULL);\n \t  i += sizes[i];\n \t}\n       else if ((kinds[i] & 0xff) == GOMP_MAP_TO_PSET)\n@@ -4520,7 +4575,7 @@ GOMP_target_enter_exit_data (int device, size_t mapnum, void **hostaddrs,\n \t      break;\n \t  gomp_map_vars (devicep, j-i, &hostaddrs[i], NULL, &sizes[i],\n \t\t\t &kinds[i], true, &refcount_set,\n-\t\t\t GOMP_MAP_VARS_ENTER_DATA);\n+\t\t\t GOMP_MAP_VARS_ENTER_DATA, NULL);\n \t  i += j - i - 1;\n \t}\n       else if (i + 1 < mapnum\n@@ -4531,12 +4586,12 @@ GOMP_target_enter_exit_data (int device, size_t mapnum, void **hostaddrs,\n \t  /* An attach operation must be processed together with the mapped\n \t     base-pointer list item.  */\n \t  gomp_map_vars (devicep, 2, &hostaddrs[i], NULL, &sizes[i], &kinds[i],\n-\t\t\t true, &refcount_set, GOMP_MAP_VARS_ENTER_DATA);\n+\t\t\t true, &refcount_set, GOMP_MAP_VARS_ENTER_DATA, NULL);\n \t  i += 1;\n \t}\n       else\n \tgomp_map_vars (devicep, 1, &hostaddrs[i], NULL, &sizes[i], &kinds[i],\n-\t\t       true, &refcount_set, GOMP_MAP_VARS_ENTER_DATA);\n+\t\t       true, &refcount_set, GOMP_MAP_VARS_ENTER_DATA, NULL);\n   else\n     gomp_exit_data (devicep, mapnum, hostaddrs, sizes, kinds, &refcount_set);\n   htab_free (refcount_set);\n@@ -4569,24 +4624,23 @@ gomp_target_task_fn (void *data)\n \t  return false;\n \t}\n \n-      void *actual_arguments;\n+      struct gomp_offload_session *session = ttask->offload_session;\n+      devicep->session.start_func (session, devicep->target_id);\n+\n       if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)\n \t{\n \t  ttask->tgt = NULL;\n-\t  actual_arguments = ttask->hostaddrs;\n+\t  devicep->session.set_tvt_func (session, ttask->hostaddrs);\n \t}\n       else\n-\t{\n-\t  ttask->tgt = gomp_map_vars (devicep, ttask->mapnum, ttask->hostaddrs,\n-\t\t\t\t      NULL, ttask->sizes, ttask->kinds, true,\n-\t\t\t\t      NULL, GOMP_MAP_VARS_TARGET);\n-\t  actual_arguments = (void *) ttask->tgt->tgt_start;\n-\t}\n+\tttask->tgt = gomp_map_vars (devicep, ttask->mapnum, ttask->hostaddrs,\n+\t\t\t\t    NULL, ttask->sizes, ttask->kinds, true,\n+\t\t\t\t    NULL, GOMP_MAP_VARS_TARGET,\n+\t\t\t\t    session);\n       ttask->state = GOMP_TARGET_TASK_READY_TO_RUN;\n \n       assert (devicep->async_run_func);\n-      devicep->async_run_func (devicep->target_id, fn_addr, actual_arguments,\n-\t\t\t       ttask->args, (void *) ttask);\n+      devicep->async_run_func (session, fn_addr, ttask->args, (void *) ttask);\n       return true;\n     }\n   else if (devicep == NULL\n@@ -4608,13 +4662,13 @@ gomp_target_task_fn (void *data)\n \t    {\n \t      gomp_map_vars (devicep, ttask->sizes[i] + 1, &ttask->hostaddrs[i],\n \t\t\t     NULL, &ttask->sizes[i], &ttask->kinds[i], true,\n-\t\t\t     &refcount_set, GOMP_MAP_VARS_ENTER_DATA);\n+\t\t\t     &refcount_set, GOMP_MAP_VARS_ENTER_DATA, NULL);\n \t      i += ttask->sizes[i];\n \t    }\n \t  else\n \t    gomp_map_vars (devicep, 1, &ttask->hostaddrs[i], NULL, &ttask->sizes[i],\n \t\t\t   &ttask->kinds[i], true, &refcount_set,\n-\t\t\t   GOMP_MAP_VARS_ENTER_DATA);\n+\t\t\t   GOMP_MAP_VARS_ENTER_DATA, NULL);\n       else\n \tgomp_exit_data (devicep, ttask->mapnum, ttask->hostaddrs, ttask->sizes,\n \t\t\tttask->kinds, &refcount_set);\n@@ -6020,8 +6074,9 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,\n   /* Check if all required functions are available in the plugin and store\n      their handlers.  None of the symbols can legitimately be NULL,\n      so we don't need to check dlerror all the time.  */\n-#define DLSYM(f)\t\t\t\t\t\t\t\\\n-  if (!(device->f##_func = dlsym (plugin_handle, \"GOMP_OFFLOAD_\" #f)))\t\\\n+#define DLSYM(f) DLSYM2(f, f)\n+#define DLSYM2(f, n)\t\t\t\t\t\t\t\\\n+  if (!(device->f##_func = dlsym (plugin_handle, \"GOMP_OFFLOAD_\" #n)))\t\\\n     goto dl_fail\n   /* Similar, but missing functions are not an error.  Return false if\n      failed, true otherwise.  */\n@@ -6065,8 +6120,15 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,\n     }\n \n   device->capabilities = device->get_caps_func ();\n+  device->session.size = 0;\n   if (device->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)\n     {\n+      DLSYM2 (session.size, session_size);\n+      device->session.size = device->session.size_func ();\n+      DLSYM2 (session.start, session_start);\n+      DLSYM_OPT (session.alloc_tvt, session_allocate_target_var_table);\n+      DLSYM2 (session.set_tvt, session_set_target_var_table);\n+\n       DLSYM (run);\n       DLSYM_OPT (async_run, async_run);\n       DLSYM_OPT (can_run, can_run);\n@@ -6114,6 +6176,7 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,\n \t}\n     }\n #undef DLSYM\n+#undef DLSYM2\n #undef DLSYM_OPT\n \n   return 1;\ndiff --git a/libgomp/task.c b/libgomp/task.c\nindex cbba28516e3f..89dafb872208 100644\n--- a/libgomp/task.c\n+++ b/libgomp/task.c\n@@ -936,12 +936,25 @@ gomp_create_target_task (struct gomp_device_descr *devicep,\n \t}\n     }\n \n-  task = gomp_malloc (sizeof (*task) + depend_size\n-\t\t      + sizeof (*ttask)\n-\t\t      + args_cnt * sizeof (void *)\n-\t\t      + mapnum * (sizeof (void *) + sizeof (size_t)\n-\t\t\t\t  + sizeof (unsigned short))\n-\t\t      + tgt_size);\n+  size_t task_alloc_size = (sizeof (*task) + depend_size\n+\t\t       + sizeof (*ttask)\n+\t\t       + args_cnt * sizeof (void *)\n+\t\t       + mapnum * (sizeof (void *) + sizeof (size_t)\n+\t\t\t\t   + sizeof (unsigned short))\n+\t\t       + tgt_size);\n+  size_t session_start_offset = 0;\n+  if (devicep && devicep->session.size)\n+    {\n+      /* gomp_malloc always aligns to __BIGGEST_ALIGNMENT__, so, we can just\n+\t round up the size to preserve that alignment...  */\n+      size_t align = __BIGGEST_ALIGNMENT__ - 1;\n+      task_alloc_size = (task_alloc_size + align) & ~align;\n+      session_start_offset = task_alloc_size;\n+\n+      /* ... and reserve enough room.  */\n+      task_alloc_size += devicep->session.size;\n+    }\n+  task = gomp_malloc (task_alloc_size);\n   gomp_init_task (task, parent, gomp_icv (false));\n   task->priority = 0;\n   task->kind = GOMP_TASK_WAITING;\n@@ -951,6 +964,14 @@ gomp_create_target_task (struct gomp_device_descr *devicep,\n   ttask->devicep = devicep;\n   ttask->fn = fn;\n   ttask->mapnum = mapnum;\n+\n+  ttask->offload_session = NULL;\n+  if (session_start_offset)\n+    {\n+      uintptr_t session_ptr = (uintptr_t) task + session_start_offset;\n+      ttask->offload_session = (void *) session_ptr;\n+    }\n+\n   memcpy (ttask->hostaddrs, hostaddrs, mapnum * sizeof (void *));\n   if (args_cnt)\n     {\ndiff --git a/libgomp/testsuite/libgomp.c-c++-common/gcn-kernel-launch-no-tvt-alloc.c b/libgomp/testsuite/libgomp.c-c++-common/gcn-kernel-launch-no-tvt-alloc.c\nnew file mode 100644\nindex 000000000000..7494c5a5f4c8\n--- /dev/null\n+++ b/libgomp/testsuite/libgomp.c-c++-common/gcn-kernel-launch-no-tvt-alloc.c\n@@ -0,0 +1,51 @@\n+/* { dg-do run } */\n+/* { dg-require-effective-target offload_device_gcn } */\n+#include <assert.h>\n+#include <stdio.h>\n+#include <stdlib.h>\n+\n+int\n+main ()\n+{\n+  setenv (\"GCN_DEBUG\", \"1\", true);\n+#ifdef INHIBIT_KERNARGS_MERGE\n+  /* See gcn-kernel-launch-tvt-alloc.c  */\n+  setenv (\"GCN_INHIBIT_KERNARGS_TVT_MERGE\", \"1\", true);\n+#endif\n+\n+  int i[1] = { 0 };\n+\n+#pragma omp target data map(tofrom: i[0:1])\n+  {\n+    fprintf (stderr, \"================\\n\");\n+\n+#pragma omp target\n+    { i[0] = 1; }\n+  }\n+\n+  assert (i[0] == 1);\n+}\n+\n+/* Here, we want to ensure that we have no allocations after the point\n+   delimited by ===...\n+\n+   Past that point, the only data to map onto the device is the target\n+   variable table, which should be passed as kernel arguments.  The GCN plugin\n+   currently does not log allocating those.  We rely on that here.\n+\n+   So, dg-output lets us match the entire output with a regex.  Multiple\n+   dg-output invocations will have their regexes concatenated in order.  The\n+   following is that regex, broken down by function:\n+\n+   Ignore ===... marker and everything before it.\n+     { dg-output {^.*================[\\r\\n]+} }\n+   Then, each further line is either...\n+     { dg-output {((} }\n+   ... a line not starting with \"GCN debug: \"...\n+     { dg-output {(?!GCN debug:)[^\\r\\n]+} }\n+   ... or a \"GCN debug: ...\" line that is not an allocation:\n+     { dg-output {|GCN debug: (?!Allocating )[^\\r\\n]*} }\n+   ... followed by a line terminator, of course.\n+     { dg-output {)[\\r\\n]+)*} }\n+   There should be nothing left.\n+     { dg-output {$} }  */\ndiff --git a/libgomp/testsuite/libgomp.c-c++-common/gcn-kernel-launch-tvt-alloc.c b/libgomp/testsuite/libgomp.c-c++-common/gcn-kernel-launch-tvt-alloc.c\nnew file mode 100644\nindex 000000000000..ab5ed2dc4336\n--- /dev/null\n+++ b/libgomp/testsuite/libgomp.c-c++-common/gcn-kernel-launch-tvt-alloc.c\n@@ -0,0 +1,16 @@\n+/* { dg-do run } */\n+/* { dg-require-effective-target offload_device_gcn } */\n+\n+/* Test that the no-merge case still works.  */\n+\n+#define INHIBIT_KERNARGS_MERGE\n+#include \"./gcn-kernel-launch-no-tvt-alloc.c\"\n+\n+/* See commentary in <gcn-kernel-launch-no-tvt-alloc.c>.\n+\n+   Ignore ===... marker and everything before it.\n+     { dg-output {^.*================[\\r\\n]+} }\n+   We expect at least \"GCN debug: Allocating \\d+ bytes...\"\n+     { dg-output {.*[\\r\\n]+GCN debug: Allocating \\d+ bytes.*} }\n+   There should be nothing left.\n+     { dg-output {$} }  */\n",
    "prefixes": [
        "2/4"
    ]
}