{"id":2233009,"url":"http://patchwork.ozlabs.org/api/1.2/patches/2233009/?format=json","web_url":"http://patchwork.ozlabs.org/project/gcc/patch/20260505134929.3522938-4-aarsenovic@baylibre.com/","project":{"id":17,"url":"http://patchwork.ozlabs.org/api/1.2/projects/17/?format=json","name":"GNU Compiler Collection","link_name":"gcc","list_id":"gcc-patches.gcc.gnu.org","list_email":"gcc-patches@gcc.gnu.org","web_url":null,"scm_url":null,"webscm_url":null,"list_archive_url":"","list_archive_url_format":"","commit_url_format":""},"msgid":"<20260505134929.3522938-4-aarsenovic@baylibre.com>","list_archive_url":null,"date":"2026-05-05T13:14:59","name":"[3/4] libgomp/plugin-gcn: remove unneeded heap allocation in run_kernel","commit_ref":null,"pull_url":null,"state":"new","archived":false,"hash":"787bca1a2281a7c821303940c39d134d885c466b","submitter":{"id":92125,"url":"http://patchwork.ozlabs.org/api/1.2/people/92125/?format=json","name":"Arsen Arsenović","email":"aarsenovic@baylibre.com"},"delegate":null,"mbox":"http://patchwork.ozlabs.org/project/gcc/patch/20260505134929.3522938-4-aarsenovic@baylibre.com/mbox/","series":[{"id":502845,"url":"http://patchwork.ozlabs.org/api/1.2/series/502845/?format=json","web_url":"http://patchwork.ozlabs.org/project/gcc/list/?series=502845","date":"2026-05-05T13:14:59","name":"GCN: Target offload overhead improvements, batch 2","version":1,"mbox":"http://patchwork.ozlabs.org/series/502845/mbox/"}],"comments":"http://patchwork.ozlabs.org/api/patches/2233009/comments/","check":"pending","checks":"http://patchwork.ozlabs.org/api/patches/2233009/checks/","tags":{},"related":[],"headers":{"Return-Path":"<gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org>","X-Original-To":["incoming@patchwork.ozlabs.org","gcc-patches@gcc.gnu.org"],"Delivered-To":["patchwork-incoming@legolas.ozlabs.org","gcc-patches@gcc.gnu.org"],"Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=baylibre-com.20251104.gappssmtp.com\n header.i=@baylibre-com.20251104.gappssmtp.com header.a=rsa-sha256\n header.s=20251104 header.b=Vh2Fskpf;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org\n (client-ip=38.145.34.32; helo=vm01.sourceware.org;\n envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org;\n receiver=patchwork.ozlabs.org)","sourceware.org;\n\tdkim=pass (2048-bit key,\n unprotected) header.d=baylibre-com.20251104.gappssmtp.com\n header.i=@baylibre-com.20251104.gappssmtp.com header.a=rsa-sha256\n header.s=20251104 header.b=Vh2Fskpf","sourceware.org;\n dmarc=none (p=none dis=none) header.from=baylibre.com","sourceware.org; spf=pass smtp.mailfrom=baylibre.com","server2.sourceware.org;\n arc=none smtp.remote-ip=2a00:1450:4864:20::52f"],"Received":["from vm01.sourceware.org (vm01.sourceware.org [38.145.34.32])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519 server-signature ECDSA (secp384r1) server-digest SHA384)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4g90Jy3FRMz1yJV\n\tfor <incoming@patchwork.ozlabs.org>; Tue, 05 May 2026 23:50:42 +1000 (AEST)","from vm01.sourceware.org (localhost [127.0.0.1])\n\tby sourceware.org (Postfix) with ESMTP id 9F0EC4BA900F\n\tfor <incoming@patchwork.ozlabs.org>; Tue,  5 May 2026 13:50:40 +0000 (GMT)","from mail-ed1-x52f.google.com (mail-ed1-x52f.google.com\n [IPv6:2a00:1450:4864:20::52f])\n by sourceware.org (Postfix) with ESMTPS id 97C834BA79BC\n for <gcc-patches@gcc.gnu.org>; Tue,  5 May 2026 13:50:12 +0000 (GMT)","by mail-ed1-x52f.google.com with SMTP id\n 4fb4d7f45d1cf-67c1e0229acso4361497a12.1\n for <gcc-patches@gcc.gnu.org>; Tue, 05 May 2026 06:50:12 -0700 (PDT)","from localhost ([146.70.193.12])\n by smtp.googlemail.com with ESMTPSA id\n 4fb4d7f45d1cf-67cd90fe127sm460130a12.13.2026.05.05.06.50.10\n (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);\n Tue, 05 May 2026 06:50:10 -0700 (PDT)"],"DKIM-Filter":["OpenDKIM Filter v2.11.0 sourceware.org 9F0EC4BA900F","OpenDKIM Filter v2.11.0 sourceware.org 97C834BA79BC"],"DMARC-Filter":"OpenDMARC Filter v1.4.2 sourceware.org 97C834BA79BC","ARC-Filter":"OpenARC Filter v1.0.0 sourceware.org 97C834BA79BC","ARC-Seal":"i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1777989012; cv=none;\n b=u7OAtvI/rWtKmz7qqv2qqd/am4f/Lrq8X8oO9LyrYhu17r53qPnyXM/QDeVadm6DKM+YgRb73CE/y+9ZPWU8uEXK4jk25R4iFnNOAje6UmROjnlpY7aVWDe56eQkckjJEVn8CsdKYixvJjzpPr9x5qKEUAfhhh8UFWhcL3qIu50=","ARC-Message-Signature":"i=1; a=rsa-sha256; d=sourceware.org; s=key;\n t=1777989012; c=relaxed/simple;\n bh=qHsv2dbPojc9FH5UB0DAdU0PURJR5AgVguBoQp3tJbU=;\n h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version;\n b=g0sYF186kkA7VOk97NIfMFQ6Mec1/eXpv4kS84p72Qr23tfQ7CsODwKJTxMXhAFSGAsvJytkbFZv5TkIbnmjfbm8QUCG0NN6dcmJCx9oj0U9LZ4kfIxUmp1kj+Pqj0qWvOI3H2+kFb4n8hypWMep+1jeahlQI1JnJ/R9sDXEf7s=","ARC-Authentication-Results":"i=1; server2.sourceware.org","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n d=baylibre-com.20251104.gappssmtp.com; s=20251104; t=1777989011;\n x=1778593811;\n darn=gcc.gnu.org;\n h=content-transfer-encoding:mime-version:references:in-reply-to\n :message-id:date:subject:cc:to:from:from:to:cc:subject:date\n :message-id:reply-to;\n bh=+3pT8bSaUC/hgxwxbw9Rr/p4UMy6Dv39SfGE1pptXfA=;\n b=Vh2FskpfvJbmujpUagQ2Sf93wAHfXsqeA+LQx6BKWJ8yCfsA62hBngk0f1ixUfLu/f\n AaBKTO8ZiJIakrFHqoYSu8xOssqnBD6PDtjDEpgP6vL1ly43YIwtLltrIbDrwSj9Mg+a\n 05R0yd0juwi2ODa0nTlGWl50932eG+k7pBhpB5TljqnnkizWmyXB34ChCY7I25CNFquO\n EOi5F0YOe7bGxRA4zUpDqzeFagBF8OmCx0Mo9uFQKvu8bI2uTs9yIrlsUnqrey+dIAHr\n hU+PSCb7bsm1jI9YTKyQ11w8kOodEzlfLu3s9uEfPVL92W3UXrkodZk9H9zglGkvnLVv\n 7w7w==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n d=1e100.net; s=20251104; t=1777989011; x=1778593811;\n h=content-transfer-encoding:mime-version:references:in-reply-to\n :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from\n :to:cc:subject:date:message-id:reply-to;\n bh=+3pT8bSaUC/hgxwxbw9Rr/p4UMy6Dv39SfGE1pptXfA=;\n b=N1KQH2edlRs/cU3BbO3py5kTDHV8peQdmAWnVqcTEoAU+bGPaXIAGaOQQ4RVQ0NDVR\n AG4LThkZ7b4MVh0/DHMZ7jJi4Oob8Nq6N/Ny25xySZAkZlrayLpBoK8SSCQ8cJZ9vicP\n s5rHjVs6EArBmKoqCnrmgTlR0YnMmnTb4q3VuKfTtNFu33c5mkMjOzjfDmEF2BQqIuL4\n PJPIQdqKW9q6upqrX3C+HYKqdiPAxCkj4ErTtdRCu29zNsD9jS4KXpGTLCVOJ/g/yUGX\n PMxnLB9u/hfV4WGunsv+40FTHf5KBE7U3h6zwmgR6ooVF7XQYSQV47BBe+Dgg1IiA0Tf\n MAbw==","X-Gm-Message-State":"AOJu0Ywg4kYr731l0TzB9JZra9xq9jo2TPqRDvBxp0lciB+6ySS5/F1L\n kqo+pw/DpijBq1Dgl6zPGYxxsuQj5YjRTt/VWBhfAzN40JiY2REKPHio+aGId9p/cXpI5F0G6cr\n 2N60jFNw=","X-Gm-Gg":"AeBDiet+Wwi03ebGGIh18FaMskDW9l/XQkTXFfyWAjhr0t008ExDSE9atI0g80zo6Nj\n imfPmZ77CbHl0cDG2MT6zl3af93xh9dcOQ/11LyFNxrDnhB/R5PSwi6RZ5maQVpQaIJ350JRDs0\n e0rw+rlVcMa1VaPJiuVgNV1+ySkETECRGXMxyhVEeXO/aUS2fPgzUQVqs2quFyJZ6xYiBz9U3eT\n 0O5DvFBs/WjxIwbHZ1gzU11cL2g05Il+te06EsNiLTRQPUVA5pj1RbWU4qw67LK5fbTBe4GdMei\n G6mpkGLhNwUQeJtmkUiE04BxOtnZDsKNuNLRb8+ttLcif8F40vV07z8XNKE+PhQvlM9O9zIPjJH\n K7N414T2d74MDM1j0ZXcanJPjj1sf0mxIkvq8pnu+4EnKqiflS4P8dr5bVuMi/miaLSEAQ60MCo\n ba1qz78ArPcthnCQgcMAPR+dYUtuh83Ir/pJs=","X-Received":"by 2002:a05:6402:5043:b0:672:f3e:1475 with SMTP id\n 4fb4d7f45d1cf-67ccb9bfedbmr1414112a12.12.1777989011275;\n Tue, 05 May 2026 06:50:11 -0700 (PDT)","From":"=?utf-8?q?Arsen_Arsenovi=C4=87?= <aarsenovic@baylibre.com>","To":"gcc-patches@gcc.gnu.org","Cc":"=?utf-8?q?Arsen_Arsenovi=C4=87?= <aarsenovic@baylibre.com>","Subject":"[PATCH 3/4] libgomp/plugin-gcn: remove unneeded heap allocation in\n run_kernel","Date":"Tue,  5 May 2026 15:14:59 +0200","Message-ID":"<20260505134929.3522938-4-aarsenovic@baylibre.com>","X-Mailer":"git-send-email 2.54.0","In-Reply-To":"<20260505134929.3522938-1-aarsenovic@baylibre.com>","References":"<20260505134929.3522938-1-aarsenovic@baylibre.com>","MIME-Version":"1.0","Content-Transfer-Encoding":"8bit","X-BeenThere":"gcc-patches@gcc.gnu.org","X-Mailman-Version":"2.1.30","Precedence":"list","List-Id":"Gcc-patches mailing list <gcc-patches.gcc.gnu.org>","List-Unsubscribe":"<https://gcc.gnu.org/mailman/options/gcc-patches>,\n <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>","List-Archive":"<https://gcc.gnu.org/pipermail/gcc-patches/>","List-Post":"<mailto:gcc-patches@gcc.gnu.org>","List-Help":"<mailto:gcc-patches-request@gcc.gnu.org?subject=help>","List-Subscribe":"<https://gcc.gnu.org/mailman/listinfo/gcc-patches>,\n <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>","Errors-To":"gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org"},"content":"So far, the GCN plugin has used a kernel_dispatch struct instance it\ncalls \"shadow\" to keep effectively a copy of part of the HSA dispatch\npacket before populating said packet.  It also allocated it on the heap.\n\nThis, at first glance, seems useless: why double up the data in a shadow\nwhen it's already in packet?\n\nBut, it serves a purpose.  The packet is owned by the HSA runtime.\nAfter dispatch, its contents are to be considered no longer accessible\nby the dispatcher (i.e. run_kernel).  So, we can't read back from it the\naddresses or handles of resources we allocated, and so, we can't clean\nthem up.\n\nHowever, this allocation doesn't need to happen on the heap.  It's of a\nknown fixed size, and its lifetime is the same as the lifetime of an\nautomatic variable.\n\nThis patch demotes the heap allocation into an automatic variable, and\nadds commentary to make it clear what the purpose of this \"shadow\" is.\nIn the end, the result of this patch is that the run_kernel hot path has\none fewer allocation.\n\nI've also taken the opportunity to do some very minor code cleanup.\n\nlibgomp/ChangeLog:\n\n\t* plugin/plugin-gcn.c (struct kernel_dispatch): Store\n\thsa_signal_t, rather than a uint64_t, so that we don't rely on\n\tknowledge of the contents of hsa_signal_t.\n\t(create_kernel_dispatch): Rename...\n\t(prepare_kernel_dispatch): ... to this, as it no longer creates\n\ta kernel dispatch.  The allocation that would've created it is\n\thoisted...\n\t(run_kernel): ... here, as an automatic variable.  Move logic\n\tthat copies the fields of kernel_dispatch...\n\t(populate_packet_from_dispatch): ... into this standalone\n\tfunction, to make it clearer.\n\t(release_kernel_dispatch): Rename....\n\t(cleanup_kernel_dispatch): ... to this, don't free 'shadow'.\n---\n libgomp/plugin/plugin-gcn.c | 64 ++++++++++++++++++++-----------------\n 1 file changed, 35 insertions(+), 29 deletions(-)","diff":"diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c\nindex 99ba65e14243..af35b06f83af 100644\n--- a/libgomp/plugin/plugin-gcn.c\n+++ b/libgomp/plugin/plugin-gcn.c\n@@ -274,8 +274,11 @@ struct GOMP_kernel_launch_attributes\n   uint32_t wdims[3];\n };\n \n-/* Collection of information needed for a dispatch of a kernel from a\n-   kernel.  */\n+/* Collection of information needed for a dispatch of a kernel from a kernel.\n+   Redundant with parts of hsa_kernel_dispatch_packet_t.  It is maintained\n+   separately because the contents of the aforementioned packet become\n+   unspecified after dispatch, so, we can't re-read back pointers we wrote into\n+   the dispatch packet in order to clean them up.  */\n \n struct kernel_dispatch\n {\n@@ -285,7 +288,7 @@ struct kernel_dispatch\n   /* Kernel object.  */\n   uint64_t object;\n   /* Synchronization signal used for dispatch synchronization.  */\n-  uint64_t signal;\n+  hsa_signal_t signal;\n   /* Private segment size.  */\n   uint32_t private_segment_size;\n   /* Group segment size.  */\n@@ -2125,13 +2128,12 @@ alloc_by_agent (struct agent_info *agent, size_t size)\n /* Create kernel dispatch data structure for given KERNEL, along with\n    the necessary device signals and memory allocations.  */\n \n-static struct kernel_dispatch *\n-create_kernel_dispatch (struct kernel_info *kernel, int num_teams,\n-\t\t\tint num_threads, struct kernargs *kernargs)\n+static inline void\n+prepare_kernel_dispatch (struct kernel_dispatch *shadow,\n+\t\t\t struct kernel_info *kernel, int num_teams,\n+\t\t\t int num_threads, struct kernargs *kernargs)\n {\n   struct agent_info *agent = kernel->agent;\n-  struct kernel_dispatch *shadow\n-    = GOMP_PLUGIN_malloc_cleared (sizeof (struct kernel_dispatch));\n \n   shadow->agent = kernel->agent;\n   shadow->object = kernel->object;\n@@ -2141,7 +2143,7 @@ create_kernel_dispatch (struct kernel_info *kernel, int num_teams,\n   if (status != HSA_STATUS_SUCCESS)\n     hsa_fatal (\"Error creating the GCN sync signal\", status);\n \n-  shadow->signal = sync_signal.handle;\n+  shadow->signal = sync_signal;\n   shadow->private_segment_size = kernel->private_segment_size;\n \n   if (lowlat_size < 0)\n@@ -2169,7 +2171,7 @@ create_kernel_dispatch (struct kernel_info *kernel, int num_teams,\n   if (kernel->kernarg_segment_size > 8)\n     {\n       GOMP_PLUGIN_fatal (\"Unexpectedly large kernargs segment requested\");\n-      return NULL;\n+      return;\n     }\n \n   /* Zero-initialize the output_data (minimum needed).  */\n@@ -2190,8 +2192,19 @@ create_kernel_dispatch (struct kernel_info *kernel, int num_teams,\n \n   /* Ensure we can recognize unset return values.  */\n   kernargs->output_data.return_value = 0xcafe0000;\n+}\n \n-  return shadow;\n+/* Copy information from DISPATCH into PACKET, to get it ready for\n+   dispatching.  */\n+\n+static inline void\n+populate_packet_from_dispatch (hsa_kernel_dispatch_packet_t *packet,\n+\t\t\t       struct kernel_dispatch *shadow)\n+{\n+  packet->private_segment_size = shadow->private_segment_size;\n+  packet->group_segment_size = shadow->group_segment_size;\n+  packet->kernel_object = shadow->object;\n+  packet->completion_signal = shadow->signal;\n }\n \n static void\n@@ -2265,7 +2278,7 @@ console_output (struct kernel_info *kernel, struct kernargs *kernargs,\n    and clean up the signal and memory allocations.  */\n \n static inline void\n-release_kernel_dispatch (struct kernel_dispatch *shadow,\n+cleanup_kernel_dispatch (struct kernel_dispatch *shadow,\n \t\t\t struct kernargs *kernargs)\n {\n   GCN_DEBUG (\"Released kernel dispatch: %p\\n\", shadow);\n@@ -2275,11 +2288,7 @@ release_kernel_dispatch (struct kernel_dispatch *shadow,\n     addr = (void *)kernargs->abi.stack_ptr;\n   release_ephemeral_memories (shadow->agent, addr);\n \n-  hsa_signal_t s;\n-  s.handle = shadow->signal;\n-  hsa_fns.hsa_signal_destroy_fn (s);\n-\n-  free (shadow);\n+  hsa_fns.hsa_signal_destroy_fn (shadow->signal);\n }\n \n /* Extract the properties from a kernel binary.  */\n@@ -2505,23 +2514,20 @@ run_kernel (struct gomp_offload_session *session,\n \t     packet->workgroup_size_x, packet->workgroup_size_y,\n \t     packet->workgroup_size_z);\n \n-  struct kernel_dispatch *shadow\n-    = create_kernel_dispatch (kernel, packet->grid_size_x,\n-\t\t\t      packet->grid_size_z, kernargs);\n-  shadow->queue = command_q;\n+  struct kernel_dispatch shadow;\n+  prepare_kernel_dispatch (&shadow, kernel, packet->grid_size_x,\n+\t\t\t   packet->grid_size_z, kernargs);\n+  shadow.queue = command_q;\n \n   if (debug)\n     {\n       fprintf (stderr, \"\\nKernel has following dependencies:\\n\");\n-      print_kernel_dispatch (shadow, 2, kernargs);\n+      print_kernel_dispatch (&shadow, 2, kernargs);\n     }\n \n-  packet->private_segment_size = shadow->private_segment_size;\n-  packet->group_segment_size = shadow->group_segment_size;\n-  packet->kernel_object = shadow->object;\n-  hsa_signal_t s;\n-  s.handle = shadow->signal;\n-  packet->completion_signal = s;\n+  populate_packet_from_dispatch (packet, &shadow);\n+\n+  hsa_signal_t s = shadow.signal;\n   hsa_fns.hsa_signal_store_relaxed_fn (s, 1);\n \n   GCN_DEBUG (\"Copying kernel runtime pointer %p to kernarg_address\\n\", session->target_var_table);\n@@ -2555,7 +2561,7 @@ run_kernel (struct gomp_offload_session *session,\n \n   unsigned int return_value = (unsigned int)kernargs->output_data.return_value;\n \n-  release_kernel_dispatch (shadow, kernargs);\n+  cleanup_kernel_dispatch (&shadow, kernargs);\n   release_session (session);\n \n   if (!module_locked && pthread_rwlock_unlock (&agent->module_rwlock))\n","prefixes":["3/4"]}