Subject: [PATCH 3/5] [amdgcn] Restrict register usage in non-kernel functions
From: Kwok Cheung Yeung
To: Andrew Stubbs, Julian Brown
Date: Thu, 14 Nov 2019 15:32:16 +0000
In-Reply-To: <0b37b07a-be6c-2ac6-c579-c7a522024419@codesourcery.com>

This patch restricts
non-kernel functions to a maximum of 64 SGPRs and 24 VGPRs.

Kernels can request various pieces of information from the HSA runtime,
and these are loaded into consecutive registers before the kernel
executes. These registers are normally fixed. Since non-kernel functions
cannot make these requests, they have to assume that the default set of
information has been requested. If a non-leaf kernel requests information
that is not in the default set, a warning is now emitted, because values
needed by its callees may have moved to different registers. A leaf kernel
has no callees, so it can request whatever it wants.

I have set up FIXED_REGISTERS for the default case. If a different set of
startup information is requested (which should be rare), the set of fixed
registers is adjusted accordingly by gcn_conditional_register_usage.

Compared to before, v0, s2 and s3 are now unfixed (due to the newlib patch
'Stash reent marker in upper bits of s1 on AMD GCN' and the first patch in
this series).

Okay to commit?

Kwok


2019-11-14  Kwok Cheung Yeung

	gcc/
	* config/gcn/gcn.c (default_requested_args): New.
	(gcn_parse_amdgpu_hsa_kernel_attribute): Initialize requested args
	set with default_requested_args.
	(gcn_conditional_register_usage): Limit register usage of non-kernel
	functions.  Reassign fixed registers if a non-standard set of args is
	requested.
	* config/gcn/gcn.h (FIXED_REGISTERS): Fix registers according to ABI.
---
 gcc/config/gcn/gcn.c | 63 ++++++++++++++++++++++++++++++------------------
 gcc/config/gcn/gcn.h |  6 ++---
 2 files changed, 39 insertions(+), 30 deletions(-)

diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 09dfabb..8a2f7d7 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -191,6 +191,17 @@ static const struct gcn_kernel_arg_type
   {"work_item_id_Z", NULL, V64SImode, FIRST_VGPR_REG + 2}
 };
 
+static const long default_requested_args
+	= (1 << PRIVATE_SEGMENT_BUFFER_ARG)
+	  | (1 << DISPATCH_PTR_ARG)
+	  | (1 << QUEUE_PTR_ARG)
+	  | (1 << KERNARG_SEGMENT_PTR_ARG)
+	  | (1 << PRIVATE_SEGMENT_WAVE_OFFSET_ARG)
+	  | (1 << WORKGROUP_ID_X_ARG)
+	  | (1 << WORK_ITEM_ID_X_ARG)
+	  | (1 << WORK_ITEM_ID_Y_ARG)
+	  | (1 << WORK_ITEM_ID_Z_ARG);
+
 /* Extract parameter settings from __attribute__((amdgpu_hsa_kernel ())).
    This function also sets the default values for some arguments.
 
@@ -201,10 +212,7 @@ gcn_parse_amdgpu_hsa_kernel_attribute (struct gcn_kernel_args *args,
 				       tree list)
 {
   bool err = false;
-  args->requested = ((1 << PRIVATE_SEGMENT_BUFFER_ARG)
-		     | (1 << QUEUE_PTR_ARG)
-		     | (1 << KERNARG_SEGMENT_PTR_ARG)
-		     | (1 << PRIVATE_SEGMENT_WAVE_OFFSET_ARG));
+  args->requested = default_requested_args;
   args->nargs = 0;
 
   for (int a = 0; a < GCN_KERNEL_ARG_TYPES; a++)
@@ -242,8 +250,6 @@ gcn_parse_amdgpu_hsa_kernel_attribute (struct gcn_kernel_args *args,
       args->requested |= (1 << a);
       args->order[args->nargs++] = a;
     }
-  args->requested |= (1 << WORKGROUP_ID_X_ARG);
-  args->requested |= (1 << WORK_ITEM_ID_Z_ARG);
 
   /* Requesting WORK_ITEM_ID_Z_ARG implies requesting WORK_ITEM_ID_X_ARG and
      WORK_ITEM_ID_Y_ARG.  Similarly, requesting WORK_ITEM_ID_Y_ARG implies
@@ -253,10 +259,6 @@ gcn_parse_amdgpu_hsa_kernel_attribute (struct gcn_kernel_args *args,
   if (args->requested & (1 << WORK_ITEM_ID_Y_ARG))
     args->requested |= (1 << WORK_ITEM_ID_X_ARG);
 
-  /* Always enable this so that kernargs is in a predictable place for
-     gomp_print, etc.  */
-  args->requested |= (1 << DISPATCH_PTR_ARG);
-
   int sgpr_regno = FIRST_SGPR_REG;
   args->nsgprs = 0;
   for (int a = 0; a < GCN_KERNEL_ARG_TYPES; a++)
@@ -2041,27 +2043,34 @@ gcn_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
 static void
 gcn_conditional_register_usage (void)
 {
-  int i;
+  if (!cfun || !cfun->machine)
+    return;
 
-  /* FIXME: Do we need to reset fixed_regs?  */
+  if (cfun->machine->normal_function)
+    {
+      /* Restrict the set of SGPRs and VGPRs used by non-kernel functions.  */
+      for (int i = SGPR_REGNO (62); i <= LAST_SGPR_REG; i++)
+	fixed_regs[i] = 1, call_used_regs[i] = 1;
 
-/* Limit ourselves to 1/16 the register file for maximimum sized workgroups.
-   There are enough SGPRs not to limit those.
-   TODO: Adjust this more dynamically.  */
-  for (i = FIRST_VGPR_REG + 64; i <= LAST_VGPR_REG; i++)
-    fixed_regs[i] = 1, call_used_regs[i] = 1;
+      for (int i = VGPR_REGNO (24); i <= LAST_VGPR_REG; i++)
+	fixed_regs[i] = 1, call_used_regs[i] = 1;
 
-  if (!cfun || !cfun->machine || cfun->machine->normal_function)
-    {
-      /* Normal functions can't know what kernel argument registers are
-         live, so just fix the bottom 16 SGPRs, and bottom 3 VGPRs.  */
-      for (i = 0; i < 16; i++)
-	fixed_regs[FIRST_SGPR_REG + i] = 1;
-      for (i = 0; i < 3; i++)
-	fixed_regs[FIRST_VGPR_REG + i] = 1;
       return;
     }
 
+  /* If the set of requested args is the default set, nothing more needs to
+     be done.  */
+  if (cfun->machine->args.requested == default_requested_args)
+    return;
+
+  /* Requesting a set of args different from the default violates the ABI.  */
+  if (!leaf_function_p ())
+    warning (0, "A non-default set of initial values has been requested, "
+		"which violates the ABI!");
+
+  for (int i = SGPR_REGNO (0); i < SGPR_REGNO (14); i++)
+    fixed_regs[i] = 0;
+
   /* Fix the runtime argument register containing values that may be
      needed later.  DISPATCH_PTR_ARG and FLAT_SCRATCH_* should not be
      needed after the prologue so there's no need to fix them.  */
@@ -2069,10 +2078,10 @@ gcn_conditional_register_usage (void)
     fixed_regs[cfun->machine->args.reg[PRIVATE_SEGMENT_WAVE_OFFSET_ARG]] = 1;
   if (cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] >= 0)
     {
+      /* The upper 32-bits of the 64-bit descriptor are not used, so allow
+	the containing registers to be used for other purposes.  */
       fixed_regs[cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG]] = 1;
       fixed_regs[cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] + 1] = 1;
-      fixed_regs[cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] + 2] = 1;
-      fixed_regs[cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] + 3] = 1;
     }
   if (cfun->machine->args.reg[KERNARG_SEGMENT_PTR_ARG] >= 0)
     {
diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
index b3b2d1a..dd3789b 100644
--- a/gcc/config/gcn/gcn.h
+++ b/gcc/config/gcn/gcn.h
@@ -160,9 +160,9 @@
 
 #define FIXED_REGISTERS {			    \
     /* Scalars.  */				    \
-    0, 0, 0, 0, 0, 0, 0, 0, 0, 0,		    \
+    1, 1, 0, 0, 1, 1, 0, 0, 1, 1,		    \
 /*		fp    sp    lr.  */		    \
-    0, 0, 0, 0, 1, 1, 1, 1, 0, 0,		    \
+    1, 1, 0, 0, 1, 1, 1, 1, 0, 0,		    \
 /*  exec_save, cc_save */			    \
     1, 1, 1, 1, 0, 0, 0, 0, 0, 0,		    \
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0,		    \
@@ -180,7 +180,7 @@
     1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
     1, 1, 1, 1, 1, 1, 1, 1, 1, 1,		    \
     /* VGRPs */					    \
-    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
+    0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \