From patchwork Wed Jan 24 10:41:45 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom de Vries X-Patchwork-Id: 865287 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-471937-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="SmhEaUFE"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zRMDX0T2Mz9ryr for ; Wed, 24 Jan 2018 21:42:02 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to:cc :from:subject:message-id:date:mime-version:content-type; q=dns; s=default; b=VnQtKRjTi8ZPvWk8hJ6mugEdmYbFWn53IzHQFuO28deNd8LG83 UHhkslWpIzwypm2F9tegnw0JoT6tmQ7JVW2ixS+j3+wJcC+Sa0nnXSnv/efEyTRe /wTyW3mbFk8AmYwDe6RdejW1fRxUTJuRp1+uDTcnzkBJq9+osSAYg6vVI= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to:cc :from:subject:message-id:date:mime-version:content-type; s= default; bh=8KgZcXUSEeD9hB+LvXClq011xjE=; b=SmhEaUFEoMVpSU1/5ckI af8Yde2t3Kx1OmxZeMASQTctFtXEaVY9PZO5+hrSAC2GmkqJImFPBoUhBgOhbbfW 1lNuVeF6I7ePIECmV/s73IEOpCNM8udkMBdK6IGGv1W1W587q9H6iFhHnErKTc6Q r9AFK6nELjLtNDxb2MFQo3M= Received: (qmail 13505 invoked by alias); 24 Jan 2018 10:41:54 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 13479 invoked by uid 89); 24 Jan 2018 10:41:54 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-24.8 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE, SPF_PASS, URIBL_RED autolearn=ham version=3.3.2 spammy=L5, r36 X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 24 Jan 2018 10:41:51 +0000 Received: from nat-ies.mentorg.com ([192.94.31.2] helo=SVR-IES-MBX-04.mgc.mentorg.com) by relay1.mentorg.com with esmtps (TLSv1.2:ECDHE-RSA-AES256-SHA384:256) id 1eeIV3-0006eN-FM from Tom_deVries@mentor.com ; Wed, 24 Jan 2018 02:41:49 -0800 Received: from [137.202.13.177] (137.202.0.87) by SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) with Microsoft SMTP Server (TLS) id 15.0.1320.4; Wed, 24 Jan 2018 10:41:45 +0000 To: Jakub Jelinek CC: GCC Patches , Richard Biener From: Tom de Vries Subject: [PATCH, 2/2][nvptx, PR83589] Workaround for branch-around-nothing JIT bug Message-ID: <34fb1d00-dc5d-04f2-d601-ee6fe710ac3b@mentor.com> Date: Wed, 24 Jan 2018 11:41:45 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0 MIME-Version: 1.0 X-ClientProxiedBy: svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) Hi, this patch adds a workaround for the nvptx target JIT bug PR83589 - "[nvptx] mode-transitions.c and private-variables.{c,f90} execution FAILs at GOMP_NVPTX_JIT=-O0". When compiling a branch-around-nothing (where the branch is warp neutering, so it's a divergent branch): ... .reg .pred %r36; { .reg .u32 %x; mov.u32 %x,%tid.x; setp.ne.u32 %r36,%x,0; } @ %r36 bra $L5; $L5: ... The JIT fails to generate a convergence point here: ... /*0128*/ @P0 BRA `(.L_1); .L_1: ... Consequently, we execute subsequent code in divergent mode, and when executing a shfl.idx a bit later we run into the undefined behaviour that shfl.idx has when executing in divergent mode. The workaround detects branch-around-nothing, and inserts a ptx operation that does nothing (I'm calling it a fake nop, I haven't been able to come up with a better term yet): ... @ %r36 bra $L5; { .reg .u32 %nop_src; .reg .u32 %nop_dst; mov.u32 %nop_dst, %nop_src; } $L5: ... which makes the test pass, because then we generate a convergence point here at .L1: ... /*0128*/ SSY `(.L_1); /*0130*/ @P0 SYNC (*"TARGET= .L_1 "*); /*0138*/ SYNC (*"TARGET= .L_1 "*); .L_1: ... The workaround is not minimal given that it inserts the fake nop in all branch-around-nothings it detects, not just the warp neutering ones, but I think this is more robust than trying to identify the warp neutering branches. Furthermore, I'm not going for optimality here anyway. The optimal way to fix this is making sure we don't generate branch-around-nothing, but that's for stage1. Build and reg-tested on x86_64 with nvptx accelerator. I'd like to commit in stage4, but I'd appreciate a review of the code. Does the patch look OK? Thanks, - Tom [nvptx, PR83589] Workaround for branch-around-nothing JIT bug 2018-01-23 Tom de Vries PR target/83589 * config/nvptx/nvptx.c (WORKAROUND_PTXJIT_BUG_2): Define to 1. (nvptx_pc_set, nvptx_condjump_label): New function. Copy from jump.c. Add strict parameter. (prevent_branch_around_nothing): Insert dummy insn between branch to label and label with no ptx insn inbetween. * config/nvptx/nvptx.md (define_insn "fake_nop"): New insn. * testsuite/libgomp.oacc-c-c++-common/pr83589.c: New test. --- gcc/config/nvptx/nvptx.c | 92 ++++++++++++++++++++++ gcc/config/nvptx/nvptx.md | 9 +++ .../testsuite/libgomp.oacc-c-c++-common/pr83589.c | 21 +++++ 3 files changed, 122 insertions(+) diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index 3516740..e55b426 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -78,6 +78,7 @@ #include "target-def.h" #define WORKAROUND_PTXJIT_BUG 1 +#define WORKAROUND_PTXJIT_BUG_2 1 /* The various PTX memory areas an object might reside in. */ enum nvptx_data_area @@ -4363,6 +4364,93 @@ nvptx_neuter_pars (parallel *par, unsigned modes, unsigned outer) nvptx_neuter_pars (par->next, modes, outer); } +#if WORKAROUND_PTXJIT_BUG_2 +/* Variant of pc_set that only requires JUMP_P (INSN) if STRICT. This variant + is needed in the nvptx target because the branches generated for + parititioning are NONJUMP_INSN_P, not JUMP_P. */ + +static rtx +nvptx_pc_set (const rtx_insn *insn, bool strict = true) +{ + rtx pat; + if ((strict && !JUMP_P (insn)) + || (!strict && !INSN_P (insn))) + return NULL_RTX; + pat = PATTERN (insn); + + /* The set is allowed to appear either as the insn pattern or + the first set in a PARALLEL. */ + if (GET_CODE (pat) == PARALLEL) + pat = XVECEXP (pat, 0, 0); + if (GET_CODE (pat) == SET && GET_CODE (SET_DEST (pat)) == PC) + return pat; + + return NULL_RTX; +} + +/* Variant of condjump_label that only requires JUMP_P (INSN) if STRICT. */ + +static rtx +nvptx_condjump_label (const rtx_insn *insn, bool strict = true) +{ + rtx x = nvptx_pc_set (insn, strict); + + if (!x) + return NULL_RTX; + x = SET_SRC (x); + if (GET_CODE (x) == LABEL_REF) + return x; + if (GET_CODE (x) != IF_THEN_ELSE) + return NULL_RTX; + if (XEXP (x, 2) == pc_rtx && GET_CODE (XEXP (x, 1)) == LABEL_REF) + return XEXP (x, 1); + if (XEXP (x, 1) == pc_rtx && GET_CODE (XEXP (x, 2)) == LABEL_REF) + return XEXP (x, 2); + return NULL_RTX; +} + +/* Insert a dummy ptx insn when encountering a branch to a label with no ptx + insn inbetween the branch and the label. This works around a JIT bug + observed at driver version 384.111, at -O0 for sm_50. */ + +static void +prevent_branch_around_nothing (void) +{ + rtx_insn *seen_label = 0; + for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn)) + { + if (seen_label == 0) + { + if (INSN_P (insn) && condjump_p (insn)) + seen_label = label_ref_label (nvptx_condjump_label (insn, false)); + + continue; + } + + if (NOTE_P (insn)) + continue; + + if (INSN_P (insn)) + switch (recog_memoized (insn)) + { + case CODE_FOR_nvptx_fork: + case CODE_FOR_nvptx_forked: + case CODE_FOR_nvptx_joining: + case CODE_FOR_nvptx_join: + continue; + default: + seen_label = 0; + continue; + } + + if (LABEL_P (insn) && insn == seen_label) + emit_insn_before (gen_fake_nop (), insn); + + seen_label = 0; + } + } +#endif + /* PTX-specific reorganization - Split blocks at fork and join instructions - Compute live registers @@ -4442,6 +4530,10 @@ nvptx_reorg (void) if (TARGET_UNIFORM_SIMT) nvptx_reorg_uniform_simt (); +#if WORKAROUND_PTXJIT_BUG_2 + prevent_branch_around_nothing (); +#endif + regstat_free_n_sets_and_refs (); df_finish_pass (true); diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md index 135479b..4f4453d 100644 --- a/gcc/config/nvptx/nvptx.md +++ b/gcc/config/nvptx/nvptx.md @@ -999,6 +999,15 @@ "" "exit;") +(define_insn "fake_nop" + [(const_int 2)] + "" + "{ + .reg .u32 %%nop_src; + .reg .u32 %%nop_dst; + mov.u32 %%nop_dst, %%nop_src; + }") + (define_insn "return" [(return)] "" diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr83589.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr83589.c new file mode 100644 index 0000000..a6ed5cf --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr83589.c @@ -0,0 +1,21 @@ +/* { dg-do run } */ +/* { dg-set-target-env-var GOMP_NVPTX_JIT "-O0" } */ + +#define n 32 + +int +main (void) +{ + int arr_a[n]; + +#pragma acc parallel copyout(arr_a) num_gangs(1) num_workers(1) vector_length(32) + { + #pragma acc loop vector + for (int m = 0; m < 32; m++) + ; + + #pragma acc loop vector + for (int m = 0; m < 32; m++) + arr_a[m] = 0; + } +}