From patchwork Thu Apr 12 13:57:44 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tom de Vries
X-Patchwork-Id: 897685
To: GCC Patches
CC: Cesar Philippidis, Thomas Schwinge
From: Tom de Vries
Subject: [og7, nvptx, committed] Fix propagation of branch cond in vw-neutered code
Message-ID: <1b54da81-fe3e-09a8-8d9f-6f3492a00808@mentor.com>
Date: Thu, 12 Apr 2018 15:57:44 +0200

Hi,

Currently, when we enable -mlong-vector-in-workers in gemm.f90, we get:
...
{
                .reg.u32        %tidy;
                .reg.u64        %t_bcast;
                .reg.u64        %y64;
                mov.u32         %tidy, %tid.y;
                cvt.u64.u32     %y64, %tidy;
                add.u64         %y64, %y64, 1;
                cvta.shared.u64 %t_bcast, __oacc_bcast;
                mad.lo.u64      %r166, %y64, 104, %t_bcast;
}
                @ %r179 bra.uni $L28;
                @ %r174 bra     $L29;
                ...
                setp.le.s32     %r114,%r113,0;
                selp.u32        %r182,1,0,%r114;
                st.u32  [%r166],%r182;
$L29:
$L28:
                bar.sync        %r167,128;
                ld.u32  %r183,[%r166];
                setp.ne.u32     %r114,%r183,0;
                bar.sync        %r167,128;
                @ %r114 bra.uni $L1
...

The branch condition %r114 is computed in a W0V0 region, and then broadcast
to a WAVA region.

The broadcast is done using a partition of the broadcast buffer at %r166, but
this is a worker-specific buffer. Since only worker 0 writes to the buffer,
the workers other than 0 read uninitialized memory.

This patch fixes this by using the generic broadcast buffer in this case,
rather than a worker-specific one.

Built x86_64 with nvptx accelerator and tested libgomp.

Committed to og7.

Thanks,
- Tom

[nvptx] Fix propagation of branch cond in vw-neutered code

2018-04-12  Tom de Vries

        PR target/85246
        * config/nvptx/nvptx.c (nvptx_single): Don't use partitioning when
        propagating branch condition calculated in vector-worker-neutered code.

        * testsuite/libgomp.oacc-fortran/gemm.f90: Use
        -foffload=-mlong-vector-in-workers.

---
 gcc/config/nvptx/nvptx.c                        | 3 ++-
 libgomp/testsuite/libgomp.oacc-fortran/gemm.f90 | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 547022e..9d011eb 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4306,13 +4306,14 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
           broadcast_data_t data;
           unsigned size = GET_MODE_SIZE (SImode);
           bool vector = (GOMP_DIM_MASK (GOMP_DIM_VECTOR) == mask) != 0;
+          bool worker = (GOMP_DIM_MASK (GOMP_DIM_WORKER) == mask) != 0;
           rtx barrier = GEN_INT (0);
           int threads = 0;
 
           data.base = oacc_bcast_sym;
           data.ptr = 0;
 
-          bool use_partitioning_p = (vector
+          bool use_partitioning_p = (vector && !worker
                                      && nvptx_mach_max_workers () > 1
                                      && cfun->machine->bcast_partition);
           if (use_partitioning_p)
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gemm.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gemm.f90
index ad67dce..744d21e 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/gemm.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/gemm.f90
@@ -1,6 +1,7 @@
 ! Exercise three levels of parallelism using SGEMM from BLAS.
 
 ! { dg-additional-options "-fopenacc-dim=-:-:128" }
+! { dg-additional-options "-foffload=-mlong-vector-in-workers" }
 
 ! Implicitly set vector_length to 128 using -fopenacc-dim.
 subroutine openacc_sgemm (m, n, k, alpha, a, b, beta, c)
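
For readers who want to see the failure mode from the description in isolation, here is a minimal host-side sketch in plain C. It is not code from nvptx.c and it does not run on a GPU. It only models the addressing: the stride of 104 and the (tid.y + 1) indexing are taken from the PTX quoted in the description, while NUM_WORKERS, the 0xAA fill pattern, the names bcast_offset and simulate_broadcast, and the placement of the generic buffer at offset 0 are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 104 is the partition stride seen in the quoted PTX
   (mad.lo.u64 %r166, %y64, 104, %t_bcast).  NUM_WORKERS is an
   assumption chosen for this sketch.  */
enum { NUM_WORKERS = 4, PARTITION_SIZE = 104 };

/* Hypothetical model of __oacc_bcast: a generic slot at offset 0
   (assumed), followed by one partition per worker.  */
static unsigned char bcast[(NUM_WORKERS + 1) * PARTITION_SIZE];

/* Offset worker TID_Y uses: its own partition, (tid.y + 1) * 104, when
   partitioning is on; otherwise the generic slot.  */
static size_t
bcast_offset (unsigned tid_y, int use_partitioning)
{
  assert (tid_y < NUM_WORKERS);
  return use_partitioning ? (size_t) (tid_y + 1) * PARTITION_SIZE : 0;
}

/* Worker 0 stores COND, then every worker loads through its own offset.
   Returns how many workers read back COND.  */
static unsigned
simulate_broadcast (uint32_t cond, int use_partitioning)
{
  uint32_t seen;
  unsigned agree = 0;

  /* Stand-in for uninitialized shared memory.  */
  memset (bcast, 0xAA, sizeof bcast);

  /* W0V0 region: only worker 0 executes the store (st.u32).  */
  memcpy (bcast + bcast_offset (0, use_partitioning), &cond, sizeof cond);

  /* WAVA region after the barrier: all workers execute the load (ld.u32).  */
  for (unsigned w = 0; w < NUM_WORKERS; w++)
    {
      memcpy (&seen, bcast + bcast_offset (w, use_partitioning), sizeof seen);
      agree += (seen == cond);
    }
  return agree;
}
```

Under these assumptions, simulate_broadcast (1, 1) models the pre-patch behaviour, where only worker 0 reads back the value it stored, and simulate_broadcast (1, 0) models the generic-buffer behaviour, where every worker does.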