From patchwork Wed Jul 22 16:47:32 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nathan Sidwell X-Patchwork-Id: 498734 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 5AC5D1402C8 for ; Thu, 23 Jul 2015 02:47:47 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=jX0whEa0; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :subject:to:references:cc:from:message-id:date:mime-version :in-reply-to:content-type; q=dns; s=default; b=D1ZRRxW3pgwQbIXkt 2FND9ngpWMbwKj1ReXSFer7S6g0qZKrCL1K8M9BE2mstFOGUYAR05AFsCwo8z/fz BA1AZ1JHck2VJbNeU20yvve+a5huC9N0vh5qDKp5cC7KeBornVHbAnGjXqkZjrhj RP737Mk3SxoPdUA0p1IuOkRr1Q= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :subject:to:references:cc:from:message-id:date:mime-version :in-reply-to:content-type; s=default; bh=BjPv59awwIFLD0Y8WrZ5L+x AaBw=; b=jX0whEa0vLZD396lQUWm2IBILAzlqeFLPFBkwA0H3wfl2HlvMwtRhUm 2iEjlFWx8CBJzHZxYDTGn42xdfEjocLf/ITQDhSETVdRGp08ft3MlSRo8FV4WH3O mV4M5a5ZHd7sstVHdRW1O42cXHRBwYNa3MhA/LVkM0IbifN+DLac= Received: (qmail 52017 invoked by alias); 22 Jul 2015 16:47:40 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 52006 invoked by uid 89); 22 Jul 2015 16:47:39 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.6 required=5.0 tests=BAYES_00, FREEMAIL_FROM, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-qk0-f176.google.com Received: from mail-qk0-f176.google.com (HELO mail-qk0-f176.google.com) (209.85.220.176) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Wed, 22 Jul 2015 16:47:38 +0000 Received: by qkdv3 with SMTP id v3so156887794qkd.3 for ; Wed, 22 Jul 2015 09:47:35 -0700 (PDT) X-Received: by 10.55.17.100 with SMTP id b97mr4961505qkh.94.1437583655799; Wed, 22 Jul 2015 09:47:35 -0700 (PDT) Received: from ?IPv6:2601:181:c000:c497:a2a8:cdff:fe3e:b48? ([2601:181:c000:c497:a2a8:cdff:fe3e:b48]) by smtp.googlemail.com with ESMTPSA id w67sm948990qgw.41.2015.07.22.09.47.33 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 22 Jul 2015 09:47:34 -0700 (PDT) Subject: Re: [gomp] Move openacc vector& worker single handling to RTL To: Thomas Schwinge References: <5597120D.2080308@acm.org> <20150703231159.GP10247@tucnak.redhat.com> <559844EF.6010208@acm.org> <559AD85B.2050102@acm.org> <20150707095408.GD10247@tucnak.redhat.com> <559BDE68.9010302@acm.org> <20150707142229.GG10247@tucnak.redhat.com> <559D381C.7020804@acm.org> <20150708145822.GB10247@tucnak.redhat.com> <559D9A29.2020409@acm.org> <559F10F2.9050102@acm.org> <87bnf9v5ma.fsf@kepler.schwinge.homeip.net> <55ACF144.4050201@acm.org> <55AD0ED6.8020105@acm.org> Cc: GCC Patches , Jakub Jelinek From: Nathan Sidwell Message-ID: <55AFC924.7050709@acm.org> Date: Wed, 22 Jul 2015 12:47:32 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0 MIME-Version: 1.0 In-Reply-To: <55AD0ED6.8020105@acm.org> On 07/20/15 11:08, Nathan Sidwell wrote: > On 07/20/15 09:01, Nathan Sidwell wrote: >> On 07/18/15 11:37, Thomas Schwinge wrote: >>> Hi Nathan! >> >>> For OpenACC nvptx offloading, there must still be something wrong; here's >>> a count of the (non-deterministic!) regressions of ten runs of the >>> libgomp testsuite. As private-vars-loop-worker-5.c fails most often, it >>> probably makes sense to look into that one first. >> >> I'll take a look. :( > > Having difficulty reproducing it (preprocessed source compiled at -O0 works for > me). Do you have an exact recipe? Thomas helped me reproduce them -- they are very intermittent. Anyway, fixed with the attached patch I've committed to gomp branch. The bug was a race condition in the worker-level 'follow along' algorithm. Worker zero could overwrite the flag for some subsequent block before all the other workers had read the previous value of the flag. This wasn't optimization-level specific, but it appears unoptimized code creates better conditions to cause the behaviour. This appears to fix all the -O0 regressions you observed Thomas. nathan 2015-07-22 Nathan Sidwell * config/nvptx/nvptx.c (nvptx_option_override): Initialize worker buffer alignment here. (nvptx_wsync): Generate pattern, not emit instruction. (nvptx_single): Insert barrier after read. (nvptx_process_pars): Adjust nvptx_wsync use. (nvptx_file_end): No need to apply default alignment here. Index: config/nvptx/nvptx.c =================================================================== --- config/nvptx/nvptx.c (revision 226044) +++ config/nvptx/nvptx.c (working copy) @@ -124,6 +124,7 @@ nvptx_option_override (void) = hash_table::create_ggc (17); worker_bcast_sym = gen_rtx_SYMBOL_REF (Pmode, worker_bcast_name); + worker_bcast_align = GET_MODE_SIZE (SImode); } /* Return the mode to be used when declaring a ptx object for OBJ. @@ -2627,12 +2628,13 @@ nvptx_wpropagate (bool pre_p, basic_bloc } } -/* Emit a worker-level synchronization barrier. */ +/* Emit a worker-level synchronization barrier. We use different + markers for before and after synchronizations. */ -static void -nvptx_wsync (bool tail_p, rtx_insn *insn) +static rtx +nvptx_wsync (bool after) { - emit_insn_after (gen_nvptx_barsync (GEN_INT (tail_p)), insn); + return gen_nvptx_barsync (GEN_INT (after)); } /* Single neutering according to MASK. FROM is the incoming block and @@ -2750,7 +2752,7 @@ nvptx_single (unsigned mask, basic_block } else { - /* Includes worker mode, do spill & fill. by construction + /* Includes worker mode, do spill & fill. By construction we should never have worker mode only. */ wcast_data_t data; @@ -2763,10 +2765,14 @@ nvptx_single (unsigned mask, basic_block data.offset = 0; emit_insn_before (nvptx_gen_wcast (pvar, PM_read, 0, &data), before); - emit_insn_before (gen_nvptx_barsync (GEN_INT (2)), tail); + /* Barrier so other workers can see the write. */ + emit_insn_before (nvptx_wsync (false), tail); data.offset = 0; - emit_insn_before (nvptx_gen_wcast (pvar, PM_write, 0, &data), - tail); + emit_insn_before (nvptx_gen_wcast (pvar, PM_write, 0, &data), tail); + /* This barrier is needed to avoid worker zero clobbering + the broadcast buffer before all the other workers have + had a chance to read this instance of it. */ + emit_insn_before (nvptx_wsync (true), tail); } extract_insn (tail); @@ -2824,8 +2830,8 @@ nvptx_process_pars (parallel *par) par->forked_insn); nvptx_wpropagate (true, par->forked_block, par->fork_insn); /* Insert begin and end synchronizations. */ - nvptx_wsync (false, par->forked_insn); - nvptx_wsync (true, par->joining_insn); + emit_insn_after (nvptx_wsync (false), par->forked_insn); + emit_insn_before (nvptx_wsync (true), par->joining_insn); } break; @@ -3046,8 +3052,6 @@ nvptx_file_end (void) { /* Define the broadcast buffer. */ - if (worker_bcast_align < GET_MODE_SIZE (SImode)) - worker_bcast_align = GET_MODE_SIZE (SImode); worker_bcast_hwm = (worker_bcast_hwm + worker_bcast_align - 1) & ~(worker_bcast_align - 1);