From patchwork Fri Nov 8 00:51:15 2019
X-Patchwork-Submitter: John David Anglin
X-Patchwork-Id: 1191542
To: GCC Patches
From: John David Anglin
Subject: [committed] pa: Revise memory barriers to use strongly ordered ldcw instruction
Date: Thu, 7 Nov 2019 19:51:15 -0500

This change revises the memory barrier patterns to
use the ldcw instruction instead of the sync instruction. The ldcw
instruction performs better and I have more confidence in it than sync.
We use a location just above the top of the stack for these operations.
The stack address is aligned to a 16-byte boundary if the system is not
coherent.

I have added two new options. The first is the -mcoherent-ldcw option.
The majority of PA 2.0 systems have coherent caches and as a result the
coherent ldcw completer can be used. In that case, the ldcw address
doesn't require 16-byte alignment. We set the default to -mcoherent-ldcw.

The second option is the -mordered option. Although all PA 1.x systems
have ordered memory accesses, PA 2.0 systems are weakly ordered. Since
PA 2.0 systems are now prevalent, we set the default to -mno-ordered.
For ordered systems, we fall back to just a compiler memory barrier.

I believe acquire and release fences can be defined in a similar way
using an ordered load and an ordered store, respectively.

Tested on hppa-unknown-linux-gnu, hppa2.0w-hp-hpux11.11 and
hppa64-hp-hpux11.11. Committed to trunk.

Dave

2019-11-07  John David Anglin

	* config/pa/pa.md (memory_barrier): Revise to use ldcw barriers.
	Enhance comment.
	(memory_barrier_coherent, memory_barrier_64, memory_barrier_32): New
	insn patterns using ldcw instruction.
	(memory_barrier): Remove insn pattern using sync instruction.
	* config/pa/pa.opt (coherent-ldcw): New option.
	(ordered): New option.

Index: config/pa/pa.md
===================================================================
--- config/pa/pa.md	(revision 277870)
+++ config/pa/pa.md	(working copy)
@@ -10086,23 +10086,55 @@
    (set_attr "length" "4,16")])
 
 ;; PA 2.0 hardware supports out-of-order execution of loads and stores, so
-;; we need a memory barrier to enforce program order for memory references.
-;; Since we want PA 1.x code to be PA 2.0 compatible, we also need the
-;; barrier when generating PA 1.x code.
+;; we need memory barriers to enforce program order for memory references
+;; when the TLB and PSW O bits are not set. We assume all PA 2.0 systems
+;; are weakly ordered since neither HP-UX or Linux set the PSW O bit. Since
+;; we want PA 1.x code to be PA 2.0 compatible, we also need barriers when
+;; generating PA 1.x code even though all PA 1.x systems are strongly ordered.
+;; When barriers are needed, we use a strongly ordered ldcw instruction as
+;; the barrier. Most PA 2.0 targets are cache coherent. In that case, we
+;; can use the coherent cache control hint and avoid aligning the ldcw
+;; address. In spite of its description, it is not clear that the sync
+;; instruction works as a barrier.
+
 (define_expand "memory_barrier"
-  [(set (match_dup 0)
-        (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))]
+  [(parallel
+     [(set (match_dup 0) (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))
+      (clobber (match_dup 1))])]
   ""
 {
-  operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  /* We don't need a barrier if the target uses ordered memory references. */
+  if (TARGET_ORDERED)
+    FAIL;
+  operands[1] = gen_reg_rtx (Pmode);
+  operands[0] = gen_rtx_MEM (BLKmode, operands[1]);
   MEM_VOLATILE_P (operands[0]) = 1;
 })
 
-(define_insn "*memory_barrier"
+(define_insn "*memory_barrier_coherent"
   [(set (match_operand:BLK 0 "" "")
-        (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))]
+        (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))
+   (clobber (match_operand 1 "pmode_register_operand" "=r"))]
+  "TARGET_PA_20 && TARGET_COHERENT_LDCW"
+  "ldcw,co 0(%%sp),%1"
+  [(set_attr "type" "binary")
+   (set_attr "length" "4")])
+
+(define_insn "*memory_barrier_64"
+  [(set (match_operand:BLK 0 "" "")
+        (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))
+   (clobber (match_operand 1 "pmode_register_operand" "=&r"))]
+  "TARGET_64BIT"
+  "ldo 15(%%sp),%1\n\tdepd %%r0,63,3,%1\n\tldcw 0(%1),%1"
+  [(set_attr "type" "binary")
+   (set_attr "length" "12")])
+
+(define_insn "*memory_barrier_32"
+  [(set (match_operand:BLK 0 "" "")
+        (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))
+   (clobber (match_operand 1 "pmode_register_operand" "=&r"))]
   ""
-  "sync"
+  "ldo 15(%%sp),%1\n\t{dep|depw} %%r0,31,3,%1\n\tldcw 0(%1),%1"
   [(set_attr "type" "binary")
-   (set_attr "length" "4")])
+   (set_attr "length" "12")])
 
Index: config/pa/pa.opt
===================================================================
--- config/pa/pa.opt	(revision 277870)
+++ config/pa/pa.opt	(working copy)
@@ -45,6 +45,10 @@
 Target Report Mask(CALLER_COPIES)
 Caller copies function arguments passed by hidden reference.
 
+mcoherent-ldcw
+Target Report Var(TARGET_COHERENT_LDCW) Init(1)
+Use ldcw/ldcd coherent cache-control hint.
+
 mdisable-fpregs
 Target Report Mask(DISABLE_FPREGS)
 Disable FP regs.
@@ -90,6 +94,10 @@
 Target RejectNegative Report Mask(NO_SPACE_REGS)
 Disable space regs.
 
+mordered
+Target Report Var(TARGET_ORDERED) Init(0)
+Assume memory references are ordered and barriers are not needed.
+
 mpa-risc-1-0
 Target RejectNegative
 Generate PA1.0 code.
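
The following is a minimal sketch in C, not part of the patch, of user code that asks for a full memory barrier. It assumes that the GCC builtin `__sync_synchronize ()` is expanded through the target's `memory_barrier` pattern. If that holds on hppa, the call would presumably become one of the ldcw sequences above by default and only a compiler barrier with -mordered. The names `publish`, `consume`, `payload` and `ready` are hypothetical and exist only for illustration.

```c
/* Illustrative only: hand a value from a writer to a reader, using a
   full barrier to order the data access against the flag access.
   All names here are hypothetical.  */

static int payload;
static volatile int ready;

/* Writer: store the data, issue a full barrier, then set the flag.  */
void
publish (int value)
{
  payload = value;
  __sync_synchronize ();	/* Full memory barrier (GCC builtin).  */
  ready = 1;
}

/* Reader: if the flag is set, issue a full barrier, copy the data to
   *out and return 1.  Otherwise return 0 and leave *out unchanged.  */
int
consume (int *out)
{
  if (!ready)
    return 0;
  __sync_synchronize ();	/* Order the flag read before the data read.  */
  *out = payload;
  return 1;
}
```

Compiling a file like this with -S on an hppa target, with and without -mordered or -mno-coherent-ldcw, should show which barrier sequence is chosen. No assembly output is reproduced here because it has not been verified.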