From patchwork Thu May 21 15:57:00 2015
X-Patchwork-Submitter: Matthew Wahab
X-Patchwork-Id: 475074
Message-ID: <555E004C.3060704@arm.com>
Date: Thu, 21 May 2015 16:57:00 +0100
From: Matthew Wahab
To: gcc-patches
Subject: [PATCH 1/3][AArch64] Strengthen barriers for sync-fetch-op builtins.

On AArch64, the __sync builtins are implemented using the __atomic operations
and barriers.  This makes the __sync builtins inconsistent with their
documentation, which requires stronger barriers than those for the __atomic
builtins.  The difference between the __sync and __atomic builtins is that the
restrictions imposed by a __sync operation's barrier apply to all memory
references, while the restrictions of an __atomic operation's barrier only
need to apply to a subset.
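
For reference, a minimal sketch (not part of the patch) of the two builtin
families being compared; the memory order passed to the __atomic call is just
one possible choice:

  static int counter;

  int
  inc_sync (int x)
  {
    /* __sync builtin: documented as a full barrier, constraining all
       surrounding memory references.  */
    return __sync_fetch_and_add (&counter, x);
  }

  int
  inc_atomic (int x)
  {
    /* __atomic builtin: the barrier only needs to order the accesses that
       take part in this operation's synchronisation.  */
    return __atomic_fetch_add (&counter, x, __ATOMIC_SEQ_CST);
  }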
This affects AArch64 in particular because, although its implementation of the
__atomic builtins is correct, the barriers generated are too weak for the
__sync builtins.

The affected __sync builtins are the __sync_fetch_and_op (and
__sync_op_and_fetch) functions, __sync_compare_and_swap and
__sync_lock_test_and_set.  This patch and the following one modify the code
generated for these functions to weaken the initial load-acquire to a simple
load and to add a final fence to prevent code-hoisting.  The last patch will
add tests for the code generated by the AArch64 backend for the __sync
builtins.

- Full barriers: __sync_fetch_and_op, __sync_op_and_fetch,
  __sync_*_compare_and_swap

  [load-acquire; code; store-release]
  becomes
  [load; code; store-release; fence].

- Acquire barriers: __sync_lock_test_and_set

  [load-acquire; code; store]
  becomes
  [load; code; store; fence].

The code generated for release barriers and for the __atomic builtins is
unchanged.

This patch changes the code generated for the __sync_fetch_and_<op> and
__sync_<op>_and_fetch builtins.  (A rough before/after sketch for
__sync_fetch_and_add follows the patch.)

Tested with check-gcc for aarch64-none-linux-gnu.

Ok for trunk?
Matthew

gcc/
2015-05-21  Matthew Wahab

        * config/aarch64/aarch64.c (aarch64_emit_post_barrier): New.
        (aarch64_split_atomic_op): Check for __sync memory models, emit
        appropriate initial and final barriers.

From 2092902d2738b0c24a6272e0b3480bb9cffd275c Mon Sep 17 00:00:00 2001
From: Matthew Wahab
Date: Fri, 15 May 2015 09:26:28 +0100
Subject: [PATCH 1/3] [AArch64] Strengthen barriers for sync-fetch-op builtin.

Change-Id: I3342a572d672163ffc703e4e51603744680334fc
---
 gcc/config/aarch64/aarch64.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 7f0cc0d..778571f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9249,6 +9249,22 @@ aarch64_expand_compare_and_swap (rtx operands[])
   emit_insn (gen_rtx_SET (bval, x));
 }
 
+/* Emit a post-operation barrier.  */
+
+static void
+aarch64_emit_post_barrier (enum memmodel model)
+{
+  const enum memmodel base_model = memmodel_base (model);
+
+  if (is_mm_sync (model)
+      && (base_model == MEMMODEL_ACQUIRE
+          || base_model == MEMMODEL_ACQ_REL
+          || base_model == MEMMODEL_SEQ_CST))
+    {
+      emit_insn (gen_mem_thread_fence (GEN_INT (MEMMODEL_SEQ_CST)));
+    }
+}
+
 /* Split a compare and swap pattern.  */
 
 void
@@ -9311,12 +9327,20 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
 {
   machine_mode mode = GET_MODE (mem);
   machine_mode wmode = (mode == DImode ? DImode : SImode);
+  const enum memmodel model = memmodel_from_int (INTVAL (model_rtx));
+  const bool is_sync = is_mm_sync (model);
+  rtx load_model_rtx = model_rtx;
   rtx_code_label *label;
   rtx x;
 
   label = gen_label_rtx ();
   emit_label (label);
 
+  /* A __sync operation will emit a final fence to stop code hoisting, so the
+     load can be relaxed.  */
+  if (is_sync)
+    load_model_rtx = GEN_INT (MEMMODEL_RELAXED);
+
   if (new_out)
     new_out = gen_lowpart (wmode, new_out);
   if (old_out)
@@ -9325,7 +9349,7 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
     old_out = new_out;
   value = simplify_gen_subreg (wmode, value, mode, 0);
 
-  aarch64_emit_load_exclusive (mode, old_out, mem, model_rtx);
+  aarch64_emit_load_exclusive (mode, old_out, mem, load_model_rtx);
 
   switch (code)
     {
@@ -9361,6 +9385,10 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
                             gen_rtx_LABEL_REF (Pmode, label), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
+
+  /* Emit any fence needed for a __sync operation.  */
+  if (is_sync)
+    aarch64_emit_post_barrier (model);
 }
 
 static void
-- 
1.9.1
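
As mentioned above, a rough before/after sketch of the sequence
aarch64_split_atomic_op splits __sync_fetch_and_add into; the register names
are made up for the example and this is not compiler output:

  int
  sync_add (int *p, int v)
  {
    /* Before the patch (same loop as the SEQ_CST __atomic builtins):

         retry: ldaxr  w1, [x0]       // load-acquire exclusive
                add    w2, w1, w3
                stlxr  w4, w2, [x0]   // store-release exclusive
                cbnz   w4, retry

       After the patch (load relaxed, aarch64_emit_post_barrier appends
       a fence):

         retry: ldxr   w1, [x0]       // plain load exclusive
                add    w2, w1, w3
                stlxr  w4, w2, [x0]   // store-release exclusive
                cbnz   w4, retry
                dmb    ish            // final full barrier  */
    return __sync_fetch_and_add (p, v);
  }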