From patchwork Wed May 10 14:20:28 2017
X-Patchwork-Submitter: Alexander Monakov
X-Patchwork-Id: 760598
Date: Wed, 10 May 2017 17:20:28 +0300 (MSK)
From: Alexander Monakov
To: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 1/2] x86, s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)

While fixing the fences issue of PR 80640 I noticed that a similar oversight is
present in the expansion of atomic loads on x86: they become volatile loads, but
that is insufficient; a compiler memory barrier is still needed.  Volatility
prevents tearing the load (it preserves the indivisibility of the atomic
access), but it does not prevent propagation and code motion of non-volatile
accesses across the load.  The testcase below demonstrates that even SEQ_CST
loads are mishandled.

It's less clear how to fix this one.  AFAICS there are two possible approaches:
either emit explicit compiler barriers on both sides of the load (the barrier
before the load is needed only for SEQ_CST loads), or change how atomic loads
are expressed in RTL: atomic stores already use UNSPEC_STA, so they do not
suffer from this issue.  The asymmetry seems odd to me.  The former approach is
simpler and exposes non-seq-cst loads upwards for optimization, so that is what
the proposed patch implements.

The s390 backend suffers from the same issue, except there the atomic stores
are affected as well.
Alexander

	* config/i386/sync.md (atomic_load<mode>): Emit a compiler barrier
	before (only for SEQ_CST order) and after the load.
testsuite/
	* gcc.target/i386/pr80640-2.c: New testcase.
---
 gcc/config/i386/sync.md                   |  7 +++++++
 gcc/testsuite/gcc.target/i386/pr80640-2.c | 11 +++++++++++
 2 files changed, 18 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80640-2.c

diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index 619d53b..35a7b38 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -155,6 +155,11 @@ (define_expand "atomic_load<mode>"
 	  UNSPEC_LDA))]
   ""
 {
+  enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
+
+  if (is_mm_seq_cst (model))
+    expand_asm_memory_barrier ();
+
   /* For DImode on 32-bit, we can use the FPU to perform the load.  */
   if (<MODE>mode == DImode && !TARGET_64BIT)
     emit_insn (gen_atomic_loaddi_fpu
@@ -173,6 +178,8 @@ (define_expand "atomic_load<mode>"
       if (dst != operands[0])
 	emit_move_insn (operands[0], dst);
     }
+  if (!is_mm_relaxed (model))
+    expand_asm_memory_barrier ();
   DONE;
 })
diff --git a/gcc/testsuite/gcc.target/i386/pr80640-2.c b/gcc/testsuite/gcc.target/i386/pr80640-2.c
new file mode 100644
index 0000000..f0a050c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80640-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-ter" } */
+/* { dg-final { scan-assembler-not "xorl\[\\t \]*\\\%eax,\[\\t \]*%eax" } } */
+
+int f(int *p, volatile _Bool *p1, _Atomic _Bool *p2)
+{
+  int v = *p;
+  *p1 = !v;
+  while (__atomic_load_n (p2, __ATOMIC_SEQ_CST));
+  return v - *p;
+}