From patchwork Wed May 10 14:20:28 2017
X-Patchwork-Submitter: Alexander Monakov
X-Patchwork-Id: 760598
Date: Wed, 10 May 2017 17:20:28 +0300 (MSK)
From: Alexander Monakov
To: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 1/2] x86, s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)

While fixing the fences issue of PR 80640 I noticed that a similar oversight is
present in the expansion of atomic loads on x86: they become volatile loads, but
that is insufficient; a compiler memory barrier is still needed.  Volatility
prevents tearing the load (it preserves the indivisibility of the atomic
access), but it does not prevent propagation and code motion of non-volatile
accesses across the load.  The testcase below demonstrates that even SEQ_CST
loads are mishandled.

It's less clear how to fix this one.  AFAICS there are two possible approaches:
either emit explicit compiler barriers on both sides of the load (the barrier
before the load is needed only for SEQ_CST loads), or change how atomic loads
are expressed in RTL: atomic stores already use UNSPEC_STA, so they do not
suffer from this issue.  The asymmetry seems odd to me.  The former approach is
simpler and exposes non-seq-cst loads upwards for optimization, so that is what
the proposed patch implements.

The s390 backend suffers from the same issue, except there the atomic stores
are affected as well.
Alexander

	* config/i386/sync.md (atomic_load<mode>): Emit a compiler barrier
	before (only for SEQ_CST order) and after the load.
testsuite/
	* gcc.target/i386/pr80640-2.c: New testcase.
---
 gcc/config/i386/sync.md                   |  7 +++++++
 gcc/testsuite/gcc.target/i386/pr80640-2.c | 11 +++++++++++
 2 files changed, 18 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80640-2.c

diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index 619d53b..35a7b38 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -155,6 +155,11 @@ (define_expand "atomic_load<mode>"
 	  UNSPEC_LDA))]
   ""
 {
+  enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
+
+  if (is_mm_seq_cst (model))
+    expand_asm_memory_barrier ();
+
   /* For DImode on 32-bit, we can use the FPU to perform the load.  */
   if (<MODE>mode == DImode && !TARGET_64BIT)
     emit_insn (gen_atomic_loaddi_fpu
@@ -173,6 +178,8 @@ (define_expand "atomic_load<mode>"
       if (dst != operands[0])
 	emit_move_insn (operands[0], dst);
     }
+  if (!is_mm_relaxed (model))
+    expand_asm_memory_barrier ();
   DONE;
 })
diff --git a/gcc/testsuite/gcc.target/i386/pr80640-2.c b/gcc/testsuite/gcc.target/i386/pr80640-2.c
new file mode 100644
index 0000000..f0a050c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80640-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-ter" } */
+/* { dg-final { scan-assembler-not "xorl\[\\t \]*\\\%eax,\[\\t \]*%eax" } } */
+
+int f(int *p, volatile _Bool *p1, _Atomic _Bool *p2)
+{
+  int v = *p;
+  *p1 = !v;
+  while (__atomic_load_n (p2, __ATOMIC_SEQ_CST));
+  return v - *p;
+}