From: Kyrill Tkachov
Date: Wed, 08 Mar 2017 16:35:38 +0000
To: GCC Patches
Cc: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh
Subject: [PATCH][AArch64] Emit tighter strong atomic compare-exchange loop when comparing against zero

Hi all,

For the testcase in this patch, where the value of x is zero, we currently
generate:

foo:
        mov     w1, 4
.L2:
        ldaxr   w2, [x0]
        cmp     w2, 0
        bne     .L3
        stxr    w3, w1, [x0]
        cbnz    w3, .L2
.L3:
        cset    w0, eq
        ret

We currently cannot merge the cmp and b.ne inside the loop into a cbnz
because we need the condition flags set for the return value of the
function (i.e. the cset at the end).
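(For reference, the C source of foo is the new testcase added at the end of
this patch, compiled at -O2:

int
foo (int *a)
{
  int x = 0;
  return __atomic_compare_exchange_n (a, &x, 4, 0,
                                      __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE);
}

The return value needs the result of the comparison, which is why the flags
must survive to the cset.)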
But if we re-jig the sequence in that case we can generate a tighter loop:

foo:
        mov     w1, 4
.L2:
        ldaxr   w2, [x0]
        cbnz    w2, .L3
        stxr    w3, w1, [x0]
        cbnz    w3, .L2
.L3:
        cmp     w2, 0
        cset    w0, eq
        ret

So we add an explicit compare after the loop, and inside the loop we use the
fact that we're comparing against zero to emit a CBNZ. This means we may do
the comparison twice (once inside the CBNZ, once at the CMP at the end), but
there is now less code inside the loop.

I've seen this sequence appear in glibc locking code, so maybe it's worth
adding the extra bit of complexity to the compare-exchange splitter to catch
this case (a sketch of such a locking pattern follows the patch below).

Bootstrapped and tested on aarch64-none-linux-gnu. In previous iterations of
the patch, where I had gotten some logic wrong, it would cause miscompiles
of libgomp leading to timeouts in its testsuite, but this version passes
everything cleanly.

Ok for GCC 8? (I know it's early, but might as well get it out in case
someone wants to try it out.)

Thanks,
Kyrill

2017-03-08  Kyrylo Tkachov

    * config/aarch64/aarch64.c (aarch64_split_compare_and_swap):
    Emit CBNZ inside loop when doing a strong exchange and comparing
    against zero.  Generate the CC flags after the loop.

2017-03-08  Kyrylo Tkachov

    * gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c: New test.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 76a2de20dfcd4ea38fb7c58a9e8612509c5987bd..5fa8e197328ce4cb1718ff7d99b1ea95e02129a4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12095,6 +12095,17 @@ aarch64_split_compare_and_swap (rtx operands[])
   mode = GET_MODE (mem);
   model = memmodel_from_int (INTVAL (model_rtx));
 
+  /* When OLDVAL is zero and we want the strong version we can emit a tighter
+     loop:
+     .label1:
+	LD[A]XR	rval, [mem]
+	CBNZ	rval, .label2
+	ST[L]XR	scratch, newval, [mem]
+	CBNZ	scratch, .label1
+     .label2:
+	CMP	rval, 0.  */
+  bool strong_zero_p = !is_weak && oldval == const0_rtx;
+
   label1 = NULL;
   if (!is_weak)
     {
@@ -12111,11 +12122,21 @@ aarch64_split_compare_and_swap (rtx operands[])
   else
     aarch64_emit_load_exclusive (mode, rval, mem, model_rtx);
 
-  cond = aarch64_gen_compare_reg (NE, rval, oldval);
-  x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
-  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
-			    gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
-  aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
+  if (strong_zero_p)
+    {
+      x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
+      x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
+				gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
+      aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
+    }
+  else
+    {
+      cond = aarch64_gen_compare_reg (NE, rval, oldval);
+      x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
+      x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
+				gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
+      aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
+    }
 
   aarch64_emit_store_exclusive (mode, scratch, mem, newval, model_rtx);
 
@@ -12134,7 +12155,15 @@ aarch64_split_compare_and_swap (rtx operands[])
     }
 
   emit_label (label2);
-
+  /* If we used a CBNZ in the exchange loop emit an explicit compare with RVAL
+     to set the condition flags.  If this is not used it will be removed by
+     later passes.  */
+  if (strong_zero_p)
+    {
+      cond = gen_rtx_REG (CCmode, CC_REGNUM);
+      x = gen_rtx_COMPARE (CCmode, rval, const0_rtx);
+      emit_insn (gen_rtx_SET (cond, x));
+    }
   /* Emit any final barrier needed for a __sync operation.  */
   if (is_mm_sync (model))
     aarch64_emit_post_barrier (model);
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..b14a7c294376f03cd13077d18d865f83a04bd04e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+foo (int *a)
+{
+  int x = 0;
+  return __atomic_compare_exchange_n (a, &x, 4, 0,
+                                      __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE);
+}
+
+/* { dg-final { scan-assembler-times "cbnz\\tw\[0-9\]+" 2 } } */
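As the sketch promised above: here is a minimal, hypothetical spin-lock
pattern of the general shape I mean. The names try_lock and lock_acquire
and the lock values 0/1 are invented for illustration; this is not glibc's
actual code. The point is that a strong compare-exchange against zero, as
in try_lock below, is exactly the shape the new splitter special-cases:

/* Hypothetical spin-lock sketch, not actual glibc code.  A lock word of
   zero means "free"; try_lock attempts to claim it by swapping in 1.  */

static inline int
try_lock (int *lock)
{
  int expected = 0;
  /* Strong compare-exchange against zero: with this patch the inner
     load-exclusive/store-exclusive loop can use CBNZ.  */
  return __atomic_compare_exchange_n (lock, &expected, 1, 0,
                                      __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

static inline void
lock_acquire (int *lock)
{
  while (!try_lock (lock))
    ; /* Spin until the lock word is observed to be zero (free).  */
}

(The relaxed failure order is a common choice for a failed lock attempt,
which acquires nothing; the testcase above uses __ATOMIC_ACQUIRE for both
orders instead.)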