From patchwork Sun Feb 19 17:33:27 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Uros Bizjak
X-Patchwork-Id: 729578
Delivered-To: mailing list gcc-patches@gcc.gnu.org
References: <20170217163022.GK1849@tucnak>
From: Uros Bizjak
Date: Sun, 19 Feb 2017 18:33:27 +0100
Subject: Re: [RFC PATCH, i386]: Use "lock orl $0, -4(%esp)" in mfence_nosse
To: Jakub Jelinek
Cc: "gcc-patches@gcc.gnu.org", peter@cordes.ca

On Fri, Feb 17, 2017 at 5:59 PM, Uros Bizjak wrote:
> On Fri, Feb 17, 2017 at 5:30 PM, Jakub Jelinek wrote:
>> On Sun, May 29, 2016 at 11:10:15PM +0200, Uros Bizjak wrote:
>>> As explained in PR71245, comment #3 [1], it is better to use offset -4
>>> to a %esp to implement a non-SSE memory fence instruction:
>>>
>>> -q-
>>>
>>> I guess it costs a code byte for a disp8 in the addressing mode, but
>>> it avoids adding a lot of latency to a critical path involving a
>>> spill/reload to (%esp), in functions where there is something at
>>> (%esp).
>>>
>>> If it's an object larger than 4B, the lock orl could even cause a
>>> store-forwarding stall when the object is reloaded. (e.g. a double or
>>> a vector).
>>>
>>> Ideally we could do the lock orl on some padding between two locals,
>>> or on something in memory that wasn't going to be loaded soon, to
>>> avoid touching more stack memory (which might be in the next page
>>> down). But we still want to do it on a cache line that's hot, so
>>> going way up above our own stack frame isn't good either.
>>
>> Unfortunately this makes valgrind unhappy about that:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1423434
>> I assume it will complain now on anything pre-SSE2 that contains the memory
>> barrier in 32-bit code.
>> Perhaps we should decrement and increment %esp around it or something
>> similar (or push/pop)? Of course, that would mean we need to take care
>> of async unwind info.
>
> Or, we can simply revert the patch? Not that the barrier performance
> of non-SSE 32bit targets matter...

Attached patch was committed to mainline to revert 2016-05-30 change.

2017-02-19  Uros Bizjak

	Revert:
	2016-05-30  Uros Bizjak

	* config/i386/sync.md (mfence_nosse): Use "lock orl $0, -4(%esp)".

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}

Uros.

--cut here--
Index: config/i386/sync.md
===================================================================
--- config/i386/sync.md	(revision 245574)
+++ config/i386/sync.md	(working copy)
@@ -98,7 +98,7 @@
 	(unspec:BLK [(match_dup 0)] UNSPEC_MFENCE))
    (clobber (reg:CC FLAGS_REG))]
   "!(TARGET_64BIT || TARGET_SSE2)"
-  "lock{%;} or{l}\t{$0, -4(%%esp)|DWORD PTR [esp-4], 0}"
+  "lock{%;} or{l}\t{$0, (%%esp)|DWORD PTR [esp], 0}"
   [(set_attr "memory" "unknown")])
--cut here--
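
For readers who want to see the restored sequence outside the machine
description, here is a minimal C sketch of the same idea using GNU inline
asm. It is an illustration only, not code from GCC: the names fence_nosse,
publish, data and ready are hypothetical, and the x86-64 and generic
branches exist only so the sketch builds on other hosts (the mfence_nosse
pattern itself applies only to 32-bit non-SSE2 targets). The locked "orl $0"
leaves the word at the stack pointer unchanged but acts as a full barrier,
and it addresses (%esp) itself rather than memory below the stack pointer.

```c
/* Hypothetical helper, not part of GCC: a full memory barrier that
   avoids mfence by doing a locked read-modify-write that changes nothing. */
static inline void fence_nosse(void)
{
#if defined(__i386__)
    /* Same operand as the reverted-to pattern: the word at (%esp). */
    __asm__ __volatile__("lock; orl $0, (%%esp)" ::: "memory", "cc");
#elif defined(__x86_64__)
    /* Assumption for illustration: 64-bit analogue using (%rsp). */
    __asm__ __volatile__("lock; orl $0, (%%rsp)" ::: "memory", "cc");
#else
    /* Fallback so the sketch compiles on non-x86 hosts. */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

static volatile int data;   /* payload written before the fence */
static volatile int ready;  /* flag written after the fence */

/* Store the payload, fence, then raise the flag; returns the flag. */
static int publish(int value)
{
    data = value;
    fence_nosse();
    ready = 1;
    return ready;
}
```

Calling publish(42) stores 42 into data, executes the barrier, and then
sets ready. Swapping the (%esp) operand for -4(%esp) would give the variant
this patch reverts.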