From patchwork Tue May 20 15:24:07 2014
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 350755
Date: Tue, 20 May 2014 08:24:07 -0700
Subject: Re: [PATCH i386 5/8] [AVX-512] Extend vectorizer hooks.
From: "H.J. Lu"
To: Kirill Yukhin
Cc: Uros Bizjak, Jan Hubicka, Jakub Jelinek, Eric Botcazou, "gcc-patches@gcc.gnu.org", Richard Henderson

On Tue, May 20, 2014 at 5:00 AM, Kirill Yukhin wrote:
> Hello,
> On 19 May 09:58, H.J. Lu wrote:
>> On Mon, May 19, 2014 at 9:45 AM, Uros Bizjak wrote:
>> > On Mon, May 19, 2014 at 6:42 PM, H.J. Lu wrote:
>> >
>> >>>> Uros,
>> >>>> I am looking into libreoffice size and the data alignment seems to make huge
>> >>>> difference. Data section has grown from 5.8MB to 6.3MB in between GCC 4.8 and 4.9,
>> >>>> while clang produces 5.2MB.
>> >>>>
>> >>>> The two patches I posted to not align vtables and RTTI reduces it to 5.7MB, but
>> >>>> But perhaps we want to revisit the alignment rules. The optimization manuals
>> >>>> usually care only about performance critical loops. Perhaps we can make the
>> >>>> rules to align only bigger datastructures, or so at least for -O2.
>> >>>
>> >>> Based on the above quote, "Misaligned data access can incur
>> >>> significant performance penalties." and the fact that this particular
>> >>> alignment rule has some compatibility issues with previous versions of
>> >>> gcc (these were later fixed by Jakub), I'd rather leave this rule as
>> >>> is. However, if the access is from the cold section, we can perhaps
>> >>> avoid extra alignment, while avoiding those compatibility issues.
>> >>>
>> >>
>> >> It is excessive to align
>> >>
>> >> struct foo
>> >> {
>> >>   int x1;
>> >>   int x2;
>> >>   char x3;
>> >>   int x4;
>> >>   int x5;
>> >>   char x6;
>> >>   int x7;
>> >>   int x8;
>> >> };
>> >>
>> >> to 32 bytes and align
>> >>
>> >> struct foo
>> >> {
>> >>   int x1;
>> >>   int x2;
>> >>   char x3;
>> >>   int x4;
>> >>   int x5;
>> >>   char x6;
>> >>   int x7[9];
>> >>   int x8;
>> >> };
>> >>
>> >> to 64 bytes. What performance gain does it provide?
>> >
>> > Avoids "significant performance penalties," perhaps?
>> >
>>
>> Kirill, do we have performance data for excessive alignment
>> vs ABI alignment?
> Nope, we have no actual data showing performance impact on such changes,
> sorry.
>
> We may try such a change on HSW machine (on Spec 2006), will it be useful?
>
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -26576,7 +26576,7 @@ ix86_data_alignment (tree type, int align, bool opt)
>       used to assume.  */
>
>    int max_align_compat
> -    = optimize_size ? BITS_PER_WORD : MIN (256, MAX_OFILE_ALIGNMENT);
> +    = optimize_size ? BITS_PER_WORD : MIN (128, MAX_OFILE_ALIGNMENT);
>
>    /* A data structure, equal or greater than the size of a cache line
>       (64 bytes in the Pentium 4 and other recent Intel processors, including
>

ABI alignment should be sufficient for correctness. Bigger alignments are
supposed to give better performance. Can you try this patch on HSW and SLM
to see if it has any impact on performance?
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index c0a46ed..4879110 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -26568,39 +26568,6 @@ ix86_constant_alignment (tree exp, int align)
 int
 ix86_data_alignment (tree type, int align, bool opt)
 {
-  /* GCC 4.8 and earlier used to incorrectly assume this alignment even
-     for symbols from other compilation units or symbols that don't need
-     to bind locally.  In order to preserve some ABI compatibility with
-     those compilers, ensure we don't decrease alignment from what we
-     used to assume.  */
-
-  int max_align_compat
-    = optimize_size ? BITS_PER_WORD : MIN (256, MAX_OFILE_ALIGNMENT);
-
-  /* A data structure, equal or greater than the size of a cache line
-     (64 bytes in the Pentium 4 and other recent Intel processors, including
-     processors based on Intel Core microarchitecture) should be aligned
-     so that its base address is a multiple of a cache line size.  */
-
-  int max_align
-    = MIN ((unsigned) ix86_tune_cost->prefetch_block * 8, MAX_OFILE_ALIGNMENT);
-
-  if (max_align < BITS_PER_WORD)
-    max_align = BITS_PER_WORD;
-
-  if (opt
-      && AGGREGATE_TYPE_P (type)
-      && TYPE_SIZE (type)
-      && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST)
-    {
-      if (wi::geu_p (TYPE_SIZE (type), max_align_compat)
-          && align < max_align_compat)
-        align = max_align_compat;
-      if (wi::geu_p (TYPE_SIZE (type), max_align)
-          && align < max_align)
-        align = max_align;
-    }
-
   /* x86-64 ABI requires arrays greater than 16 bytes to be aligned
      to 16byte boundary.  */
   if (TARGET_64BIT)
@@ -26616,6 +26583,9 @@ ix86_data_alignment (tree type, int align, bool opt)
   if (!opt)
     return align;
 
+  if (align < BITS_PER_WORD)
+    align = BITS_PER_WORD;
+
   if (TREE_CODE (type) == ARRAY_TYPE)
     {
       if (TYPE_MODE (TREE_TYPE (type)) == DFmode && align < 64)