From patchwork Tue May 20 15:24:07 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "H.J. Lu" <hjl.tools@gmail.com>
X-Patchwork-Id: 350755
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
In-Reply-To: <20140520120024.GA52607@msticlxl57.ims.intel.com>
References: <201401031220.34808.ebotcazou@adacore.com>
	<CAFULd4ZvCFhW=VhhQ89Zp6KYPVjjDET6f71cu-iEFCBDmTFBtQ@mail.gmail.com>
	<20140103115939.GF892@tucnak.redhat.com>
	<CAFULd4bhLUho1Yj9m5=vvpEFvyk5XGEhY5SdTjrzgDxN6s2Oqw@mail.gmail.com>
	<CAFULd4a8g2GCLYkBpXoszsofmCbienNZzqNHxOqEB_n3rjCFpw@mail.gmail.com>
	<20140519044801.GA12624@atrey.karlin.mff.cuni.cz>
	<CAFULd4Ymfa8wx2Pgi=t_zh9DwAPXYdbhM86=dTWdD4xR8bv9xw@mail.gmail.com>
	<CAMe9rOqvged6m_RGXLf2BiLjKqtb5LZ0ni_n6sS=8VYOsaXFOg@mail.gmail.com>
	<CAFULd4YhGtNSNXDZdU5oAGmhRhFqkKWJ3XPAcEesXT4yNvEgqA@mail.gmail.com>
	<CAMe9rOqLxNULACB+ZFtnmXvhh8v5b14UioLhCweryX5_tM9fsA@mail.gmail.com>
	<20140520120024.GA52607@msticlxl57.ims.intel.com>
Date: Tue, 20 May 2014 08:24:07 -0700
Message-ID: 
 <CAMe9rOrbq9d9HaT5D8ix-gsSEGzPifWkZUs5iR+xgFs6jAeiSA@mail.gmail.com>
Subject: Re: [PATCH i386 5/8] [AVX-512] Extend vectorizer hooks.
From: "H.J. Lu" <hjl.tools@gmail.com>
To: Kirill Yukhin <kirill.yukhin@gmail.com>
Cc: Uros Bizjak <ubizjak@gmail.com>, Jan Hubicka <hubicka@ucw.cz>,
	Jakub Jelinek <jakub@redhat.com>, Eric Botcazou <ebotcazou@adacore.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	Richard Henderson <rth@redhat.com>
X-IsSubscribed: yes

On Tue, May 20, 2014 at 5:00 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> Hello,
> On 19 May 09:58, H.J. Lu wrote:
>> On Mon, May 19, 2014 at 9:45 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> > On Mon, May 19, 2014 at 6:42 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> >
>> >>>> Uros,
>> >>>> I am looking into libreoffice size and the data alignment seems to make a huge
>> >>>> difference. Data section has grown from 5.8MB to 6.3MB in between GCC 4.8 and 4.9,
>> >>>> while clang produces 5.2MB.
>> >>>>
>> >>>> The two patches I posted to not align vtables and RTTI reduce it to 5.7MB,
>> >>>> but perhaps we want to revisit the alignment rules.  The optimization manuals
>> >>>> usually care only about performance-critical loops.  Perhaps we can make the
>> >>>> rules align only bigger data structures, or so at least for -O2.
>> >>>
>> >>> Based on the above quote, "Misaligned data access can incur
>> >>> significant performance penalties." and the fact that this particular
>> >>> alignment rule has some compatibility issues with previous versions of
>> >>> gcc (these were later fixed by Jakub), I'd rather leave this rule as
>> >>> is. However, if the access is from the cold section, we can perhaps
>> >>> avoid extra alignment, while avoiding those compatibility issues.
>> >>>
>> >>
>> >> It is excessive to align
>> >>
>> >> struct foo
>> >> {
>> >>   int x1;
>> >>   int x2;
>> >>   char x3;
>> >>   int x4;
>> >>   int x5;
>> >>   char x6;
>> >>   int x7;
>> >>   int x8;
>> >> };
>> >>
>> >> to 32 bytes and align
>> >>
>> >> struct foo
>> >> {
>> >>   int x1;
>> >>   int x2;
>> >>   char x3;
>> >>   int x4;
>> >>   int x5;
>> >>   char x6;
>> >>   int x7[9];
>> >>   int x8;
>> >> };
>> >>
>> >> to 64 bytes.  What performance gain does it provide?
>> >
>> > Avoids "significant performance penalties," perhaps?
>> >
>>
>> Kirill, do we have performance data for excessive alignment
>> vs ABI alignment?
> Nope, we have no actual data showing performance impact on such changes,
> sorry.
>
> We could try such a change on an HSW machine (on SPEC 2006); would that be useful?
>
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -26576,7 +26576,7 @@ ix86_data_alignment (tree type, int align, bool opt)
>       used to assume.  */
>
>    int max_align_compat
> -    = optimize_size ? BITS_PER_WORD : MIN (256, MAX_OFILE_ALIGNMENT);
> +    = optimize_size ? BITS_PER_WORD : MIN (128, MAX_OFILE_ALIGNMENT);
>
>    /* A data structure, equal or greater than the size of a cache line
>       (64 bytes in the Pentium 4 and other recent Intel processors, including
>
>

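ABI alignment should be sufficient for correctness.  Bigger alignments
are only supposed to give better performance.  The patch below removes
the extra alignment for large aggregates and keeps only a BITS_PER_WORD
minimum when optimizing.  Can you try it on HSW (Haswell) and SLM
(Silvermont) to see if it has any impact on performance?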
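
To make the numbers in this thread concrete, here is a small stand-alone
C sketch.  It is not GCC code: model_old_alignment and model_new_alignment
are hypothetical helpers that only mimic the block the patch removes and
the three lines it adds, and foo_small/foo_large are the two quoted structs
renamed so they fit in one file.  It assumes a 4-byte int, !optimize_size,
a 64-byte prefetch block, and a MAX_OFILE_ALIGNMENT of at least 512 bits.

```c
#include <stddef.h>

/* The two structs quoted in this thread, renamed so both fit in one file.  */
struct foo_small
{
  int x1; int x2; char x3; int x4; int x5; char x6; int x7; int x8;
};

struct foo_large
{
  int x1; int x2; char x3; int x4; int x5; char x6; int x7[9]; int x8;
};

/* Hypothetical model of the block the patch removes; NOT the GCC code.
   Sizes and alignments are in bits, as in ix86_data_alignment.
   Assumes !optimize_size and MAX_OFILE_ALIGNMENT >= 512.  */
unsigned
model_old_alignment (size_t size_bits, unsigned align,
                     unsigned bits_per_word, unsigned prefetch_block)
{
  unsigned max_align_compat = 256;
  unsigned max_align = prefetch_block * 8;

  if (max_align < bits_per_word)
    max_align = bits_per_word;
  if (size_bits >= max_align_compat && align < max_align_compat)
    align = max_align_compat;
  if (size_bits >= max_align && align < max_align)
    align = max_align;
  return align;
}

/* Hypothetical model of the three lines the patch adds.  The remaining
   rules in ix86_data_alignment are deliberately ignored here.  */
unsigned
model_new_alignment (unsigned align, unsigned bits_per_word)
{
  if (align < bits_per_word)
    align = bits_per_word;
  return align;
}
```

With a 4-byte int the two structs are 32 and 64 bytes and need only 4-byte
(32-bit) ABI alignment.  Under these assumptions the old rule raises them to
256 and 512 bits, while the added check alone raises a 32-bit alignment to
64 bits on a 64-bit target.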

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index c0a46ed..4879110 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -26568,39 +26568,6 @@ ix86_constant_alignment (tree exp, int align)
 int
 ix86_data_alignment (tree type, int align, bool opt)
 {
-  /* GCC 4.8 and earlier used to incorrectly assume this alignment even
-     for symbols from other compilation units or symbols that don't need
-     to bind locally.  In order to preserve some ABI compatibility with
-     those compilers, ensure we don't decrease alignment from what we
-     used to assume.  */
-
-  int max_align_compat
-    = optimize_size ? BITS_PER_WORD : MIN (256, MAX_OFILE_ALIGNMENT);
-
-  /* A data structure, equal or greater than the size of a cache line
-     (64 bytes in the Pentium 4 and other recent Intel processors, including
-     processors based on Intel Core microarchitecture) should be aligned
-     so that its base address is a multiple of a cache line size.  */
-
-  int max_align
-    = MIN ((unsigned) ix86_tune_cost->prefetch_block * 8, MAX_OFILE_ALIGNMENT);
-
-  if (max_align < BITS_PER_WORD)
-    max_align = BITS_PER_WORD;
-
-  if (opt
-      && AGGREGATE_TYPE_P (type)
-      && TYPE_SIZE (type)
-      && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST)
-    {
-      if (wi::geu_p (TYPE_SIZE (type), max_align_compat)
-  && align < max_align_compat)
- align = max_align_compat;
-       if (wi::geu_p (TYPE_SIZE (type), max_align)
-   && align < max_align)
- align = max_align;
-    }
-
   /* x86-64 ABI requires arrays greater than 16 bytes to be aligned
      to 16byte boundary.  */
   if (TARGET_64BIT)
@@ -26616,6 +26583,9 @@ ix86_data_alignment (tree type, int align, bool opt)
   if (!opt)
     return align;

+  if (align < BITS_PER_WORD)
+    align = BITS_PER_WORD;
+
   if (TREE_CODE (type) == ARRAY_TYPE)
     {
       if (TYPE_MODE (TREE_TYPE (type)) == DFmode && align < 64)