diff mbox

[i386,5/8,AVX-512] Extend vectorizer hooks.

Message ID CAMe9rOrbq9d9HaT5D8ix-gsSEGzPifWkZUs5iR+xgFs6jAeiSA@mail.gmail.com
State New
Headers show

Commit Message

H.J. Lu May 20, 2014, 3:24 p.m. UTC
On Tue, May 20, 2014 at 5:00 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> Hello,
> On 19 May 09:58, H.J. Lu wrote:
>> On Mon, May 19, 2014 at 9:45 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> > On Mon, May 19, 2014 at 6:42 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> >
>> >>>> Uros,
>> >>>> I am looking into libreoffice size and the data alignment seems to make huge
>> >>>> difference. Data section has grown from 5.8MB to 6.3MB in between GCC 4.8 and 4.9,
>> >>>> while clang produces 5.2MB.
>> >>>>
>> >>>> The two patches I posted to not align vtables and RTTI reduces it to 5.7MB, but
>> >>>> But perhaps we want to revisit the alignment rules.  The optimization manuals
>> >>>> usually care only about performance critical loops.  Perhaps we can make the
>> >>>> rules to align only bigger datastructures, or so at least for -O2.
>> >>>
>> >>> Based on the above quote, "Misaligned data access can incur
>> >>> significant performance penalties." and the fact that this particular
>> >>> alignment rule has some compatibility issues with previous versions of
>> >>> gcc (these were later fixed by Jakub), I'd rather leave this rule as
>> >>> is. However, if the access is from the cold section, we can perhaps
>> >>> avoid extra alignment, while avoiding those compatibility issues.
>> >>>
>> >>
>> >> It is excessive to align
>> >>
>> >> struct foo
>> >> {
>> >>   int x1;
>> >>   int x2;
>> >>   char x3;
>> >>   int x4;
>> >>   int x5;
>> >>   char x6;
>> >>   int x7;
>> >>   int x8;
>> >> };
>> >>
>> >> to 32 bytes and align
>> >>
>> >> struct foo
>> >> {
>> >>   int x1;
>> >>   int x2;
>> >>   char x3;
>> >>   int x4;
>> >>   int x5;
>> >>   char x6;
>> >>   int x7[9];
>> >>   int x8;
>> >> };
>> >>
>> >> to 64 bytes.  What performance gain does it provide?
>> >
>> > Avoids "significant performance penalties," perhaps?
>> >
>>
>> Kirill, do we have performance data for excessive alignment
>> vs ABI alignment?
> Nope, we have no actual data showing performance impact on such changes,
> sorry.
>
> We may try such a change on HSW machine (on Spec 2006), will it be useful?
>
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -26576,7 +26576,7 @@ ix86_data_alignment (tree type, int align, bool opt)
>       used to assume.  */
>
>    int max_align_compat
> -    = optimize_size ? BITS_PER_WORD : MIN (256, MAX_OFILE_ALIGNMENT);
> +    = optimize_size ? BITS_PER_WORD : MIN (128, MAX_OFILE_ALIGNMENT);
>
>    /* A data structure, equal or greater than the size of a cache line
>       (64 bytes in the Pentium 4 and other recent Intel processors, including
>
>

ABI alignment should be sufficient for correctness. Bigger alignments
are supposed to give better performance.  Can you try this patch on
HSW and SLM to see if it has any impact on performance?

Comments

Kirill Yukhin May 22, 2014, 9:01 a.m. UTC | #1
Hello,
On 20 May 08:24, H.J. Lu wrote:
> ABI alignment should be sufficient for correctness. Bigger alignments
> are supposed to give better performance.  Can you try this patch on
> HSW and SLM to see if it has any impact on performance?

Here is perf. data of your patch.

Only HSW so far

HSW, 64 bits, base

Test Previous Current Ratio(%)
400.perlbench     37.7000    37.7000 +0%
401.bzip2         24.8000    24.7000 -0.40%
403.gcc           35.1000    35.2000 +0.28%
429.mcf           41.7000    42.0000 +0.71%
445.gobmk         26.9000    27.0000 +0.37%
456.hmmer         27.2000    27.2000 +0%
458.sjeng         30.2000    30.1000 -0.33%
462.libquantum    77.4000    76.7000 -0.90%
464.h264ref       52.5000    52.8000 +0.57%
471.omnetpp       23.8000    23.7000 -0.42%
473.astar         23.2000    23.1000 -0.43%
483.xalancbmk     39.8000    40.1000 +0.75%
410.bwaves        78.4000    78.5000 +0.12%
416.gamess        33.9000    33.9000 +0%
433.milc          34.7000    34.8000 +0.28%
434.zeusmp        38.6000    38.4000 -0.51%
435.gromacs       26.9000    27.0000 +0.37%
436.cactusADM     54.7000    62.0000 +13.34%
437.leslie3d      45.3000    45.3000 +0%
444.namd          27.2000    27.1000 -0.36%
447.dealII        56.7000    56.7000 +0%
450.soplex        39.3000    39.3000 +0%
453.povray        49.0000    49.1000 +0.20%
454.calculix      28.8000    29.3000 +1.73%
459.GemsFDTD      38.9000    39.0000 +0.25%
465.tonto         23.1000    23.3000 +0.86%
470.lbm           55.3000    55.6000 +0.54%
481.wrf           40.8000    40.8000 +0%
482.sphinx3       47.8000    47.9000 +0.20%

HSW, 64 bits, o2

Test Previous Current Ratio(%)
400.perlbench     39.7000    39.7000 +0%
401.bzip2         25.1000    25.1000 +0%
403.gcc           33.7000    33.7000 +0%
429.mcf           40.1000    39.9000 -0.49%
445.gobmk         26.5000    26.4000 -0.37%
456.hmmer         24.8000    24.8000 +0%
458.sjeng         28.4000    28.5000 +0.35%
462.libquantum    74.4000    74.4000 +0%
464.h264ref       50.1000    50.3000 +0.39%
471.omnetpp       22.6000    22.5000 -0.44%
473.astar         20.7000    21.0000 +1.44%
483.xalancbmk     37.0000    37.4000 +1.08%
410.bwaves        60.1000    60.1000 +0%
416.gamess        35.5000    35.4000 -0.28%
433.milc          29.9000    29.8000 -0.33%
434.zeusmp        34.8000    34.6000 -0.57%
435.gromacs       27.4000    27.5000 +0.36%
436.cactusADM     32.3000    33.8000 +4.64%
437.leslie3d      32.6000    32.6000 +0%
444.namd          26.9000    26.9000 +0%
447.dealII        45.2000    45.1000 -0.22%
450.soplex        42.5000    42.6000 +0.23%
453.povray        45.9000    46.1000 +0.43%
454.calculix      12.9000    12.9000 +0%
459.GemsFDTD      38.6000    38.7000 +0.25%
465.tonto         23.7000    23.8000 +0.42%
470.lbm           56.7000    56.7000 +0%
481.wrf           28.9000    28.9000 +0%
482.sphinx3       43.9000    43.8000 -0.22%

--
Thanks, K
H.J. Lu May 22, 2014, 5:38 p.m. UTC | #2
On Thu, May 22, 2014 at 2:01 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> Hello,
> On 20 May 08:24, H.J. Lu wrote:
>> ABI alignment should be sufficient for correctness. Bigger alignments
>> are supposed to give better performance.  Can you try this patch on
>> HSW and SLM to see if it has any impact on performance?
>
> Here is perf. data of your patch.
>
> Only HSW so far
>
> HSW, 64 bits, base
>
> Test Previous Current Ratio(%)
> 400.perlbench     37.7000    37.7000 +0%
> 401.bzip2         24.8000    24.7000 -0.40%
> 403.gcc           35.1000    35.2000 +0.28%
> 429.mcf           41.7000    42.0000 +0.71%
> 445.gobmk         26.9000    27.0000 +0.37%
> 456.hmmer         27.2000    27.2000 +0%
> 458.sjeng         30.2000    30.1000 -0.33%
> 462.libquantum    77.4000    76.7000 -0.90%
> 464.h264ref       52.5000    52.8000 +0.57%
> 471.omnetpp       23.8000    23.7000 -0.42%
> 473.astar         23.2000    23.1000 -0.43%
> 483.xalancbmk     39.8000    40.1000 +0.75%
> 410.bwaves        78.4000    78.5000 +0.12%
> 416.gamess        33.9000    33.9000 +0%
> 433.milc          34.7000    34.8000 +0.28%
> 434.zeusmp        38.6000    38.4000 -0.51%
> 435.gromacs       26.9000    27.0000 +0.37%
> 436.cactusADM     54.7000    62.0000 +13.34%
> 437.leslie3d      45.3000    45.3000 +0%
> 444.namd          27.2000    27.1000 -0.36%
> 447.dealII        56.7000    56.7000 +0%
> 450.soplex        39.3000    39.3000 +0%
> 453.povray        49.0000    49.1000 +0.20%
> 454.calculix      28.8000    29.3000 +1.73%
> 459.GemsFDTD      38.9000    39.0000 +0.25%
> 465.tonto         23.1000    23.3000 +0.86%
> 470.lbm           55.3000    55.6000 +0.54%
> 481.wrf           40.8000    40.8000 +0%
> 482.sphinx3       47.8000    47.9000 +0.20%
>
> HSW, 64 bits, o2
>
> Test Previous Current Ratio(%)
> 400.perlbench     39.7000    39.7000 +0%
> 401.bzip2         25.1000    25.1000 +0%
> 403.gcc           33.7000    33.7000 +0%
> 429.mcf           40.1000    39.9000 -0.49%
> 445.gobmk         26.5000    26.4000 -0.37%
> 456.hmmer         24.8000    24.8000 +0%
> 458.sjeng         28.4000    28.5000 +0.35%
> 462.libquantum    74.4000    74.4000 +0%
> 464.h264ref       50.1000    50.3000 +0.39%
> 471.omnetpp       22.6000    22.5000 -0.44%
> 473.astar         20.7000    21.0000 +1.44%
> 483.xalancbmk     37.0000    37.4000 +1.08%
> 410.bwaves        60.1000    60.1000 +0%
> 416.gamess        35.5000    35.4000 -0.28%
> 433.milc          29.9000    29.8000 -0.33%
> 434.zeusmp        34.8000    34.6000 -0.57%
> 435.gromacs       27.4000    27.5000 +0.36%
> 436.cactusADM     32.3000    33.8000 +4.64%
> 437.leslie3d      32.6000    32.6000 +0%
> 444.namd          26.9000    26.9000 +0%
> 447.dealII        45.2000    45.1000 -0.22%
> 450.soplex        42.5000    42.6000 +0.23%
> 453.povray        45.9000    46.1000 +0.43%
> 454.calculix      12.9000    12.9000 +0%
> 459.GemsFDTD      38.6000    38.7000 +0.25%
> 465.tonto         23.7000    23.8000 +0.42%
> 470.lbm           56.7000    56.7000 +0%
> 481.wrf           28.9000    28.9000 +0%
> 482.sphinx3       43.9000    43.8000 -0.22%
>

So extra alignment doesn't have any performance impact
on HSW with SPEC CPU 2006.  So the question is if using
alignment specified by ABI will cause any correctness issue.
I am concerned about the comment:

  /* GCC 4.8 and earlier used to incorrectly assume this alignment even
     for symbols from other compilation units or symbols that don't need
     to bind locally.  In order to preserve some ABI compatibility with
     those compilers, ensure we don't decrease alignment from what we
     used to assume.  */

Jakub,  will we into any correctness issue if ix86_data_alignment
always returns ABI alignment?
H.J. Lu May 30, 2014, 5:23 p.m. UTC | #3
On Thu, May 22, 2014 at 10:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, May 22, 2014 at 2:01 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
>> Hello,
>> On 20 May 08:24, H.J. Lu wrote:
>>> ABI alignment should be sufficient for correctness. Bigger alignments
>>> are supposed to give better performance.  Can you try this patch on
>>> HSW and SLM to see if it has any impact on performance?
>>
>> Here is perf. data of your patch.
>>
>> Only HSW so far
>>
>> HSW, 64 bits, base
>>
>> Test Previous Current Ratio(%)
>> 400.perlbench     37.7000    37.7000 +0%
>> 401.bzip2         24.8000    24.7000 -0.40%
>> 403.gcc           35.1000    35.2000 +0.28%
>> 429.mcf           41.7000    42.0000 +0.71%
>> 445.gobmk         26.9000    27.0000 +0.37%
>> 456.hmmer         27.2000    27.2000 +0%
>> 458.sjeng         30.2000    30.1000 -0.33%
>> 462.libquantum    77.4000    76.7000 -0.90%
>> 464.h264ref       52.5000    52.8000 +0.57%
>> 471.omnetpp       23.8000    23.7000 -0.42%
>> 473.astar         23.2000    23.1000 -0.43%
>> 483.xalancbmk     39.8000    40.1000 +0.75%
>> 410.bwaves        78.4000    78.5000 +0.12%
>> 416.gamess        33.9000    33.9000 +0%
>> 433.milc          34.7000    34.8000 +0.28%
>> 434.zeusmp        38.6000    38.4000 -0.51%
>> 435.gromacs       26.9000    27.0000 +0.37%
>> 436.cactusADM     54.7000    62.0000 +13.34%
>> 437.leslie3d      45.3000    45.3000 +0%
>> 444.namd          27.2000    27.1000 -0.36%
>> 447.dealII        56.7000    56.7000 +0%
>> 450.soplex        39.3000    39.3000 +0%
>> 453.povray        49.0000    49.1000 +0.20%
>> 454.calculix      28.8000    29.3000 +1.73%
>> 459.GemsFDTD      38.9000    39.0000 +0.25%
>> 465.tonto         23.1000    23.3000 +0.86%
>> 470.lbm           55.3000    55.6000 +0.54%
>> 481.wrf           40.8000    40.8000 +0%
>> 482.sphinx3       47.8000    47.9000 +0.20%
>>
>> HSW, 64 bits, o2
>>
>> Test Previous Current Ratio(%)
>> 400.perlbench     39.7000    39.7000 +0%
>> 401.bzip2         25.1000    25.1000 +0%
>> 403.gcc           33.7000    33.7000 +0%
>> 429.mcf           40.1000    39.9000 -0.49%
>> 445.gobmk         26.5000    26.4000 -0.37%
>> 456.hmmer         24.8000    24.8000 +0%
>> 458.sjeng         28.4000    28.5000 +0.35%
>> 462.libquantum    74.4000    74.4000 +0%
>> 464.h264ref       50.1000    50.3000 +0.39%
>> 471.omnetpp       22.6000    22.5000 -0.44%
>> 473.astar         20.7000    21.0000 +1.44%
>> 483.xalancbmk     37.0000    37.4000 +1.08%
>> 410.bwaves        60.1000    60.1000 +0%
>> 416.gamess        35.5000    35.4000 -0.28%
>> 433.milc          29.9000    29.8000 -0.33%
>> 434.zeusmp        34.8000    34.6000 -0.57%
>> 435.gromacs       27.4000    27.5000 +0.36%
>> 436.cactusADM     32.3000    33.8000 +4.64%
>> 437.leslie3d      32.6000    32.6000 +0%
>> 444.namd          26.9000    26.9000 +0%
>> 447.dealII        45.2000    45.1000 -0.22%
>> 450.soplex        42.5000    42.6000 +0.23%
>> 453.povray        45.9000    46.1000 +0.43%
>> 454.calculix      12.9000    12.9000 +0%
>> 459.GemsFDTD      38.6000    38.7000 +0.25%
>> 465.tonto         23.7000    23.8000 +0.42%
>> 470.lbm           56.7000    56.7000 +0%
>> 481.wrf           28.9000    28.9000 +0%
>> 482.sphinx3       43.9000    43.8000 -0.22%
>>
>
> So extra alignment doesn't have any performance impact
> on HSW with SPEC CPU 2006.  So the question is if using
> alignment specified by ABI will cause any correctness issue.
> I am concerned about the comment:
>
>   /* GCC 4.8 and earlier used to incorrectly assume this alignment even
>      for symbols from other compilation units or symbols that don't need
>      to bind locally.  In order to preserve some ABI compatibility with
>      those compilers, ensure we don't decrease alignment from what we
>      used to assume.  */
>
> Jakub,  will we into any correctness issue if ix86_data_alignment
> always returns ABI alignment?
>

The ABI issue is

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56564

That is we shouldn't use alignment beyond ABI if the definition
may come from a different compilation unit.   But DATA_ALIGNMENT
is only for optimization purpose.  I opened:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61296

for excessive x86 data alignment.  It looks like that started as
an accident from:

https://gcc.gnu.org/ml/gcc-patches/2000-06/msg00871.html

which aligned struct >= 32 bytes to 32 bytes.  But alignment beyond
natural alignment doesn't provide performance benefit.  Should
we limit DATA_ALIGNMENT to MAX (ABI alignment, natural alignment)?
diff mbox

Patch

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index c0a46ed..4879110 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -26568,39 +26568,6 @@  ix86_constant_alignment (tree exp, int align)
 int
 ix86_data_alignment (tree type, int align, bool opt)
 {
-  /* GCC 4.8 and earlier used to incorrectly assume this alignment even
-     for symbols from other compilation units or symbols that don't need
-     to bind locally.  In order to preserve some ABI compatibility with
-     those compilers, ensure we don't decrease alignment from what we
-     used to assume.  */
-
-  int max_align_compat
-    = optimize_size ? BITS_PER_WORD : MIN (256, MAX_OFILE_ALIGNMENT);
-
-  /* A data structure, equal or greater than the size of a cache line
-     (64 bytes in the Pentium 4 and other recent Intel processors, including
-     processors based on Intel Core microarchitecture) should be aligned
-     so that its base address is a multiple of a cache line size.  */
-
-  int max_align
-    = MIN ((unsigned) ix86_tune_cost->prefetch_block * 8, MAX_OFILE_ALIGNMENT);
-
-  if (max_align < BITS_PER_WORD)
-    max_align = BITS_PER_WORD;
-
-  if (opt
-      && AGGREGATE_TYPE_P (type)
-      && TYPE_SIZE (type)
-      && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST)
-    {
-      if (wi::geu_p (TYPE_SIZE (type), max_align_compat)
-  && align < max_align_compat)
- align = max_align_compat;
-       if (wi::geu_p (TYPE_SIZE (type), max_align)
-   && align < max_align)
- align = max_align;
-    }
-
   /* x86-64 ABI requires arrays greater than 16 bytes to be aligned
      to 16byte boundary.  */
   if (TARGET_64BIT)
@@ -26616,6 +26583,9 @@  ix86_data_alignment (tree type, int align, bool opt)
   if (!opt)
     return align;

+  if (align < BITS_PER_WORD)
+    align = BITS_PER_WORD;
+
   if (TREE_CODE (type) == ARRAY_TYPE)
     {
       if (TYPE_MODE (TREE_TYPE (type)) == DFmode && align < 64)