Patchwork Backport AVX256 load/store split patches to gcc 4.6 for performance boost on latest AMD/Intel hardware.

Submitter Fang, Changpeng
Date June 27, 2011, 10:33 p.m.
Message ID <D4C76825A6780047854A11E93CDE84D005980DC71A@SAUSEXMBP01.amd.com>
Permalink /patch/102287/
State New

Comments

Fang, Changpeng - June 27, 2011, 10:33 p.m.
Hi,

Attached are the patches we propose to backport to the gcc 4.6 branch, related to AVX256 unaligned load/store splitting.
As we mentioned before, the combined effect of these patches is positive on both AMD and Intel CPUs on CPU2006 and
Polyhedron 2005.

0001-Split-32-byte-AVX-unaligned-load-store.patch
Initial patch that implements unaligned load/store splitting

0001-Don-t-assert-unaligned-256bit-load-store.patch
Remove the assert.

0001-Fix-a-typo-in-mavx256-split-unaligned-store.patch
Fix a typo.

0002-pr49089-enable-avx256-splitting-unaligned-load-store.patch
Disable unaligned load splitting for bdver1.

All these patches are in 4.7 trunk.

Bootstrapping and testing are ongoing on the gcc 4.6 branch.

Is it OK to commit to the 4.6 branch as long as the tests pass?

Thanks,

Changpeng
Richard Guenther - June 28, 2011, 8:48 a.m.
On Tue, Jun 28, 2011 at 12:33 AM, Fang, Changpeng
<Changpeng.Fang@amd.com> wrote:
> Hi,
>
> Attached are the patches we propose to backport to the gcc 4.6 branch, related to AVX256 unaligned load/store splitting.
> As we mentioned before, the combined effect of these patches is positive on both AMD and Intel CPUs on CPU2006 and
> Polyhedron 2005.
>
> 0001-Split-32-byte-AVX-unaligned-load-store.patch
> Initial patch that implements unaligned load/store splitting
>
> 0001-Don-t-assert-unaligned-256bit-load-store.patch
> Remove the assert.
>
> 0001-Fix-a-typo-in-mavx256-split-unaligned-store.patch
> Fix a typo.
>
> 0002-pr49089-enable-avx256-splitting-unaligned-load-store.patch
> Disable unaligned load splitting for bdver1.
>
> All these patches are in 4.7 trunk.
>
> Bootstrapping and testing are ongoing on the gcc 4.6 branch.
>
> Is it OK to commit to the 4.6 branch as long as the tests pass?

Yes, if they have been approved and checked in for trunk.

Thanks,
Richard.

> Thanks,
>
> Changpeng
>
>
>
> ________________________________________
> From: Jagasia, Harsha
> Sent: Monday, June 20, 2011 12:03 PM
> To: 'H.J. Lu'
> Cc: 'gcc-patches@gcc.gnu.org'; 'hubicka@ucw.cz'; 'ubizjak@gmail.com'; 'hongjiu.lu@intel.com'; Fang, Changpeng
> Subject: RE: Backport AVX256 load/store split patches to gcc 4.6 for performance boost on latest AMD/Intel hardware.
>
>> On Mon, Jun 20, 2011 at 9:58 AM,  <harsha.jagasia@amd.com> wrote:
>> > Is it ok to backport patches, with ChangeLogs below, already in trunk,
>> > to gcc 4.6? These patches are for AVX 256-bit load/store splitting.
>> > These patches make a significant performance difference (>=3%) to
>> > several CPU2006 and Polyhedron benchmarks on the latest AMD and Intel
>> > hardware. If ok, I will post backported patches for commit approval.
>> >
>> > AMD plans to submit additional patches on AVX-256 load/store splitting
>> > to trunk. We will send additional backport requests for those later
>> > once they are accepted/committed to trunk.
>> >
>>
>> Since we will make some changes on trunk, I would prefer to do
>> the backport after the trunk change is finished.
>
> Ok, thanks. Adding Changpeng who is working on the trunk changes.
>
> Harsha
>
>

Patch

From 50310fc367348b406fc88d54c3ab54d1a304ad52 Mon Sep 17 00:00:00 2001
From: Changpeng Fang <chfang@huainan.(none)>
Date: Mon, 13 Jun 2011 13:13:32 -0700
Subject: [PATCH 2/2] pr49089: enable avx256 splitting unaligned load/store only when beneficial

	* config/i386/i386.c (avx256_split_unaligned_load): New definition.
	  (avx256_split_unaligned_store): New definition.
	  (ix86_option_override_internal): Enable avx256 unaligned load(store)
	  splitting only when avx256_split_unaligned_load(store) is set.
---
 gcc/config/i386/i386.c |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7b266b9..3bc0b53 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2121,6 +2121,12 @@  static const unsigned int x86_arch_always_fancy_math_387
   = m_PENT | m_ATOM | m_PPRO | m_AMD_MULTIPLE | m_PENT4
     | m_NOCONA | m_CORE2I7 | m_GENERIC;
 
+static const unsigned int x86_avx256_split_unaligned_load
+  = m_COREI7 | m_GENERIC;
+
+static const unsigned int x86_avx256_split_unaligned_store
+  = m_COREI7 | m_BDVER1 | m_GENERIC;
+
 /* In case the average insn count for single function invocation is
    lower than this constant, emit fast (but longer) prologue and
    epilogue code.  */
@@ -4194,9 +4200,11 @@  ix86_option_override_internal (bool main_args_p)
 	  if (flag_expensive_optimizations
 	      && !(target_flags_explicit & MASK_VZEROUPPER))
 	    target_flags |= MASK_VZEROUPPER;
-	  if (!(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
+	  if ((x86_avx256_split_unaligned_load & ix86_tune_mask)
+	      && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
 	    target_flags |= MASK_AVX256_SPLIT_UNALIGNED_LOAD;
-	  if (!(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_STORE))
+	  if ((x86_avx256_split_unaligned_store & ix86_tune_mask)
+	      && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_STORE))
 	    target_flags |= MASK_AVX256_SPLIT_UNALIGNED_STORE;
 	}
     }
-- 
1.7.0.4