
[PR,49089] Don't split AVX256 unaligned loads by default on bdver1 and generic

Message ID: D4C76825A6780047854A11E93CDE84D005980DC6F8@SAUSEXMBP01.amd.com
State: New

Commit Message

Fang, Changpeng June 16, 2011, 11:54 p.m. UTC
Hi, 

 I have modified the patch to disable unaligned load splitting only for bdver1 at this moment.
Unaligned load splitting degrades CFP2006 by 1.3% in geomean for both -mtune=bdver1 and
-mtune=generic on Bulldozer. However, we agree with H.J.'s suggestion to determine
the optimal optimization sets for modern CPUs.

Is it OK to commit the attached patch?

Thanks,

Changpeng



>> So, is it OK to commit this patch to trunk, and H.J's original patch + this to 4.6 branch?
>>
>>

>I have no problems with -mtune=Bulldozer.  But I object to the -mtune=generic
>change and did suggest a different approach for -mtune=generic.


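For context, the splitting under discussion is, roughly speaking, GCC's expansion of an unaligned 256-bit load into a 128-bit load followed by a vinsertf128, instead of a single 32-byte vmovups. A minimal C illustration of the kind of load affected (the function name is ours, not from the patch):

#include <immintrin.h>

/* An unaligned 256-bit load.  With -mavx256-split-unaligned-load in
   effect, GCC expands this into a 128-bit load plus vinsertf128; with
   the option off, it stays a single 32-byte vmovups.  */
__m256
load256 (const float *p)
{
  return _mm256_loadu_ps (p);
}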

Comments

H.J. Lu June 17, 2011, 12:53 a.m. UTC | #1
On Thu, Jun 16, 2011 at 4:54 PM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
> Hi,
>
>  I have modified the patch to disable unaligned load splitting only for bdver1 at this moment.
> Unaligned load splitting degrades CFP2006 by 1.3% in geomean for both -mtune=bdver1 and
> -mtune=generic on Bulldozer. However, we agree with H.J.'s suggestion to determine
> the optimal optimization sets for modern CPUs.
>
> Is it OK to commit the attached patch?
>
> Thanks,

Why not just move AVX256_SPLIT_UNALIGNED_STORE
and AVX256_SPLIT_UNALIGNED_LOAD to ix86_tune_indices?

H.J.
Fang, Changpeng June 17, 2011, 5:45 p.m. UTC | #2
>Why not just move AVX256_SPLIT_UNALIGNED_STORE
>and AVX256_SPLIT_UNALIGNED_LOAD to ix86_tune_indices?

I would like to keep the -m option so that we can at least explicitly turn
off the splitting when regressions are found!

By the way, I can add an index for store splitting, if you want.

Thanks,

Changpeng
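
The explicit -m options keep working under a tuning-based default because ix86_option_override_internal only applies the default when the user did not set the flag; a sketch of that check, following the load hunk in the attached patch:

  /* Default-enable the split only when the tuning table says it pays
     off and no explicit -mavx256-split-unaligned-load /
     -mno-avx256-split-unaligned-load was given; an explicit option sets
     its bit in target_flags_explicit and is left untouched here.  */
  if (TARGET_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL
      && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
    target_flags |= MASK_AVX256_SPLIT_UNALIGNED_LOAD;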




H.J. Lu June 17, 2011, 6:08 p.m. UTC | #3
On Fri, Jun 17, 2011 at 10:45 AM, Fang, Changpeng
<Changpeng.Fang@amd.com> wrote:
>>Why not just move AVX256_SPLIT_UNALIGNED_STORE
>>and AVX256_SPLIT_UNALIGNED_LOAD to ix86_tune_indices?
>
> I would like to keep the -m option so that at least we can explicitly turn
> off the splittings when regressions are found!

I prefer to implement it the same way as:

x86_accumulate_outgoing_args
x86_arch_always_fancy_math_387

> By the way, I can add an index for store splitting, if you want.
>

Yes, please.
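
For reference, the pattern H.J. points to (shared with x86_accumulate_outgoing_args) is a tuning index in ix86_tune_indices plus a TARGET_* accessor over ix86_tune_features, which is what the attached patch adds for loads. A sketch of the store-side counterpart being requested, mirroring the load changes below (our illustration, not part of this patch):

  /* i386.h: tuning index and accessor, next to the load variant.  */
  X86_TUNE_AVX256_SPLIT_UNALIGNED_STORE_OPTIMAL,

  #define TARGET_AVX256_SPLIT_UNALIGNED_STORE_OPTIMAL \
	ix86_tune_features[X86_TUNE_AVX256_SPLIT_UNALIGNED_STORE_OPTIMAL]

  /* i386.c, initial_ix86_tune_features[]: chips on which splitting
     unaligned 256-bit stores is a win.  */
  m_COREI7 | m_GENERIC,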

Patch

From 913a31b425759ac3427a365646de866161a7908a Mon Sep 17 00:00:00 2001
From: Changpeng Fang <chfang@huainan.(none)>
Date: Mon, 13 Jun 2011 13:13:32 -0700
Subject: [PATCH 2/2] pr49089: enable avx256 splitting unaligned load only when beneficial

	* config/i386/i386.h (ix86_tune_indices): Introduce
	  X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL.
	  (TARGET_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL): New definition.

	* config/i386/i386.c (ix86_tune_features): Add entry for
	  X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL.
	  (ix86_option_override_internal): Enable avx256 unaligned load splitting
	  only when TARGET_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL is set.
---
 gcc/config/i386/i386.c |   10 ++++++++--
 gcc/config/i386/i386.h |    3 +++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7b266b9..82e6d3e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2088,7 +2088,12 @@  static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   /* X86_SOFTARE_PREFETCHING_BENEFICIAL: Enable software prefetching
      at -O3.  For the moment, the prefetching seems badly tuned for Intel
      chips.  */
-  m_K6_GEODE | m_AMD_MULTIPLE
+  m_K6_GEODE | m_AMD_MULTIPLE,
+
+  /* X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL: Enable splitting 256-bit
+     unaligned load.  It hurts the performance on Bulldozer. We need to
+     re-tune the generic options for current cpus!  */
+  m_COREI7 | m_GENERIC
 };
 
 /* Feature tests against the various architecture variations.  */
@@ -4194,7 +4199,8 @@  ix86_option_override_internal (bool main_args_p)
 	  if (flag_expensive_optimizations
 	      && !(target_flags_explicit & MASK_VZEROUPPER))
 	    target_flags |= MASK_VZEROUPPER;
-	  if (!(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
+	  if (TARGET_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL
+	      && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
 	    target_flags |= MASK_AVX256_SPLIT_UNALIGNED_LOAD;
 	  if (!(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_STORE))
 	    target_flags |= MASK_AVX256_SPLIT_UNALIGNED_STORE;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 8badcbb..b2a1bc8 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -312,6 +312,7 @@  enum ix86_tune_indices {
   X86_TUNE_OPT_AGU,
   X86_TUNE_VECTORIZE_DOUBLE,
   X86_TUNE_SOFTWARE_PREFETCHING_BENEFICIAL,
+  X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL,
 
   X86_TUNE_LAST
 };
@@ -410,6 +411,8 @@  extern unsigned char ix86_tune_features[X86_TUNE_LAST];
 	ix86_tune_features[X86_TUNE_VECTORIZE_DOUBLE]
 #define TARGET_SOFTWARE_PREFETCHING_BENEFICIAL \
 	ix86_tune_features[X86_TUNE_SOFTWARE_PREFETCHING_BENEFICIAL]
+#define TARGET_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL \
+	ix86_tune_features[X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL]
 
 /* Feature tests against the various architecture variations.  */
 enum ix86_arch_indices {
-- 
1.7.0.4