Patchwork PATCH: PR target/45483: probably wrong optimization options chosen by "-march=native"

Submitter H.J. Lu
Date Sept. 6, 2010, 3:06 p.m.
Message ID <20100906150656.GA11565@intel.com>
Permalink /patch/63929/
State New

Comments

H.J. Lu - Sept. 6, 2010, 3:06 p.m.
When -march=native has to guess the CPU, the -march value we pick may
leave out SSE3 or SSSE3 even though the host supports it.  This patch
adds -msse3 or -mssse3 in that case.  OK for trunk and 4.5?

Thanks.


H.J.
----
2010-09-06  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/45483
	* config/i386/driver-i386.c (host_detect_local_cpu): Add -mssse3
	or -msse3 if needed.

Uros Bizjak - Sept. 7, 2010, 6:06 a.m.
On Mon, Sep 6, 2010 at 5:06 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:

> We may guess best options for -march=native.  We may leave out SSE3 or
> SSSE3.  This patch adds it.  OK for trunk and 4.5?

Why? -march=native should enable all features, otherwise -mtune should be used.

Uros.

H.J. Lu - Sept. 7, 2010, 12:57 p.m.
On Mon, Sep 6, 2010 at 11:06 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Mon, Sep 6, 2010 at 5:06 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>
>> We may guess best options for -march=native.  We may leave out SSE3 or
>> SSSE3.  This patch adds it.  OK for trunk and 4.5?
>
> Why? -march=native should enable all features, otherwise -mtune should be used.
>

When we guess, we do

              if (has_ssse3)
                /* If it is an unknown CPU with SSSE3, assume Core 2.  */
                cpu = "core2";
              else if (has_sse3)
                /* It is Core Duo.  */
                cpu = "pentium-m";

Core Duo has SSE3, but -march=pentium-m only enables SSE2. So when you
use -march=native on Core Duo, SSE3 isn't enabled. My patch fixes that
by adding -msse3 explicitly.

BTW, -march=core2 enables SSSE3.
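
The mismatch described above can be illustrated with a small standalone sketch. This is not GCC source: `guess_cpu`, `implied_rank`, and `missing_option` are hypothetical names, and the rank numbers are arbitrary. It only models what the thread states, namely that the fallback picks "pentium-m" (SSE2 only) for a host with SSE3 and "core2" (SSSE3) for a host with SSSE3.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the fallback guess quoted above; not GCC source. */
static const char *guess_cpu(int has_sse3, int has_ssse3)
{
    if (has_ssse3)
        return "core2";      /* unknown CPU with SSSE3: assume Core 2 */
    if (has_sse3)
        return "pentium-m";  /* Core Duo */
    return NULL;             /* other cases are outside this sketch */
}

/* SSE level implied by each guessed -march value, as stated in the thread.
   The ranks are arbitrary: 2 = SSE2, 3 = SSE3, 4 = SSSE3. */
static int implied_rank(const char *cpu)
{
    if (cpu != NULL && strcmp(cpu, "core2") == 0)
        return 4;
    if (cpu != NULL && strcmp(cpu, "pentium-m") == 0)
        return 2;
    return 0;
}

/* Option that would have to be added so that the host's best SSE level
   (among SSE3 and SSSE3) is enabled; "" when the guess already covers it. */
static const char *missing_option(int has_sse3, int has_ssse3)
{
    int host_rank = has_ssse3 ? 4 : (has_sse3 ? 3 : 0);
    int march_rank = implied_rank(guess_cpu(has_sse3, has_ssse3));

    if (host_rank <= march_rank)
        return "";
    return host_rank == 4 ? "-mssse3" : "-msse3";
}
```

In this model a Core Duo host (`has_sse3` set, `has_ssse3` clear) is guessed as "pentium-m" and comes out one level short, which is the case the patch is meant to cover.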

Patch

diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index 8a76857..d132582 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -549,6 +549,9 @@  const char *host_detect_local_cpu (int argc, const char **argv)
 	case 0x26:
 	  /* Atom.  */
 	  cpu = "atom";
+	  /* No need to add -mssse3, -msse3.  */
+	  has_sse3 = 0;
+	  has_ssse3 = 0;
 	  break;
 	case 0x1a:
 	case 0x1e:
@@ -556,20 +559,34 @@  const char *host_detect_local_cpu (int argc, const char **argv)
 	case 0x2e:
 	  /* FIXME: Optimize for Nehalem.  */
 	  cpu = "core2";
+	  /* No need to add -mssse3, -msse3.
+	     FIXME: No need for -msse4.2.  */
+	  has_sse3 = 0;
+	  has_ssse3 = 0;
 	  break;
 	case 0x25:
 	case 0x2f:
 	  /* FIXME: Optimize for Westmere.  */
 	  cpu = "core2";
+	  /* No need to add -mssse3, -msse3.
+	     FIXME: No need for -msse4.2.  */
+	  has_sse3 = 0;
+	  has_ssse3 = 0;
 	  break;
 	case 0x17:
 	case 0x1d:
 	  /* Penryn.  FIXME: -mtune=core2 is slower than -mtune=generic  */
 	  cpu = "core2";
+	  /* No need to add -mssse3, -msse3.  */
+	  has_sse3 = 0;
+	  has_ssse3 = 0;
 	  break;
 	case 0x0f:
 	  /* Merom.  FIXME: -mtune=core2 is slower than -mtune=generic  */
 	  cpu = "core2";
+	  /* No need to add -mssse3, -msse3.  */
+	  has_sse3 = 0;
+	  has_ssse3 = 0;
 	  break;
 	default:
 	  if (arch)
@@ -606,6 +623,8 @@  const char *host_detect_local_cpu (int argc, const char **argv)
 	    cpu = "nocona";
 	  else
 	    cpu = "prescott";
+	  /* No need to add -msse3.  */
+	  has_sse3 = 0;
 	}
       else
 	cpu = "pentium4";
@@ -693,6 +712,10 @@  const char *host_detect_local_cpu (int argc, const char **argv)
 	options = concat (options, " -msse4.2", NULL);
       else if (has_sse4_1)
 	options = concat (options, " -msse4.1", NULL);
+      else if (has_ssse3)
+	options = concat (options, " -mssse3", NULL);
+      else if (has_sse3)
+	options = concat (options, " -msse3", NULL);
     }
 
 done:
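
Read together, the hunks above appear to work in two steps: a case that names a -march already implying SSE3 or SSSE3 clears `has_sse3` and `has_ssse3`, and the final chain then appends at most one SSE-level option. A minimal sketch of that chain follows, assuming only the branches visible in the last hunk; `pick_sse_option` is a hypothetical helper, not a function in driver-i386.c.

```c
#include <string.h>

/* Hypothetical stand-in for the else-if chain in the last hunk.
   Returns the single option that would be appended, or "" for none. */
static const char *pick_sse_option(int has_sse4_2, int has_sse4_1,
                                   int has_ssse3, int has_sse3)
{
    if (has_sse4_2)
        return " -msse4.2";
    else if (has_sse4_1)
        return " -msse4.1";
    else if (has_ssse3)      /* branch added by the patch */
        return " -mssse3";
    else if (has_sse3)       /* branch added by the patch */
        return " -msse3";
    return "";
}
```

For a known model such as Merom the patch sets both flags to 0 before this point, so a call like `pick_sse_option(0, 0, 0, 0)` yields no extra option and the command line stays as it was before the patch.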