Patchwork [Darwin,I386] enable mtune=core2 on darwin and make it the default.

Submitter IainS
Date Aug. 13, 2010, 4:39 p.m.
Message ID <627BFF31-95EE-4FAD-9E70-B5EA21BCFB70@sandoe-acoustics.co.uk>
Permalink /patch/61702/
State New

Comments

IainS - Aug. 13, 2010, 4:39 p.m.
Hi,

This brings the default arch on darwin to core2, as per the OS X gcc-4.2.1
system compiler (which makes our codegen much neater and takes a big chunk
off the size of cc1*).

---

In order to do that it was necessary to get -mtune=core2 to work; there were
a couple of residual cases where the use of movq is incompatible with the
Darwin assembler.

The use of movd for r<->x and r<->Yi seems to be consistent with the
decisions made elsewhere in PRs (i.e. I am not disturbing the status quo,
merely making it consistent in a couple of missed places).

I have bootstrapped this on x86_64 (Core 2 Duo), i686-darwin8 (Xeon)  
and i686-darwin8 (Core Duo).
Note that the Darwin8 default for i686 from apple is nocona/gcc-4.0.1,  
but the use of core2 here does not appear to create any problems
(the processor cannot execute m64 code anyway).

OK for trunk and 4.5?
Iain

Richard Henderson - Aug. 13, 2010, 5:21 p.m.
On 08/13/2010 09:39 AM, IainS wrote:
> OK for trunk and 4.5?

Ok, with appropriate changelog.



r~
H.J. Lu - Aug. 13, 2010, 5:22 p.m.
On Fri, Aug 13, 2010 at 9:39 AM, IainS <developer@sandoe-acoustics.co.uk> wrote:
> Hi,
>
> This brings the default arch on darwin to core2 - as per the OSX 4.2.1
> system compiler.
> (which makes our codegen much neater and takes a big chunk off the size of
> cc1*).
>
> ---
>
> In order to do that it was necessary to get -mtune=core2 to work ..
> ... this had a couple of residual cases where the use of movq is
> incompatible with the darwin  assembler.

Do you know -mtune=core generates slower code than -mtune=generic?

> The use of movd for  r<->x   and r<->Yi seems to be consistent with the
> decisions made elsewhere in PRs (i.e. I am not disturbing the status quo,
> merely making it consistent in a couple of missed places.)

You should post a separate patch for this.  Also there is no ChangeLog.

> I have bootstrapped this on x86_64 (Core 2 Duo), i686-darwin8 (Xeon) and
> i686-darwin8 (Core Duo).
> Note that the Darwin8 default for i686 from apple is nocona/gcc-4.0.1, but
> the use of core2 here does not appear to create any problems
> (the processor cannot execute m64 code anyway).
>
> OK for trunk and 4.5?
> Iain
IainS - Aug. 13, 2010, 5:36 p.m.
Hi HJ,

On 13 Aug 2010, at 18:22, H.J. Lu wrote:

> On Fri, Aug 13, 2010 at 9:39 AM, IainS <developer@sandoe-acoustics.co.uk>
> wrote:
>> In order to do that it was necessary to get -mtune=core2 to work ..
>> ... this had a couple of residual cases where the use of movq is
>> incompatible with the darwin  assembler.
>
> Do you know -mtune=core generates slower code than -mtune=generic?

I have not benchmarked - unfortunately, we volunteers do not have  
access to SPEC &c.

... but I observe that the code is circa 7% smaller with -mtune=core2  
c.f. generic.
... and we replace all the _pc_thunk calls with a local call and pop -  
which makes quite an impact on our asm.
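
For illustration, a size comparison like the one above can be reduced to a small piece of shell arithmetic. This is only a sketch: the helper name `pct_smaller` and the sample sizes are hypothetical, not measurements from this thread. Real inputs would be something like the text sizes reported by size(1) for objects built with -mtune=generic and -mtune=core2.

```shell
# pct_smaller OLD NEW: integer percentage by which NEW is smaller than OLD.
# The helper name and the sample sizes below are illustrative assumptions,
# not figures measured for this patch.
pct_smaller() {
    old=$1
    new=$2
    echo $(( (old - new) * 100 / old ))
}

# Example with made-up sizes: 1000 bytes (generic) vs 930 bytes (core2).
pct_smaller 1000 930
```

Integer division truncates, so the result is a rough figure rather than an exact one.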

If you can provide me with a realistic way to perform some suitable  
benchmarks, I will happily do so.
..  otherwise, for the platform, I'm simply making our default the  
same as the vendor's.

It does not, of course, prevent someone from bootstrapping with
--with-cpu=generic if they wish to.
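
As a minimal sketch of why the override keeps working: the patch sets the default with the shell idiom `${with_cpu:-core2}`, which only substitutes core2 when with_cpu is unset or empty. The wrapper function `pick_cpu` below is hypothetical and exists only to demonstrate that idiom outside of config.gcc.

```shell
# pick_cpu is a hypothetical stand-in, not part of config.gcc; it mirrors
# the ${with_cpu:-core2} defaulting used in the patch.
pick_cpu() {
    with_cpu=$1
    # Use core2 only when no CPU was supplied (unset or empty).
    with_cpu=${with_cpu:-core2}
    printf '%s\n' "$with_cpu"
}

pick_cpu ""        # nothing supplied: falls back to core2
pick_cpu generic   # an explicit choice is kept unchanged
```

A value supplied by the person configuring the build therefore still takes precedence over the new default.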

thanks
Iain
Jack Howarth - Aug. 13, 2010, 7:25 p.m.
On Fri, Aug 13, 2010 at 06:36:12PM +0100, IainS wrote:
> Hi HJ,
>
> On 13 Aug 2010, at 18:22, H.J. Lu wrote:
>
>> On Fri, Aug 13, 2010 at 9:39 AM, IainS 
>> <developer@sandoe-acoustics.co.uk> wrote:
>>> In order to do that it was necessary to get -mtune=core2 to work ..
>>> ... this had a couple of residual cases where the use of movq is
>>> incompatible with the darwin  assembler.
>>
>> Do you know -mtune=core generates slower code than -mtune=generic?
>
> I have not benchmarked - unfortunately, we volunteers do not have access 
> to SPEC &c.

Iain,
     I can repeat the benchmarks this weekend but the last time I looked
at the performance of the Polyhedron 2005 benchmarks on x86_64-apple-darwin10,
I found -mtune=core2 was slower than -mtune=generic.

http://gcc.gnu.org/ml/gcc-patches/2010-02/msg01272.html

There were some earlier messages suggesting that improved cost models would
be under development this summer.

http://gcc.gnu.org/ml/gcc/2010-05/msg00279.html
http://gcc.gnu.org/ml/gcc/2010-05/msg00370.html
http://gcc.gnu.org/ml/gcc/2010-05/msg00427.html

If these do materialize and make it into gcc 4.6, it would make
sense to revisit this issue.
                 Jack

>
> ... but I observe that the code is circa 7% smaller with -mtune=core2  
> c.f. generic.
> ... and we replace all the _pc_thunk calls with a local call and pop -  
> which makes quite an impact on our asm.
>
> If you can provide me with a realistic way to perform some suitable  
> benchmarks, I will happily do so.
> ..  otherwise, for the platform, I'm simply making our default the same 
> as the vendor's.
>
> It does not, of course, prevent someone from bootstrapping with
> --with-cpu=generic if they wish to.
>
> thanks
> Iain
IainS - Aug. 13, 2010, 7:37 p.m.
On 13 Aug 2010, at 20:25, Jack Howarth wrote:

> On Fri, Aug 13, 2010 at 06:36:12PM +0100, IainS wrote:
>> Hi HJ,
>>
>> On 13 Aug 2010, at 18:22, H.J. Lu wrote:
>>
>>> On Fri, Aug 13, 2010 at 9:39 AM, IainS
>>> <developer@sandoe-acoustics.co.uk> wrote:
>>>> In order to do that it was necessary to get -mtune=core2 to work ..
>>>> ... this had a couple of residual cases where the use of movq is
>>>> incompatible with the darwin  assembler.
>>>
>>> Do you know -mtune=core generates slower code than -mtune=generic?
>>
>> I have not benchmarked - unfortunately, we volunteers do not have  
>> access
>> to SPEC &c.


>     I can repeat the benchmarks this weekend but the last time I looked
> at the performance of the Polyhedron 2005 benchmarks on
> x86_64-apple-darwin10, I found -mtune=core2 was slower than -mtune=generic.

Do you have any reasonable body of c/c++ benchmarks?
I have the Polyhedron fortran ones.

Odd that the vendor would choose that default if it's genuinely less
performant...
I guess I could look and see if there are changed tuning params...

... anyway I'm not going to nail colors to the mast over this one -  
it's easily changed.

... remember the old BYTE article "Lies, Damned Lies & Benchmarks" ?

cheers,

Iain

Patch

Index: gcc/config/i386/mmx.md
===================================================================
--- gcc/config/i386/mmx.md	(revision 163221)
+++ gcc/config/i386/mmx.md	(working copy)
@@ -81,8 +81,8 @@ 
      %vpxor\t%0, %d0
      %vmovq\t{%1, %0|%0, %1}
      %vmovq\t{%1, %0|%0, %1}
-    %vmovq\t{%1, %0|%0, %1}
-    %vmovq\t{%1, %0|%0, %1}"
+    %vmovd\t{%1, %0|%0, %1}
+    %vmovd\t{%1, %0|%0, %1}"
    [(set_attr "type" "imov,imov,mmx,mmxmov,mmxmov,ssecvt,ssecvt,sselog1,ssemov,ssemov,ssemov,ssemov")
     (set_attr "unit" "*,*,*,*,*,mmx,mmx,*,*,*,*,*")
     (set_attr "prefix_rep" "*,*,*,*,*,1,1,*,1,*,*,*")
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	(revision 163221)
+++ gcc/config/i386/sse.md	(working copy)
@@ -7709,7 +7709,7 @@ 
    "@
     pinsrq\t{$0x1, %2, %0|%0, %2, 0x1}
     movq\t{%1, %0|%0, %1}
-   movq\t{%1, %0|%0, %1}
+   movd\t{%1, %0|%0, %1}
     movq2dq\t{%1, %0|%0, %1}
     punpcklqdq\t{%2, %0|%0, %2}
     movlhps\t{%2, %0|%0, %2}
@@ -7728,7 +7728,7 @@ 
    "TARGET_64BIT && TARGET_SSE"
    "@
     movq\t{%1, %0|%0, %1}
-   movq\t{%1, %0|%0, %1}
+   movd\t{%1, %0|%0, %1}
     movq2dq\t{%1, %0|%0, %1}
     punpcklqdq\t{%2, %0|%0, %2}
     movlhps\t{%2, %0|%0, %2}
Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc	(revision 163221)
+++ gcc/config.gcc	(working copy)
@@ -1127,17 +1127,13 @@  hppa[12]*-*-hpux11*)
  i[34567]86-*-darwin*)
  	need_64bit_hwint=yes
  	need_64bit_isa=yes
-
-	# This is so that '.../configure && make' doesn't fail due to
-	# config.guess deciding that the configuration is i386-*-darwin* and
-	# then this file using that to set --with-cpu=i386 which has no -m64
-	# support.
-	with_cpu=${with_cpu:-generic}
+	# Baseline choice for a machine that allows m64 support.
+	with_cpu=${with_cpu:-core2}
  	tmake_file="${tmake_file} t-slibgcc-darwin i386/t-crtpc i386/t-crtfm"
  	lto_binary_reader=lto-macho
  	;;
  x86_64-*-darwin*)
-	with_cpu=${with_cpu:-generic}
+	with_cpu=${with_cpu:-core2}
  	tmake_file="${tmake_file} ${cpu_type}/t-darwin64 t-slibgcc-darwin i386/t-crtpc i386/t-crtfm"
  	tm_file="${tm_file} ${cpu_type}/darwin64.h"