From patchwork Tue Jun 15 21:34:34 2010
X-Patchwork-Submitter: "Fang, Changpeng"
X-Patchwork-Id: 55800
From: "Fang, Changpeng"
To: gcc-patches@gcc.gnu.org, rguenther@suse.de
CC: sebpop@gmail.com, Zdenek Dvorak, "Fang, Changpeng"
Date: Tue, 15 Jun 2010 16:34:34 -0500
Subject: [PATCH] Enabling Software Prefetching by Default at -O3

Hi,

This patch proposes turning on software prefetching by default at -O3.

The software prefetching pass has been in GCC for a long time. We have
observed that it is now quite stable and can noticeably improve the
performance of programs where data locality is a concern. We therefore
think it is time to enable it at -O3 so that users get the benefit by
default.

We collected data showing the impact of prefetching on CPU2006
performance at -O3 with GCC 4.6. There is a ~2% improvement on both the
integer and floating-point suites and no significant degradation; the
only program that slows down by more than 1% is 458.sjeng. The programs
listed below are those with a variation of at least 1% in either
direction:

  434.zeusmp       +1.77%
  436.cactusADM    +2.4%
  450.soplex       +1.28%
  459.GemsFDTD     +5.48%
  470.lbm         +31.84%
  482.sphinx3      +1.01%
  458.sjeng        -1.27%
  462.libquantum  +19.23%

The patch passed bootstrap with -O3 -g. We observed 2 failures in the
regression tests:

(1) gcc.dg/torture/stackalign/setjmp-1.c -O3 -fomit-frame-pointer
    (internal compiler error)
    This is bug 44503.
    It is a CFG problem associated with __builtin_prefetch, and we
    already have a fix for it.

(2) gcc.dg/tree-ssa/loop-19.c scan-tree-dump-times optimized
    "MEM.(base: &|symbol: )a," 2
    With prefetching turned on, the number of matching memory
    references in the dump is different. We can easily fix the test
    case after this patch is accepted.

We realize that prefetching in GCC still has room for improvement, but
we think it is now good enough to be turned on.

Is it OK to commit this patch?

Thanks,
Changpeng

From 50ef9b1dd700ace9854f88814488b7807fcbae1c Mon Sep 17 00:00:00 2001
From: Changpeng Fang
Date: Thu, 10 Jun 2010 14:52:15 -0700
Subject: [PATCH 2/2] Enable -fprefetch-loop-arrays under -O3

	* opts.c (decode_options): Turn on flag_prefetch_loop_arrays
	under -O3.
	* doc/invoke.texi (@item -O3): Say that -O3 includes
	-fprefetch-loop-arrays.
	(@item -fprefetch-loop-arrays): Say that -fprefetch-loop-arrays
	is enabled by default under -O3.
---
 gcc/doc/invoke.texi |    6 ++++--
 gcc/opts.c          |    1 +
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d8c0c22..1c28a91 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -5883,7 +5883,8 @@ invoking @option{-O2} on programs that use computed gotos.
 Optimize yet more.  @option{-O3} turns on all optimizations specified
 by @option{-O2} and also turns on the @option{-finline-functions},
 @option{-funswitch-loops}, @option{-fpredictive-commoning},
-@option{-fgcse-after-reload} and @option{-ftree-vectorize} options.
+@option{-fgcse-after-reload}, @option{-ftree-vectorize} and
+@option{-fprefetch-loop-arrays} options.
 
 @item -O0
 @opindex O0
@@ -7037,7 +7038,8 @@ memory to improve the performance of loops that access large arrays.
 This option may generate better or worse code; results are highly
 dependent on the structure of loops within the source code.
 
-Disabled at level @option{-Os}.
+This flag is enabled by default at @option{-O3} and disabled at
+level @option{-Os}.
 
 @item -fno-peephole
 @itemx -fno-peephole2
diff --git a/gcc/opts.c b/gcc/opts.c
index 8699ec3..7814341 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -951,6 +951,7 @@ decode_options (unsigned int argc, const char **argv)
   flag_unswitch_loops = opt3;
   flag_gcse_after_reload = opt3;
   flag_tree_vectorize = opt3;
+  flag_prefetch_loop_arrays = opt3;
   flag_ipa_cp_clone = opt3;
   if (flag_ipa_cp_clone)
     flag_ipa_cp = 1;
-- 
1.6.3.3
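For readers less familiar with what -fprefetch-loop-arrays does: it inserts prefetch instructions into loops that access large arrays, so that data is already on its way into the cache when the loop reaches it. The C sketch below hand-writes that kind of transformation with GCC's __builtin_prefetch. It is an illustration only, not the pass's actual output: the 64-byte cache line, the prefetch distance of 4 lines, and the names sum_plain and sum_prefetched are hypothetical choices made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only (assumptions, not what GCC computes):
   a 64-byte cache line and a prefetch distance of 4 lines ahead.  */
#define CACHE_LINE_BYTES 64
#define ELEMS_PER_LINE (CACHE_LINE_BYTES / sizeof (double))
#define PREFETCH_DISTANCE (4 * ELEMS_PER_LINE)

/* The loop as a programmer would write it.  */
double
sum_plain (const double *a, size_t n)
{
  double s = 0.0;
  for (size_t i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* The same loop with explicit prefetches, roughly the kind of code
   -fprefetch-loop-arrays adds automatically (hypothetical sketch).  */
double
sum_prefetched (const double *a, size_t n)
{
  assert (n == 0 || a != NULL);
  double s = 0.0;
  for (size_t i = 0; i < n; i++)
    {
      /* Issue roughly one prefetch per cache line, and never form a
         pointer past the end of the array.  Arguments: 0 = prefetch
         for read, 3 = high temporal locality (the documented default).  */
      if (i % ELEMS_PER_LINE == 0 && i + PREFETCH_DISTANCE < n)
        __builtin_prefetch (&a[i + PREFETCH_DISTANCE], 0, 3);
      s += a[i];
    }
  return s;
}
```

Both functions compute the same sum in the same order; prefetching can only change speed, and only where memory latency dominates (large arrays that do not fit in cache). The real pass chooses its own prefetch distance from a target cost model and typically unrolls the loop instead of testing i % ELEMS_PER_LINE at run time; compiling sum_plain at -O3 with this patch (or with -fprefetch-loop-arrays) lets GCC attempt a similar transformation itself on targets with prefetch instructions.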