From: "Kumar, Venkataramanan"
Date: Mon, 1 Oct 2012 10:50:33 -0500
Subject: [Patch] Fix PR53397
Message-ID: <20121001155033.14499.64527.sendpatchset@adcelk01.amd.com>
Delivered-To: mailing list gcc-patches@gcc.gnu.org

Hi,

The patch below fixes the SciMark2 FFT regression caused by useless
prefetch generation. It makes prefetching less aggressive: array
references in the innermost loop that have a non-constant step are
prefetched only when that step is invariant in the entire loop nest.

Currently GCC prefetches references with a non-constant step when they
are in the innermost loop and the step is invariant in that loop, but it
does not check whether the step varies in the outer loops.
In the SciMark FFT case, the inner loop advances by a non-constant step
that is invariant in the inner loop but varies in the outer loop.
Because the step changes from one outer iteration to the next, the inner
loop's trip count can become small; at run time it is sometimes as small
as one iteration. Prefetching x iterations ahead when the inner loop's
trip count is smaller than x only produces useless prefetches.

Flags used: -O3 -march=amdfam10

SciMark2 Numeric Benchmark (http://math.nist.gov/scimark), 2.00 seconds
minimum time per kernel:

                                           Before      After
Composite Score                            550.50     639.20
FFT Mflops (N=1024)                         38.66     479.19
SOR Mflops (100 x 100)                     617.61     617.61
MonteCarlo Mflops                          173.74     173.18
Sparse matmult Mflops (N=1000, nz=5000)    675.63     679.13
LU Mflops (M=100, N=100)                  1246.88    1246.88

GCC regression testing ("make check -k") passes on
x86_64-unknown-linux-gnu.

New tests that PASS:

gcc.dg/pr53397-1.c scan-assembler prefetcht0
gcc.dg/pr53397-1.c scan-tree-dump aprefetch "Issued prefetch"
gcc.dg/pr53397-1.c (test for excess errors)
gcc.dg/pr53397-2.c scan-tree-dump aprefetch "loop variant step"
gcc.dg/pr53397-2.c scan-tree-dump aprefetch "Not prefetching"
gcc.dg/pr53397-2.c (test for excess errors)

I also checked CPU2006 and Polyhedron on the latest AMD processor; no
regressions were noted.

OK to commit to trunk?

Regards,
Venkat

gcc/ChangeLog

+2012-10-01  Venkataramanan Kumar
+
+	* tree-ssa-loop-prefetch.c (gather_memory_references_ref):
+	Perform non-constant step prefetching in the inner loop only
+	when the step is invariant in the entire loop nest.
+
+	* testsuite/gcc.dg/pr53397-1.c: New test case.
+	Checks that we prefetch for loop-invariant steps.
+	* testsuite/gcc.dg/pr53397-2.c: New test case.
+	Checks that we do not prefetch for loop-variant steps.
+

Index: gcc/testsuite/gcc.dg/pr53397-1.c
===================================================================
--- gcc/testsuite/gcc.dg/pr53397-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr53397-1.c	(revision 0)
@@ -0,0 +1,28 @@
+/* Prefetching when the step is loop invariant.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fprefetch-loop-arrays -fdump-tree-aprefetch-details --param min-insn-to-prefetch-ratio=3 --param simultaneous-prefetches=10" } */
+
+
+double data[16384];
+void prefetch_when_non_constant_step_is_invariant(int step, int n)
+{
+  int a;
+  int b;
+  for (a = 1; a < step; a++) {
+    for (b = 0; b < n; b += 2 * step) {
+
+      int i = 2*(b + a);
+      int j = 2*(b + a + step);
+
+
+      data[j] = data[i];
+      data[j+1] = data[i+1];
+    }
+  }
+}
+
+/* { dg-final { scan-tree-dump "Issued prefetch" "aprefetch" } } */
+/* { dg-final { scan-assembler "prefetcht0" } } */
+
+/* { dg-final { cleanup-tree-dump "aprefetch" } } */
Index: gcc/testsuite/gcc.dg/pr53397-2.c
===================================================================
--- gcc/testsuite/gcc.dg/pr53397-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr53397-2.c	(revision 0)
@@ -0,0 +1,29 @@
+/* Not prefetching when the step is loop variant.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fprefetch-loop-arrays -fdump-tree-aprefetch-details --param min-insn-to-prefetch-ratio=3 --param simultaneous-prefetches=10" } */
+
+
+double data[16384];
+void donot_prefetch_when_non_constant_step_is_variant(int step, int n)
+{
+  int a;
+  int b;
+  for (a = 1; a < step; a++,step*=2) {
+    for (b = 0; b < n; b += 2 * step) {
+
+      int i = 2*(b + a);
+      int j = 2*(b + a + step);
+
+
+      data[j] = data[i];
+      data[j+1] = data[i+1];
+    }
+  }
+}
+
+/* { dg-final { scan-tree-dump "Not prefetching" "aprefetch" } } */
+/* { dg-final { scan-tree-dump "loop variant step" "aprefetch" } } */
+
+/* { dg-final { cleanup-tree-dump "aprefetch" } } */
+
Index: gcc/tree-ssa-loop-prefetch.c
===================================================================
--- gcc/tree-ssa-loop-prefetch.c	(revision 191642)
+++ gcc/tree-ssa-loop-prefetch.c	(working copy)
@@ -523,6 +523,7 @@
   tree base, step;
   HOST_WIDE_INT delta;
   struct mem_ref_group *agrp;
+  loop_p ploop;
 
   if (get_base_address (ref) == NULL)
     return false;
@@ -537,10 +538,50 @@
   if (may_be_nonaddressable_p (base))
     return false;
 
-  /* Limit non-constant step prefetching only to the innermost loops.  */
-  if (!cst_and_fits_in_hwi (step) && loop->inner != NULL)
-    return false;
+  /* Limit non-constant step prefetching only to the innermost loops and
+     only when the step is invariant in the entire loop nest.  */
+  if (!cst_and_fits_in_hwi (step))
+    {
+      if (loop->inner != NULL)
+        {
+          if (dump_file && (dump_flags & TDF_DETAILS))
+            {
+              fprintf (dump_file, "Reference %p:\n", (void *) ref);
+              fprintf (dump_file, "(base ");
+              print_generic_expr (dump_file, base, TDF_SLIM);
+              fprintf (dump_file, ", step ");
+              print_generic_expr (dump_file, step, TDF_TREE);
+              fprintf (dump_file, ")\n");
+              fprintf (dump_file, "Ignoring %p, non-constant step prefetching "
+                       "is limited to inner most loops\n", (void *) ref);
+            }
+          return false;
+        }
+      else
+        {
+          ploop = loop;
+          while (loop_outer (ploop))
+            {
+              if (!expr_invariant_in_loop_p (ploop, step))
+                {
+                  if (dump_file && (dump_flags & TDF_DETAILS))
+                    {
+                      fprintf (dump_file, "Reference %p:\n", (void *) ref);
+                      fprintf (dump_file, "(base ");
+                      print_generic_expr (dump_file, base, TDF_SLIM);
+                      fprintf (dump_file, ", step ");
+                      print_generic_expr (dump_file, step, TDF_TREE);
+                      fprintf (dump_file, ")\n");
+                      fprintf (dump_file, "Not prefetching, ignoring %p due to loop variant step\n", (void *) ref);
+                    }
+                  return false;
+                }
+              ploop = loop_outer (ploop);
+            }
+        }
+    }
+
 
   /* Now we know that REF = &BASE + STEP * iter + DELTA, where DELTA and
      STEP are integer constants.  */
   agrp = find_or_create_group (refs, base, step);