From patchwork Mon Sep 30 11:17:12 2013
X-Patchwork-Submitter: Jan Hubicka
X-Patchwork-Id: 279115
Date: Mon, 30 Sep 2013 13:17:12 +0200
From: Jan Hubicka
To: gcc-patches@gcc.gnu.org, Ganesh.Gopalasubramanian@amd.com, hjl.tools@gmail.com
Subject: Fix scheduler ix86_issue_rate and ix86_adjust_cost for modern x86 chips
Message-ID: <20130930111712.GA25208@kam.mff.cuni.cz>

Hi,
while looking into the schedules produced for Bulldozer and Core, I noticed that they do not seem to match reality. This is because ix86_issue_rate limits those CPUs to 3 instructions per cycle, while they are designed to issue 4, and because ix86_adjust_cost was somewhat confused for them.

I also added the stack engine to the cost model for modern chips, even though the scheduler doesn't really understand that multiple push operations can happen in one cycle. At least it no longer charges latency for the stack pointer updates between consecutive push/pop operations.

I have not updated the Bulldozer issue rates yet. The current scheduler model won't allow it to execute more than 3 instructions per cycle (and 2 for version 3). I think bdver1.md/bdver3.md need to be updated first.

I am testing on x86_64-linux and will commit if there are no complaints.

Honza

	* i386.c (ix86_issue_rate): Pentium4/Nocona issue 2 instructions
	per cycle, Core/CoreI7/Haswell 4 instructions per cycle.
	(ix86_adjust_cost): Add stack engine to modern AMD chips;
	fix for Core; remove Atom that mistakenly shared code with AMD.
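As a rough illustration of why the issue rate matters, here is a minimal sketch assuming an idealized machine where a block of independent instructions is limited only by the issue rate (no dependencies, port conflicts or decoder limits). `issue_cycles` is a hypothetical helper for this illustration, not part of GCC.

```c
#include <assert.h>

/* Idealized model (assumption): N independent instructions need
   ceil (N / ISSUE_RATE) cycles to issue when nothing else limits them.  */
static int
issue_cycles (int n_insns, int issue_rate)
{
  assert (issue_rate > 0 && n_insns >= 0);
  /* Integer ceiling division.  */
  return (n_insns + issue_rate - 1) / issue_rate;
}
```

Under that assumption, 12 independent instructions take issue_cycles (12, 3) = 4 cycles at the old rate of 3, and issue_cycles (12, 4) = 3 cycles at the new rate of 4 used for Core2/CoreI7/Haswell.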
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 203011)
+++ config/i386/i386.c	(working copy)
@@ -24435,17 +24435,14 @@ ix86_issue_rate (void)
     case PROCESSOR_SLM:
     case PROCESSOR_K6:
     case PROCESSOR_BTVER2:
+    case PROCESSOR_PENTIUM4:
+    case PROCESSOR_NOCONA:
       return 2;
 
     case PROCESSOR_PENTIUMPRO:
-    case PROCESSOR_PENTIUM4:
-    case PROCESSOR_CORE2:
-    case PROCESSOR_COREI7:
-    case PROCESSOR_HASWELL:
     case PROCESSOR_ATHLON:
     case PROCESSOR_K8:
     case PROCESSOR_AMDFAM10:
-    case PROCESSOR_NOCONA:
     case PROCESSOR_GENERIC:
     case PROCESSOR_BDVER1:
     case PROCESSOR_BDVER2:
@@ -24453,6 +24450,11 @@ ix86_issue_rate (void)
     case PROCESSOR_BTVER1:
       return 3;
 
+    case PROCESSOR_CORE2:
+    case PROCESSOR_COREI7:
+    case PROCESSOR_HASWELL:
+      return 4;
+
     default:
       return 1;
     }
@@ -24709,10 +24711,15 @@ ix86_adjust_cost (rtx insn, rtx link, rt
     case PROCESSOR_BDVER3:
     case PROCESSOR_BTVER1:
     case PROCESSOR_BTVER2:
-    case PROCESSOR_ATOM:
     case PROCESSOR_GENERIC:
       memory = get_attr_memory (insn);
 
+      /* Stack engine allows push and pop instructions to execute in parallel.  */
+      if (((insn_type == TYPE_PUSH || insn_type == TYPE_POP)
+	   && (dep_insn_type == TYPE_PUSH || dep_insn_type == TYPE_POP))
+	  && (ix86_tune != PROCESSOR_ATHLON && ix86_tune != PROCESSOR_K8))
+	return 0;
+
       /* Show ability of reorder buffer to hide latency of load by executing
 	 in parallel with previous instruction in case
 	 previous instruction is not needed to compute the address.  */
@@ -24737,6 +24744,29 @@ ix86_adjust_cost (rtx insn, rtx link, rt
 	  else
 	    cost = 0;
 	}
+      break;
+
+    case PROCESSOR_CORE2:
+    case PROCESSOR_COREI7:
+    case PROCESSOR_HASWELL:
+      memory = get_attr_memory (insn);
+
+      /* Stack engine allows push and pop instructions to execute in parallel.  */
+      if ((insn_type == TYPE_PUSH || insn_type == TYPE_POP)
+	  && (dep_insn_type == TYPE_PUSH || dep_insn_type == TYPE_POP))
+	return 0;
+
+      /* Show ability of reorder buffer to hide latency of load by executing
+	 in parallel with previous instruction in case
+	 previous instruction is not needed to compute the address.  */
+      if ((memory == MEMORY_LOAD || memory == MEMORY_BOTH)
+	  && !ix86_agi_dependent (dep_insn, insn))
+	{
+	  if (cost >= 4)
+	    cost -= 4;
+	  else
+	    cost = 0;
+	}
       break;
 
     case PROCESSOR_SLM:
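The Core2/CoreI7/Haswell branch added to ix86_adjust_cost can be modelled outside of GCC as a plain function. This is a simplified sketch, not GCC code: the enums, `is_stack_op` and `core_adjust_cost` are hypothetical stand-ins for get_attr_type, get_attr_memory and ix86_agi_dependent, which in i386.c operate on RTL insns.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the insn type and memory attributes.  */
enum insn_kind { KIND_PUSH, KIND_POP, KIND_OTHER };
enum mem_kind { MEM_NONE, MEM_LOAD, MEM_STORE, MEM_BOTH };

static bool
is_stack_op (enum insn_kind k)
{
  return k == KIND_PUSH || k == KIND_POP;
}

/* Sketch of the Core branch.  COST is the dependence latency coming from
   the pipeline description; ADDR_DEPENDENT models
   ix86_agi_dependent (dep_insn, insn), i.e. whether the previous insn
   is needed to compute this insn's address.  */
static int
core_adjust_cost (enum insn_kind insn, enum insn_kind dep,
		  enum mem_kind memory, bool addr_dependent, int cost)
{
  assert (cost >= 0);

  /* Stack engine: a push/pop depending on another push/pop costs
     nothing extra.  */
  if (is_stack_op (insn) && is_stack_op (dep))
    return 0;

  /* Out-of-order execution hides up to 4 cycles of load latency when
     the address does not depend on the previous insn.  */
  if ((memory == MEM_LOAD || memory == MEM_BOTH) && !addr_dependent)
    cost = cost >= 4 ? cost - 4 : 0;

  return cost;
}
```

For example, a load whose address does not depend on the previous instruction and whose dependence cost is 5 ends up costing 1 cycle, while a pop following a push costs 0. The AMD branch differs in that the stack-engine rule is skipped for Athlon and K8.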