From patchwork Thu Nov 2 13:56:57 2017
X-Patchwork-Submitter: Jan Hubicka
X-Patchwork-Id: 833359
Date: Thu, 2 Nov 2017 14:56:57 +0100
From: Jan Hubicka
To: gcc-patches@gcc.gnu.org
Subject: Enable inc/dec generation on Haswell+
Message-ID: <20171102135656.GE91830@kam.mff.cuni.cz>

Hi,
Core2 used to have quite a large penalty for the partial flag register
write done by inc/dec.  This was improved on Sandybridge, where an extra
merging uop is produced, and further on Haswell, where there is no extra
uop unless an instruction accesses both the updated and the untouched
flags.  For this reason we can use inc/dec again on modern variants of
Core.

Bootstrapped/regtested on x86_64-linux and tested on Haswell with
spec2k/spec2k6; no measurable performance impact.

Honza

	* x86-tune.def (X86_TUNE_USE_INCDEC): Enable for Haswell+.
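To make the effect concrete, here is a small example (not part of the
patch) of the kind of statement where the backend chooses between the two
encodings; the assembly in the comments is what one would typically expect
at -O2 and may differ with other options or tuning:

/* Illustration only; the exact instruction selection depends on the
   selected -mtune and the rest of the i386 backend.  */
unsigned long counter;

void
tick (void)
{
  counter++;  /* use_incdec disabled:  addq $1, counter(%rip)
                 use_incdec enabled:   incq counter(%rip)      */
}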
Index: config/i386/x86-tune.def
===================================================================
--- config/i386/x86-tune.def	(revision 254199)
+++ config/i386/x86-tune.def	(working copy)
@@ -220,10 +220,15 @@ DEF_TUNE (X86_TUNE_LCP_STALL, "lcp_stall
    as "add mem, reg".  */
 DEF_TUNE (X86_TUNE_READ_MODIFY, "read_modify", ~(m_PENT | m_LAKEMONT | m_PPRO))
 
-/* X86_TUNE_USE_INCDEC: Enable use of inc/dec instructions.  */
+/* X86_TUNE_USE_INCDEC: Enable use of inc/dec instructions.
+
+   Core2 and Nehalem have a stall of 7 cycles on partial flag register
+   writes.  Sandybridge and Ivybridge generate an extra uop.  On Haswell
+   this extra uop is output only when the value really needs to be merged,
+   which is not needed for GCC generated code.  */
 DEF_TUNE (X86_TUNE_USE_INCDEC, "use_incdec",
-          ~(m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_SILVERMONT | m_INTEL
-	    | m_KNL | m_KNM | m_GENERIC))
+          ~(m_P4_NOCONA | m_CORE2 | m_NEHALEM | m_SANDYBRIDGE
+	    | m_BONNELL | m_SILVERMONT | m_INTEL | m_KNL | m_KNM | m_GENERIC))
 
 /* X86_TUNE_INTEGER_DFMODE_MOVES: Enable if integer moves are preferred
    for DFmode copies */
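As an aside, for readers not familiar with x86-tune.def: the third
DEF_TUNE argument is a mask of tuning models, so ~(m_CORE2 | ...) reads as
"enabled for everything except the listed models".  Below is a simplified,
self-contained sketch of that logic; the m_* bit values and the check are
made up for illustration, the real definitions live in the i386 backend:

#include <stdio.h>

/* Hypothetical per-model bits, one per tuning target.  */
#define m_CORE2       (1u << 0)
#define m_NEHALEM     (1u << 1)
#define m_SANDYBRIDGE (1u << 2)
#define m_HASWELL     (1u << 3)

/* The selector from the patch, reduced to these four models:
   enabled everywhere except Core2, Nehalem and Sandybridge.  */
static const unsigned use_incdec_mask =
  ~(m_CORE2 | m_NEHALEM | m_SANDYBRIDGE);

int
main (void)
{
  /* Haswell's bit is not excluded, so inc/dec generation is allowed.  */
  printf ("haswell: %d\n", (use_incdec_mask & m_HASWELL) != 0);  /* prints 1 */
  printf ("core2:   %d\n", (use_incdec_mask & m_CORE2) != 0);    /* prints 0 */
  return 0;
}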