From patchwork Thu Oct 19 15:22:32 2017
X-Patchwork-Submitter: Jan Hubicka
X-Patchwork-Id: 828160
Date: Thu, 19 Oct 2017 17:22:32 +0200
From: Jan Hubicka
To: gcc-patches@gcc.gnu.org
Subject: Correct cost of SSE and x87 instructions for generic and core
Message-ID: <20171019152231.GB81559@kam.mff.cuni.cz>

Hi,
the core and generic costs of x87 and SSE instructions seem to follow the
pentium4 settings, which is not very realistic. This patch sets them
according to latencies.

I have tested this on haswell as part of the vectorizer cost metric patch
(where we want to have sane values to get sane decisions) and it did not
cause any regressions on spec2k/2k6 and C++ benchmarks. I am not 100% sure
which of the improvements seen can be attributed to the cost change alone
and which to the vectorizer metric, but we will see tomorrow from our
periodic testers.

Note that the atom and other cost tables seem off too. I will send a
separate patch for those, but I have no way to benchmark it.

Bootstrapped/regtested x86_linux, will commit it shortly.

Honza

	* x86-tune-costs.h (generic_cost, core_cost): Correct costs of x87
	and SSE instructions.
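For readers unfamiliar with the units used in the table, here is a minimal
sketch of how the entries scale. It assumes COSTS_N_INSNS is defined as in
GCC's rtl.h, i.e. one fast register-to-register instruction costs 4 units;
the helpers cost_to_insns and print_cost_change are hypothetical and exist
only for illustration. The example values are taken from the generic_cost
hunk of this patch.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: same definition as in GCC's rtl.h, where the cost of one
   fast register-to-register instruction is 4 units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical helper: convert a cost back to whole instructions,
   rounding down.  */
static int
cost_to_insns (int cost)
{
  assert (cost >= 0);
  return cost / COSTS_N_INSNS (1);
}

/* Hypothetical helper: print an old/new cost pair in both units.  */
static void
print_cost_change (const char *insn, int old_cost, int new_cost)
{
  printf ("%-10s %3d -> %3d units (%d -> %d insns)\n", insn,
          old_cost, new_cost,
          cost_to_insns (old_cost), cost_to_insns (new_cost));
}
```

Under that assumption, calling print_cost_change ("ADDSS/SD", COSTS_N_INSNS (8),
COSTS_N_INSNS (3)) reports a drop from 32 to 12 units, and the lea entry
COSTS_N_INSNS (1) + 1 is 5 units, i.e. slightly more than one instruction
but less than two.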
Index: config/i386/x86-tune-costs.h
===================================================================
--- config/i386/x86-tune-costs.h	(revision 253824)
+++ config/i386/x86-tune-costs.h	(working copy)
@@ -2196,8 +2196,7 @@ static stringop_algs generic_memset[2] =
 static const
 struct processor_costs generic_cost = {
   COSTS_N_INSNS (1),  /* cost of an add instruction */
-  /* On all chips taken into consideration lea is 2 cycles and more. With
-     this cost however our current implementation of synth_mult results in
+  /* Setting cost to 2 makes our current implementation of synth_mult result in
      use of unnecessary temporary registers causing regression on several
      SPECfp benchmarks. */
   COSTS_N_INSNS (1) + 1,  /* cost of a lea instruction */
@@ -2246,23 +2245,23 @@ struct processor_costs generic_cost = {
   /* Benchmarks shows large regressions on K8 sixtrack benchmark when this
      value is increased to perhaps more appropriate value of 5. */
   3,  /* Branch cost */
-  COSTS_N_INSNS (8),  /* cost of FADD and FSUB insns. */
-  COSTS_N_INSNS (8),  /* cost of FMUL instruction. */
+  COSTS_N_INSNS (3),  /* cost of FADD and FSUB insns. */
+  COSTS_N_INSNS (3),  /* cost of FMUL instruction. */
   COSTS_N_INSNS (20),  /* cost of FDIV instruction. */
-  COSTS_N_INSNS (8),  /* cost of FABS instruction. */
-  COSTS_N_INSNS (8),  /* cost of FCHS instruction. */
+  COSTS_N_INSNS (1),  /* cost of FABS instruction. */
+  COSTS_N_INSNS (1),  /* cost of FCHS instruction. */
   COSTS_N_INSNS (40),  /* cost of FSQRT instruction. */
 
-  COSTS_N_INSNS (8),  /* cost of cheap SSE instruction. */
-  COSTS_N_INSNS (8),  /* cost of ADDSS/SD SUBSS/SD insns. */
-  COSTS_N_INSNS (8),  /* cost of MULSS instruction. */
-  COSTS_N_INSNS (8),  /* cost of MULSD instruction. */
-  COSTS_N_INSNS (8),  /* cost of FMA SS instruction. */
-  COSTS_N_INSNS (8),  /* cost of FMA SD instruction. */
-  COSTS_N_INSNS (20),  /* cost of DIVSS instruction. */
-  COSTS_N_INSNS (20),  /* cost of DIVSD instruction. */
-  COSTS_N_INSNS (40),  /* cost of SQRTSS instruction. */
-  COSTS_N_INSNS (40),  /* cost of SQRTSD instruction. */
+  COSTS_N_INSNS (1),  /* cost of cheap SSE instruction. */
+  COSTS_N_INSNS (3),  /* cost of ADDSS/SD SUBSS/SD insns. */
+  COSTS_N_INSNS (4),  /* cost of MULSS instruction. */
+  COSTS_N_INSNS (5),  /* cost of MULSD instruction. */
+  COSTS_N_INSNS (5),  /* cost of FMA SS instruction. */
+  COSTS_N_INSNS (5),  /* cost of FMA SD instruction. */
+  COSTS_N_INSNS (18),  /* cost of DIVSS instruction. */
+  COSTS_N_INSNS (32),  /* cost of DIVSD instruction. */
+  COSTS_N_INSNS (30),  /* cost of SQRTSS instruction. */
+  COSTS_N_INSNS (58),  /* cost of SQRTSD instruction. */
   1, 2, 1, 1,  /* reassoc int, fp, vec_int, vec_fp. */
   generic_memcpy,
   generic_memset,
@@ -2344,12 +2343,12 @@ struct processor_costs core_cost = {
   6,  /* number of parallel prefetches */
   /* FIXME perhaps more appropriate value is 5. */
   3,  /* Branch cost */
-  COSTS_N_INSNS (8),  /* cost of FADD and FSUB insns. */
-  COSTS_N_INSNS (8),  /* cost of FMUL instruction. */
-  COSTS_N_INSNS (20),  /* cost of FDIV instruction. */
-  COSTS_N_INSNS (8),  /* cost of FABS instruction. */
-  COSTS_N_INSNS (8),  /* cost of FCHS instruction. */
-  COSTS_N_INSNS (40),  /* cost of FSQRT instruction. */
+  COSTS_N_INSNS (3),  /* cost of FADD and FSUB insns. */
+  COSTS_N_INSNS (5),  /* cost of FMUL instruction. */
+  COSTS_N_INSNS (24),  /* cost of FDIV instruction. */
+  COSTS_N_INSNS (1),  /* cost of FABS instruction. */
+  COSTS_N_INSNS (1),  /* cost of FCHS instruction. */
+  COSTS_N_INSNS (24),  /* cost of FSQRT instruction. */
 
   COSTS_N_INSNS (1),  /* cost of cheap SSE instruction. */
   COSTS_N_INSNS (3),  /* cost of ADDSS/SD SUBSS/SD insns. */