From patchwork Thu Oct 11 07:46:09 2018
X-Patchwork-Id: 982304
Date: Thu, 11 Oct 2018 09:46:09 +0200 (CEST)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Cc: Jan Hubicka
Subject: [PATCH][i386] Fix vec_construct cost, remove unused ix86_vec_cost arg

The following fixes the vec_construct cost calculation to properly consider
that the inserts will happen to SSE regs, and thus forgoes the multiplication
done in ix86_vec_cost, which is passed the wrong mode.  This gets rid of the
only call passing false to ix86_vec_cost (so consider the patch amended to
remove the arg if approved).

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK for trunk?
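
For reference, a minimal standalone sketch of what the patched vec_construct
case computes: N element inserts into SSE regs plus one vinserti128 to combine
the two 128-bit halves of an AVX256 vector.  The sse_op and addss values below
are made up rather than taken from any real processor_cost_table, and the
ix86_vec_cost scaling of the vinserti128 term is left out for brevity.

#include <stdio.h>

static int
vec_construct_cost (int nunits, int bitsize, int sse_op, int addss)
{
  /* N element inserts into SSE regs.  */
  int cost = nunits * sse_op;
  /* One vinserti128 for combining two SSE vectors for AVX256.  */
  if (bitsize == 256)
    cost += addss;
  return cost;
}

int
main (void)
{
  /* E.g. V4SFmode (4 units, 128 bits) and V8SFmode (8 units, 256 bits).  */
  printf ("V4SF construct: %d\n", vec_construct_cost (4, 128, 1, 3));
  printf ("V8SF construct: %d\n", vec_construct_cost (8, 256, 1, 3));
  return 0;
}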
I am considering making the factor we apply in ix86_vec_cost, which currently
depends on X86_TUNE_AVX128_OPTIMAL and X86_TUNE_SSE_SPLIT_REGS, part of the
actual cost tables, since the reasons we apply them are underlying CPU
architecture details.  Was the original reason for doing the multiplication
based on those tunings to be able to "share" the same basic cost table across
architectures that differ in this important detail?  I see
X86_TUNE_SSE_SPLIT_REGS is only used for m_ATHLON_K8, and
X86_TUNE_AVX128_OPTIMAL is used for m_BDVER, m_BTVER2 and m_ZNVER1.  Those all
have (multiple) exclusive processor_cost_table entries.

As a first step I'd like to remove the use of ix86_vec_cost for the entries
that already have entries for multiple modes (loads and stores) and apply the
factor there.  For example Zen can do two 128-bit loads per cycle but only one
128-bit store.  By multiplying AVX256 costs by two we seem to cost something
like

  # instructions to dispatch * instruction latency

which is an odd thing.  I'd have expected

  # instructions to dispatch / instruction throughput * instruction latency

so an AVX256 add would cost the same as an AVX128 add, and likewise for loads,
but stores would be more expensive because of the throughput issue (see the
small standalone sketch after the patch below).  This all ignores resource
utilization across multiple insns, but that's how the cost model works ...

Thanks,
Richard.

2018-10-11  Richard Biener

	* config/i386/i386.c (ix86_vec_cost): Remove !parallel path
	and argument.
	(ix86_builtin_vectorization_cost): For vec_construct properly
	cost insertion into SSE regs.
	(...): Adjust calls to ix86_vec_cost.

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 265022)
+++ gcc/config/i386/i386.c	(working copy)
@@ -39846,11 +39846,10 @@ ix86_set_reg_reg_cost (machine_mode mode
 static int
 ix86_vec_cost (machine_mode mode, int cost, bool parallel)
 {
+  gcc_assert (parallel);
   if (!VECTOR_MODE_P (mode))
     return cost;
-
-  if (!parallel)
-    return cost * GET_MODE_NUNITS (mode);
+
   if (GET_MODE_BITSIZE (mode) == 128
       && TARGET_SSE_SPLIT_REGS)
     return cost * 2;
@@ -45190,8 +45189,9 @@ ix86_builtin_vectorization_cost (enum ve
 
     case vec_construct:
       {
-	/* N element inserts.  */
-	int cost = ix86_vec_cost (mode, ix86_cost->sse_op, false);
+	gcc_assert (VECTOR_MODE_P (mode));
+	/* N element inserts into SSE vectors.  */
+	int cost = GET_MODE_NUNITS (mode) * ix86_cost->sse_op;
 	/* One vinserti128 for combining two SSE vectors for AVX256.  */
 	if (GET_MODE_BITSIZE (mode) == 256)
 	  cost += ix86_vec_cost (mode, ix86_cost->addss, true);
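
For illustration only (not part of the patch), a minimal standalone sketch
contrasting the two cost models discussed above for a Zen-like core that
splits 256-bit operations into two 128-bit uops.  The throughputs follow the
"two 128-bit loads but one 128-bit store per cycle" example; the latency
values are hypothetical.

#include <stdio.h>

static int
cost_current (int nuops, int latency)
{
  /* What multiplying AVX256 costs by two amounts to:
     # instructions to dispatch * instruction latency.  */
  return nuops * latency;
}

static int
cost_expected (int nuops, int throughput, int latency)
{
  /* # instructions to dispatch / instruction throughput
     * instruction latency.  */
  return nuops / throughput * latency;
}

int
main (void)
{
  /* A 256-bit add, load and store, each dispatched as two 128-bit uops.  */
  printf ("add:   %d vs %d\n", cost_current (2, 3), cost_expected (2, 2, 3));
  printf ("load:  %d vs %d\n", cost_current (2, 4), cost_expected (2, 2, 4));
  printf ("store: %d vs %d\n", cost_current (2, 1), cost_expected (2, 1, 1));
  return 0;
}

Under the throughput-based model the 256-bit add and load come out the same as
their 128-bit counterparts, while the 256-bit store stays twice as expensive
because only one 128-bit store can be issued per cycle.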