From patchwork Sun Dec 17 09:20:09 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Markus Trippelsdorf <markus@trippelsdorf.de>
X-Patchwork-Id: 849597
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Date: Sun, 17 Dec 2017 10:20:09 +0100
From: Markus Trippelsdorf <markus@trippelsdorf.de>
To: gcc-patches@gcc.gnu.org
Cc: Julia Koval <julia.koval@intel.com>, Uros Bizjak <ubizjak@gmail.com>,
	Jan Hubicka <hubicka@ucw.cz>
Subject: [PATCH][i386] Correct imul (r64) latency for modern Intel CPUs
Message-ID: <20171217092009.GA16559@x4>
MIME-Version: 1.0
Content-Disposition: inline

Since Nehalem, the latency of a 64-bit integer multiply (imul r64) has
been three cycles, not four. Update the DI and "other" multiply costs in
skylake_cost and core_cost to match.
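
For illustration only, here is a simplified sketch of why one cycle
matters. COSTS_N_INSNS is copied from GCC's rtl.h; prefer_imul is a
hypothetical helper, not a GCC function. It assumes that every shift,
add or lea in a synthesized sequence costs exactly one simple
instruction, and that the sequence replaces the multiply only when it
is strictly cheaper. The real decision logic considers more than this.

```c
/* Same definition as in GCC's rtl.h: costs are expressed in units of
   one quarter of a simple instruction.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical helper (an assumption for this sketch, not GCC code).
   Return 1 when a single imul with cost MULT_COST is at least as cheap
   as a shift/add sequence of SYNTH_INSNS simple instructions, i.e. when
   the synthesized sequence is not strictly cheaper.  */
int
prefer_imul (int mult_cost, int synth_insns)
{
  return mult_cost <= COSTS_N_INSNS (synth_insns);
}
```

Under these assumptions, a three-instruction replacement sequence
beats a multiply costed at COSTS_N_INSNS (4), but no longer beats one
costed at COSTS_N_INSNS (3).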

Tested on x86_64.
OK for trunk?

Thanks.

	* x86-tune-costs.h (skylake_cost, core_cost): Decrease r64 multiply
	latencies.

	* gcc.target/i386/wmul-3.c: New test.
diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 648219338308..ddb47ba44056 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -1538,8 +1538,8 @@ struct processor_costs skylake_cost = {
   {COSTS_N_INSNS (3),			/* cost of starting multiply for QI */
    COSTS_N_INSNS (4),			/*				 HI */
    COSTS_N_INSNS (3),			/*				 SI */
-   COSTS_N_INSNS (4),			/*				 DI */
-   COSTS_N_INSNS (4)},			/*			      other */
+   COSTS_N_INSNS (3),			/*				 DI */
+   COSTS_N_INSNS (3)},			/*			      other */
   0,					/* cost of multiply per each bit set */
   /* Expanding div/mod currently doesn't consider parallelism. So the cost
      model is not realistic. We compensate by increasing the latencies a bit.  */
@@ -2341,8 +2341,8 @@ struct processor_costs core_cost = {
   {COSTS_N_INSNS (3),			/* cost of starting multiply for QI */
    COSTS_N_INSNS (4),			/*				 HI */
    COSTS_N_INSNS (3),			/*				 SI */
-   COSTS_N_INSNS (4),			/*				 DI */
-   COSTS_N_INSNS (4)},			/*			      other */
+   COSTS_N_INSNS (3),			/*				 DI */
+   COSTS_N_INSNS (3)},			/*			      other */
   0,					/* cost of multiply per each bit set */
   /* Expanding div/mod currently doesn't consider parallelism. So the cost
      model is not realistic. We compensate by increasing the latencies a bit.  */
diff --git a/gcc/testsuite/gcc.target/i386/wmul-3.c b/gcc/testsuite/gcc.target/i386/wmul-3.c
new file mode 100644
index 000000000000..66c077c2cc0d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/wmul-3.c
@@ -0,0 +1,66 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=haswell" } */
+
+#include <stdint.h>
+#include <string.h>
+
+static const char b100_tab[200] = {
+    '0', '0', '0', '1', '0', '2', '0', '3', '0', '4',
+    '0', '5', '0', '6', '0', '7', '0', '8', '0', '9',
+    '1', '0', '1', '1', '1', '2', '1', '3', '1', '4',
+    '1', '5', '1', '6', '1', '7', '1', '8', '1', '9',
+    '2', '0', '2', '1', '2', '2', '2', '3', '2', '4',
+    '2', '5', '2', '6', '2', '7', '2', '8', '2', '9',
+    '3', '0', '3', '1', '3', '2', '3', '3', '3', '4',
+    '3', '5', '3', '6', '3', '7', '3', '8', '3', '9',
+    '4', '0', '4', '1', '4', '2', '4', '3', '4', '4',
+    '4', '5', '4', '6', '4', '7', '4', '8', '4', '9',
+    '5', '0', '5', '1', '5', '2', '5', '3', '5', '4',
+    '5', '5', '5', '6', '5', '7', '5', '8', '5', '9',
+    '6', '0', '6', '1', '6', '2', '6', '3', '6', '4',
+    '6', '5', '6', '6', '6', '7', '6', '8', '6', '9',
+    '7', '0', '7', '1', '7', '2', '7', '3', '7', '4',
+    '7', '5', '7', '6', '7', '7', '7', '8', '7', '9',
+    '8', '0', '8', '1', '8', '2', '8', '3', '8', '4',
+    '8', '5', '8', '6', '8', '7', '8', '8', '8', '9',
+    '9', '0', '9', '1', '9', '2', '9', '3', '9', '4',
+    '9', '5', '9', '6', '9', '7', '9', '8', '9', '9',
+};
+
+void uint64_to_ascii_ta7_32_base100(uint64_t val, char *dst) {
+  const int64_t POW10_10 = ((int64_t)10) * 1000 * 1000 * 1000;
+  const uint64_t POW2_57_DIV_POW100_4 =
+      ((int64_t)(1) << 57) / 100 / 100 / 100 / 100 + 1;
+  const uint64_t MASK32 = ((int64_t)(1) << 32) - 1;
+  int64_t hix = val / POW10_10;
+  int64_t lox = val % POW10_10;
+  int64_t lor = lox & (uint64_t)(-2);
+  uint64_t hi = hix * POW2_57_DIV_POW100_4;
+  uint64_t lo = lor * POW2_57_DIV_POW100_4;
+  memcpy(dst + 0 * 10 + 0, &b100_tab[(hi >> 57) * 2], 2);
+  memcpy(dst + 1 * 10 + 0, &b100_tab[(lo >> 57) * 2], 2);
+  hi = (hi >> 25) + 1;
+  lo = (lo >> 25) + 1;
+  hi = (hi & MASK32) * 100;
+  lo = (lo & MASK32) * 100;
+  memcpy(dst + 0 * 10 + 2, &b100_tab[(hi >> 32) * 2], 2);
+  hi = (hi & MASK32) * 100;
+  memcpy(dst + 1 * 10 + 2, &b100_tab[(lo >> 32) * 2], 2);
+  lo = (lo & MASK32) * 100;
+  memcpy(dst + 0 * 10 + 4, &b100_tab[(hi >> 32) * 2], 2);
+  hi = (hi & MASK32) * 100;
+  memcpy(dst + 1 * 10 + 4, &b100_tab[(lo >> 32) * 2], 2);
+  lo = (lo & MASK32) * 100;
+  memcpy(dst + 0 * 10 + 6, &b100_tab[(hi >> 32) * 2], 2);
+  hi = (hi & MASK32) * 100;
+  memcpy(dst + 1 * 10 + 6, &b100_tab[(lo >> 32) * 2], 2);
+  lo = (lo & MASK32) * 100;
+  hi >>= 32;
+  lo >>= 32;
+  lo = (lo & (-2)) | (lox & 1);
+  memcpy(dst + 0 * 10 + 8, &b100_tab[hi * 2], 2);
+  memcpy(dst + 1 * 10 + 8, &b100_tab[lo * 2], 2);
+  dst[2 * 10] = 0;
+}
+
+/* { dg-final { scan-assembler-times "imulq" 11 } } */