From patchwork Fri Jul 15 13:55:08 2016
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 648823
To: GCC Patches
Cc: Segher Boessenkool, David Edelsohn, richard.guenther@gmail.com
From: Bill Schmidt
Subject: [PATCH, rs6000] Fix vec_construct vectorization cost to be somewhat more accurate
Date: Fri, 15 Jul 2016 08:55:08 -0500
Message-Id: <1b21afb4-a971-a95d-1084-53948c9c7f4c@linux.vnet.ibm.com>

Hi,

This patch is a follow-up to Richard's patch of
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00584.html.
The cost of a vec_construct (initialization of an N-way vector from N
scalars) is too low, which can lead to overly aggressive vectorization,
particularly for N=8 or higher.  Richard changed the default cost to N-1,
which is generally sensible.  For powerpc I am going with a slightly
higher cost of N, which keeps us from being less conservative than the
previous values when N=2.

The whole cost model for powerpc needs more work (in particular, we need
to distinguish among processor models), but that is beyond the scope of
this patch.

One thing I have called out in the comments is that a vec_construct can
have wildly different costs depending on the scalar elements.  If they
are all the same small constant, we need only a single splat-immediate
instruction; but for V4SF the cost is potentially higher because of the
conversions required.  For the splat case, we might want to teach the
vectorizer in general to estimate the cost as a single vector_stmt rather
than a vec_construct, but that requires target knowledge of which
constants can be duplicated with a splat-immediate.  In any case, the
purpose of this patch is simply to avoid vectorizing things we shouldn't
when we have undercounted the cost of a vec_construct.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions (i.e., the vectorization decisions checked by the test suite
have not changed).  Is this ok for trunk?

Thanks,
Bill


2016-07-15  Bill Schmidt

	* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
	Improve vec_construct estimate.
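To make the splat-versus-general distinction concrete, here is a minimal source-level sketch of the kinds of vector initialization discussed above. It assumes GCC's (or Clang's) generic vector extension. The function names are hypothetical. The code shows only the shape of each initialization, not how the vectorizer classifies or costs it internally. The instruction remarks in the comments are rough expectations, not measured output.

```c
/* Requires GCC or Clang: uses the generic vector extension.
   v4si has the same shape as V4SI (four 32-bit ints in 128 bits).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: four independent scalars.  Each lane must be
   inserted or merged separately; this is the case a vec_construct
   cost of N is meant to approximate.  */
v4si
build_general (int a, int b, int c, int d)
{
  v4si v = { a, b, c, d };
  return v;
}

/* Hypothetical helper: every lane holds the same runtime value.  A
   splat can replicate one scalar into all lanes, so this is usually
   cheaper than four separate insertions.  */
v4si
build_splat (int x)
{
  v4si v = { x, x, x, x };
  return v;
}

/* Hypothetical helper: the same small constant in every lane.  On
   AltiVec, vspltisw (5-bit signed immediate, -16..15) can materialize
   this in a single instruction.  */
v4si
build_small_const_splat (void)
{
  v4si v = { 5, 5, 5, 5 };
  return v;
}
```

All three produce a full four-lane vector, yet they need very different instruction sequences. This is why a single vec_construct cost cannot be uniformly accurate.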
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 238312)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -5138,7 +5138,6 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
                                    tree vectype, int misalign)
 {
   unsigned elements;
-  tree elem_type;
 
   switch (type_of_cost)
     {
@@ -5245,16 +5244,16 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
         return 2;
 
       case vec_construct:
-        elements = TYPE_VECTOR_SUBPARTS (vectype);
-        elem_type = TREE_TYPE (vectype);
-        /* 32-bit vectors loaded into registers are stored as double
-           precision, so we need n/2 converts in addition to the usual
-           n/2 merges to construct a vector of short floats from them. */
-        if (SCALAR_FLOAT_TYPE_P (elem_type)
-            && TYPE_PRECISION (elem_type) == 32)
-          return elements + 1;
-        else
-          return elements / 2 + 1;
+        /* This is a rough approximation assuming non-constant elements
+           constructed into a vector via element insertion. FIXME:
+           vec_construct is not granular enough for uniformly good
+           decisions. If the initialization is a splat, this is
+           cheaper than we estimate. If we want to form four SF
+           values into a vector, it's more expensive (we need to
+           copy the four elements into two vector registers,
+           perform two conversions to single precision, and merge
+           the two result vectors). Improve this someday. */
+        return TYPE_VECTOR_SUBPARTS (vectype);
 
       default:
         gcc_unreachable ();
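The arithmetic behind the change can be sketched by transcribing the removed and added return expressions into plain C. This is a minimal sketch under a few assumptions. The helper names are hypothetical. `nunits` stands in for `TYPE_VECTOR_SUBPARTS (vectype)`. The boolean stands in for the removed `SCALAR_FLOAT_TYPE_P (elem_type) && TYPE_PRECISION (elem_type) == 32` test. The generic N-1 default is taken from the description of Richard's patch above, not from this diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical transcription of the removed rs6000 estimate: n/2 merges
   plus one, or n + 1 when the elements are 32-bit floats (the removed
   comment counts extra conversions for that case).  */
int
old_rs6000_cost (int nunits, bool elem_is_32bit_float)
{
  assert (nunits > 0);
  if (elem_is_32bit_float)
    return nunits + 1;
  return nunits / 2 + 1;
}

/* Hypothetical transcription of the generic default as described in the
   mail: N - 1.  */
int
generic_default_cost (int nunits)
{
  assert (nunits > 0);
  return nunits - 1;
}

/* Hypothetical transcription of the new rs6000 estimate: N, i.e. the
   value of TYPE_VECTOR_SUBPARTS (vectype).  */
int
new_rs6000_cost (int nunits)
{
  assert (nunits > 0);
  return nunits;
}
```

Evaluating these by hand gives the following:

- For N=2 (for example V2DF), the old and new rs6000 estimates are both 2, while the generic N-1 would give 1.
- For V16QI, the estimate rises from 9 to 16.
- For V4SF, it drops from 5 to 4. That is the case the new FIXME comment flags as underestimated.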