Date: Thu, 10 Nov 2016 19:21:15 -0500
From: Michael Meissner
To: gcc-patches@gcc.gnu.org, Segher Boessenkool, David Edelsohn, Bill Schmidt, Aaron Sawdey
Subject: [PATCH] Fix PR 78243 on PowerPC
Aaron Sawdey has been running the GCC testsuite on the power9 simulator, and he
noticed that gcc.c-torture/execute/pr68532.c fails its execution test.  He
opened bug 78243 for this failure.

If you compile pr68532 with the normal options (-O2/-O3 and -mcpu=power8 or
-mcpu=power9), it works, because with the normal PowerPC vectorization costs
the vectorizer generates only scalar code.  However, the options in the PR
explicitly turn on -fno-vect-cost-model, which forces the loop to be
vectorized, and in doing so the compiler generates fairly poor code.

The vectorizer generates a vector add loop, and at the end it does a vector
reduction to compute the sum.  When -mcpu=power9 is used, it generates a
VEXTRACTUH instruction to extract the HImode element from the V8HImode vector.
Unfortunately, on little endian (with little endian element ordering), it
extracts the wrong element.

This patch fixes that problem.  In vsx_extract__p9, the old code assumed that
the element number had already been adjusted for endianness, so it used the
element number unchanged; the patch now does the adjustment there.  The
reduc_plus_scal_ expander now chooses its element number based on the element
order (VECTOR_ELT_ORDER_BIG) rather than the byte order (BYTES_BIG_ENDIAN).

I did a bootstrap and regression test, and there were no regressions.  I ran
the test on the power9 simulator with both big endian and little endian
options, and it passed.

I also ran the following executable tests from the testsuite, which exercise
vector init, set, and extract for each of the basic types:

    vec-init-1.c    element type: int
    vec-init-2.c    element type: long
    vec-init-4.c    element type: short
    vec-init-5.c    element type: signed char
    vec-init-8.c    element type: float
    vec-init-9.c    element type: double

All tests passed on both little endian and big endian simulator runs.

Can I check this patch into the trunk?

2016-11-10  Michael Meissner

	PR target/78243
	* config/rs6000/vsx.md (vsx_extract__p9): Correct the element
	order for little endian ordering.
	* config/rs6000/altivec.md (reduc_plus_scal_): Use
	VECTOR_ELT_ORDER_BIG and not BYTES_BIG_ENDIAN to adjust element
	number.

Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 242048)
+++ gcc/config/rs6000/vsx.md	(revision 242049)
@@ -2542,10 +2542,13 @@ (define_insn "vsx_extract__p9"
   "VECTOR_MEM_VSX_P (mode) && TARGET_VEXTRACTUB
    && TARGET_VSX_SMALL_INTEGER"
 {
-  /* Note, the element number has already been adjusted for endianness, so we
-     don't have to adjust it here.  */
-  int unit_size = GET_MODE_UNIT_SIZE (mode);
-  HOST_WIDE_INT offset = unit_size * INTVAL (operands[2]);
+  HOST_WIDE_INT elt = INTVAL (operands[2]);
+  HOST_WIDE_INT elt_adj = ((!VECTOR_ELT_ORDER_BIG)
+                           ? (GET_MODE_NUNITS (mode) - 1 - elt)
+                           : elt);
+
+  HOST_WIDE_INT unit_size = GET_MODE_UNIT_SIZE (mode);
+  HOST_WIDE_INT offset = unit_size * elt_adj;
 
   operands[2] = GEN_INT (offset);
   if (unit_size == 4)
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 242048)
+++ gcc/config/rs6000/altivec.md	(revision 242049)
@@ -2785,7 +2785,7 @@ (define_expand "reduc_plus_scal_"
   rtx vtmp1 = gen_reg_rtx (V4SImode);
   rtx vtmp2 = gen_reg_rtx (mode);
   rtx dest = gen_lowpart (V4SImode, vtmp2);
-  int elt = BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (mode) - 1 : 0;
+  int elt = VECTOR_ELT_ORDER_BIG ? GET_MODE_NUNITS (mode) - 1 : 0;
 
   emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
   emit_insn (gen_altivec_vsum4ss (vtmp1, operands[1], vzero));