Subject: [PATCH, rs6000] don't use unaligned vsx for memset of less than 32 bytes
From: Aaron Sawdey
To: GCC Patches
Date: Mon, 25 Jun 2018 10:41:32 -0500
Message-Id: <979a1eeceb7c4c3f7b2068e9b924970760d695ff.camel@linux.ibm.com>

In gcc 8 I added support for unaligned vsx in the builtin expansion of
memset(x,0,y). Turns out that for memset of less than 32 bytes, this
doesn't really help much, and it also runs into an egregious
load-hit-store case in CPU2006 components gcc and hmmer.

This patch reverts to the previous (gcc 7) behavior for memset of 16-31
bytes, which is to use vsx stores only if the target is 16 byte aligned.
For 32 bytes or more, unaligned vsx stores will still be used.
Performance testing of the memset expansion shows that not much is
given up by using scalar stores for 16-31 bytes, and CPU2006 runs show
the performance regression is fixed.

Regstrap passes on powerpc64le, ok for trunk and backport to 8?

Thanks,
    Aaron

2018-06-25  Aaron Sawdey

	* config/rs6000/rs6000-string.c (expand_block_clear): Don't use
	unaligned vsx for 16B memset.

Index: gcc/config/rs6000/rs6000-string.c
===================================================================
--- gcc/config/rs6000/rs6000-string.c	(revision 261808)
+++ gcc/config/rs6000/rs6000-string.c	(working copy)
@@ -90,7 +90,9 @@
   machine_mode mode = BLKmode;
   rtx dest;
 
-  if (bytes >= 16 && TARGET_ALTIVEC && (align >= 128 || TARGET_EFFICIENT_UNALIGNED_VSX))
+  if (TARGET_ALTIVEC
+      && ((bytes >= 16 && align >= 128)
+          || (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX)))
     {
       clear_bytes = 16;
       mode = V4SImode;
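
For anyone who wants to see exactly which cases change, here is a rough
standalone sketch of the old and new conditions as plain C predicates.
This is not code from rs6000-string.c: the function names and the
boolean parameters standing in for TARGET_ALTIVEC and
TARGET_EFFICIENT_UNALIGNED_VSX are made up for illustration, and align
is assumed to be in bits, as in the hunk above (128 bits = 16 bytes).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the pre-patch (gcc 8) condition.
   align_bits is the known alignment of the destination in bits;
   altivec and eff_unaligned_vsx model the two target flags.  */
static bool
use_vsx_old (unsigned bytes, unsigned align_bits,
             bool altivec, bool eff_unaligned_vsx)
{
  return bytes >= 16 && altivec
         && (align_bits >= 128 || eff_unaligned_vsx);
}

/* Hypothetical stand-in for the patched condition: a 16-byte vector
   store is chosen for 16+ bytes only when the destination is 16-byte
   aligned, and the unaligned path needs at least 32 bytes.  */
static bool
use_vsx_new (unsigned bytes, unsigned align_bits,
             bool altivec, bool eff_unaligned_vsx)
{
  return altivec
         && ((bytes >= 16 && align_bits >= 128)
             || (bytes >= 32 && eff_unaligned_vsx));
}
```

With both flags set, the two predicates only disagree for 16-31 bytes
at less than 128-bit alignment: use_vsx_old (24, 64, true, true) is
true while use_vsx_new (24, 64, true, true) is false. For aligned
destinations, or for 32 bytes and up, the result is unchanged.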