From patchwork Mon Nov 26 21:08:32 2018
X-Patchwork-Submitter: Aaron Sawdey
X-Patchwork-Id: 1003494
To: gcc-patches@gcc.gnu.org
Cc: Segher Boessenkool, Bill Schmidt, David Edelsohn
From: Aaron Sawdey
Subject: [PATCH][rs6000] better use of unaligned vsx in memset() expansion
Date: Mon, 26 Nov 2018 15:08:32 -0600
Message-Id: <0e5a2fa3-47df-47d4-89cb-5c421a1e366b@linux.ibm.com>

When I previously added the use of unaligned vsx stores to inline
expansion of memset, I didn't do a good job of managing boundary
conditions. The intention was to only use unaligned vsx if the block
being cleared was at least 32 bytes.
What it actually did was prevent the use of unaligned vsx for the last
32 bytes of any block being cleared, because the size test was
re-evaluated on every loop iteration as bytes was decremented. This
change puts the test up front so it is not affected by the decrement
of bytes.

OK for trunk if regstrap passes?

Thanks!
   Aaron

2018-11-26  Aaron Sawdey

        * config/rs6000/rs6000-string.c (expand_block_clear): Change how
        we determine if unaligned vsx is ok.

Index: gcc/config/rs6000/rs6000-string.c
===================================================================
--- gcc/config/rs6000/rs6000-string.c   (revision 266219)
+++ gcc/config/rs6000/rs6000-string.c   (working copy)
@@ -85,14 +85,14 @@
   if (! optimize_size && bytes > 8 * clear_step)
     return 0;
 
+  bool unaligned_vsx_ok = (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX);
+
   for (offset = 0; bytes > 0; offset += clear_bytes, bytes -= clear_bytes)
     {
       machine_mode mode = BLKmode;
       rtx dest;
 
-      if (TARGET_ALTIVEC
-          && ((bytes >= 16 && align >= 128)
-              || (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX)))
+      if (TARGET_ALTIVEC && ((bytes >= 16 && align >= 128) || unaligned_vsx_ok))
         {
           clear_bytes = 16;
           mode = V4SImode;
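
To illustrate the boundary condition described above, here is a minimal
standalone sketch that counts how many 16-byte vector stores each form of
the test selects. It is not GCC code: the two `target_*` constants and both
function names are hypothetical stand-ins, `align` is assumed to be in bits
(to match the `align >= 128` test), and only full 16-byte chunks are
modelled, so handling of any smaller tail is out of scope.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the target flags; not the real GCC macros. */
static const bool target_altivec = true;
static const bool target_efficient_unaligned_vsx = true;

/* Size test re-evaluated on the shrinking byte count, as in the removed
   lines of the diff. Returns the number of 16-byte vector stores chosen
   before the remaining bytes fall through to narrower stores. */
static int vector_stores_before(int bytes, int align)
{
    assert(bytes >= 0);
    int count = 0;
    while (bytes >= 16) {
        bool use_vec = target_altivec
            && ((bytes >= 16 && align >= 128)
                || (bytes >= 32 && target_efficient_unaligned_vsx));
        if (!use_vec)
            break;
        count++;
        bytes -= 16;
    }
    return count;
}

/* Size test evaluated once, up front, on the original byte count, as in
   the added lines of the diff. */
static int vector_stores_after(int bytes, int align)
{
    assert(bytes >= 0);
    bool unaligned_vsx_ok = bytes >= 32 && target_efficient_unaligned_vsx;
    int count = 0;
    while (bytes >= 16) {
        bool use_vec = target_altivec
            && ((bytes >= 16 && align >= 128) || unaligned_vsx_ok);
        if (!use_vec)
            break;
        count++;
        bytes -= 16;
    }
    return count;
}
```

Under these assumptions, a 48-byte block with 64-bit alignment gets two
vector stores from `vector_stores_before` and three from
`vector_stores_after`. A 16-byte block gets none from either, which matches
the stated intent of reserving unaligned vsx for blocks of at least 32
bytes.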