From patchwork Tue Jul 31 15:42:33 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andreas Krebbel
X-Patchwork-Id: 951685
From: Andreas Krebbel
To: gcc-patches@gcc.gnu.org
Cc: Andreas Krebbel
Subject: [Committed] S/390: Don't emit prefetch instructions for clrmem
Date: Tue, 31 Jul 2018 17:42:33 +0200
Message-Id: <20180731154233.838-1-krebbel@linux.ibm.com>

From: Andreas Krebbel

gcc/ChangeLog:

2018-07-31  Andreas Krebbel

	* config/s390/s390.c (s390_expand_setmem): Make the unrolling to
	depend on whether prefetch instructions will be emitted or not.
	Use TARGET_SETMEM_PFD for checking whether prefetch instructions
	will be emitted or not.
	* config/s390/s390.h (TARGET_SETMEM_PREFETCH_DISTANCE)
	(TARGET_SETMEM_PFD): New macros.

gcc/testsuite/ChangeLog:

2018-07-31  Andreas Krebbel

	* gcc.target/s390/memset-1.c: Improve testcase.
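
For readers who want to check the new thresholds without reading the RTL expander, the decision can be modelled in plain C. This is only a sketch under stated assumptions: the enum, the function names and the boolean parameters are hypothetical stand-ins for TARGET_Z10, s390_tune and "val == const0_rtx"; only constant lengths are modelled; and the additional TARGET_MVCLE check from s390_expand_setmem is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the tuning enum; only the ordering
   "older than z13" vs. "z13" matters for this sketch.  */
enum tune_model { TUNE_Z10, TUNE_Z196, TUNE_ZEC12, TUNE_Z13 };

/* Mirrors TARGET_SETMEM_PREFETCH_DISTANCE from the patch.  */
#define PREFETCH_DISTANCE_MODEL 1024

/* Model of TARGET_SETMEM_PFD for a constant length: a prefetch is
   emitted on z10 or later, unless this is a clrmem tuned for z13,
   and only if the length exceeds the prefetch distance.  */
static bool
setmem_uses_pfd (bool have_z10, enum tune_model tune, bool is_clear,
		 long len)
{
  return have_z10
	 && (tune < TUNE_Z13 || !is_clear)
	 && len > PREFETCH_DISTANCE_MODEL;
}

/* Model of the "expand without a loop" condition for a constant
   length.  Assumption: the TARGET_MVCLE check is ignored here.  */
static bool
setmem_unrolled (bool have_z10, enum tune_model tune, bool is_clear,
		 long len)
{
  if (is_clear)
    return len <= 256 * 4
	   || (len <= 256 * 5
	       && setmem_uses_pfd (have_z10, tune, true, len));
  return len <= 257 * 4;
}
```

With these definitions a 1025 byte clear is unrolled when tuning for z10 but uses the xc loop when tuning for z13, which matches the expectations encoded in the updated testcase.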
---
 gcc/config/s390/s390.c                   | 22 +++++----
 gcc/config/s390/s390.h                   | 10 ++++
 gcc/testsuite/gcc.target/s390/memset-1.c | 81 ++++++++++++++++++++++++--------
 3 files changed, 84 insertions(+), 29 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index a579e9d..ec588a2 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -5499,12 +5499,15 @@ s390_expand_setmem (rtx dst, rtx len, rtx val)

   /* Expand setmem/clrmem for a constant length operand without a
      loop if it will be shorter that way.
-     With a constant length and without pfd argument a
-     clrmem loop is 32 bytes -> 5.3 * xc
-     setmem loop is 36 bytes -> 3.6 * (mvi/stc + mvc) */
+     clrmem loop (with PFD)    is 30 bytes -> 5 * xc
+     clrmem loop (without PFD) is 24 bytes -> 4 * xc
+     setmem loop (with PFD)    is 38 bytes -> ~4 * (mvi/stc + mvc)
+     setmem loop (without PFD) is 32 bytes -> ~4 * (mvi/stc + mvc) */
   if (GET_CODE (len) == CONST_INT
-      && ((INTVAL (len) <= 256 * 5 && val == const0_rtx)
-	  || INTVAL (len) <= 257 * 3)
+      && ((val == const0_rtx
+	   && (INTVAL (len) <= 256 * 4
+	       || (INTVAL (len) <= 256 * 5 && TARGET_SETMEM_PFD(val,len))))
+	  || (val != const0_rtx && INTVAL (len) <= 257 * 4))
       && (!TARGET_MVCLE || INTVAL (len) <= 256))
     {
       HOST_WIDE_INT o, l;
@@ -5618,12 +5621,11 @@ s390_expand_setmem (rtx dst, rtx len, rtx val)

       emit_label (loop_start_label);

-      if (TARGET_Z10
-	  && (GET_CODE (len) != CONST_INT || INTVAL (len) > 1024))
+      if (TARGET_SETMEM_PFD (val, len))
 	{
-	  /* Issue a write prefetch for the +4 cache line.  */
-	  rtx prefetch = gen_prefetch (gen_rtx_PLUS (Pmode, dst_addr,
-						     GEN_INT (1024)),
+	  /* Issue a write prefetch.  */
+	  rtx distance = GEN_INT (TARGET_SETMEM_PREFETCH_DISTANCE);
+	  rtx prefetch = gen_prefetch (gen_rtx_PLUS (Pmode, dst_addr, distance),
 				       const1_rtx, const0_rtx);
 	  emit_insn (prefetch);
 	  PREFETCH_SCHEDULE_BARRIER_P (prefetch) = true;
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index 71a12b8..c6aedcd 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -181,6 +181,16 @@ enum processor_flags

 #define TARGET_AVOID_CMP_AND_BRANCH (s390_tune == PROCESSOR_2817_Z196)

+/* Issue a write prefetch for the +4 cache line.  */
+#define TARGET_SETMEM_PREFETCH_DISTANCE 1024
+
+/* Expand to a C expressions evaluating to true if a setmem to VAL of
+   length LEN should be emitted using prefetch instructions.  */
+#define TARGET_SETMEM_PFD(VAL,LEN)					\
+  (TARGET_Z10								\
+   && (s390_tune < PROCESSOR_2964_Z13 || (VAL) != const0_rtx)		\
+   && (!CONST_INT_P (LEN) || INTVAL ((LEN)) > TARGET_SETMEM_PREFETCH_DISTANCE))
+
 /* Run-time target specification.  */

 /* Defaults for option flags defined only on some subtargets.  */
diff --git a/gcc/testsuite/gcc.target/s390/memset-1.c b/gcc/testsuite/gcc.target/s390/memset-1.c
index 7b43b97c..3e201df 100644
--- a/gcc/testsuite/gcc.target/s390/memset-1.c
+++ b/gcc/testsuite/gcc.target/s390/memset-1.c
@@ -2,16 +2,23 @@
    without loop statements.  */

 /* { dg-do compile } */
-/* { dg-options "-O3 -mzarch" } */
+/* { dg-options "-O3 -mzarch -march=z13" } */

-/* 1 mvc */
+/* 1 stc */
+void
+*memset0(void *s, int c)
+{
+  return __builtin_memset (s, c, 1);
+}
+
+/* 1 stc 1 mvc */
 void
 *memset1(void *s, int c)
 {
   return __builtin_memset (s, c, 42);
 }

-/* 3 mvc */
+/* 3 stc 3 mvc */
 void
 *memset2(void *s, int c)
 {
@@ -25,55 +32,62 @@ void
   return __builtin_memset (s, c, 0);
 }

-/* mvc */
+/* 1 stc 1 mvc */
 void
 *memset4(void *s, int c)
 {
   return __builtin_memset (s, c, 256);
 }

-/* 2 mvc */
+/* 2 stc 2 mvc */
 void
 *memset5(void *s, int c)
 {
   return __builtin_memset (s, c, 512);
 }

-/* still 2 mvc through the additional first byte */
+/* 2 stc 2 mvc - still due to the stc bytes */
 void
 *memset6(void *s, int c)
 {
   return __builtin_memset (s, c, 514);
 }

-/* 3 mvc */
+/* 3 stc 2 mvc */
 void
 *memset7(void *s, int c)
 {
   return __builtin_memset (s, c, 515);
 }

-/* still 3 mvc through the additional first byte */
+/* 4 stc 4 mvc - 4 * 256 + 4 stc bytes */
 void
 *memset8(void *s, int c)
 {
-  return __builtin_memset (s, c, 771);
+  return __builtin_memset (s, c, 1028);
 }

-/* Use mvc loop: 2 mvc */
+/* 2 stc 1 pfd 2 mvc - start using mvc loop */
 void
 *memset9(void *s, int c)
 {
-  return __builtin_memset (s, c, 772);
+  return __builtin_memset (s, c, 1029);
 }

-/* 3 mvc with displacement overflow after the first */
+/* 2 stc 1 stcy 3 mvc - displacement overflow after the first */
 void
 *memset10(void *s, int c)
 {
   return __builtin_memset ((char*)s + 4000, c, 700);
 }

+/* 1 mvi */
+void
+*clrmem0(void *s)
+{
+  return __builtin_memset (s, 0, 1);
+}
+
 /* 1 xc */
 void
 *clrmem1(void *s)
@@ -109,26 +123,55 @@ void
   return __builtin_memset (s, 0, 512);
 }

-/* 3 xc */
+/* 4 xc */
 void
 *clrmem6(void *s)
 {
-  return __builtin_memset (s, 0, 768);
+  return __builtin_memset (s, 0, 1024);
 }

-/* start using xc loop */
+/* 2 xc - start using xc loop*/
 void
 *clrmem7(void *s)
 {
+  return __builtin_memset (s, 0, 1025);
+}
+
+/* 5 xc - on z10 PFD would be used in the loop body so the unrolled
+   variant would still be shorter.  */
+__attribute__ ((target("tune=z10")))
+void
+*clrmem7_z10(void *s)
+{
+  return __builtin_memset (s, 0, 1025);
+}
+
+/* 5 xc */
+__attribute__ ((target("tune=z10")))
+void
+*clrmem8_z10(void *s)
+{
+  return __builtin_memset (s, 0, 1280);
+}
+
+/* 1 pfd 2 xc - start using xc loop also on z10 */
+__attribute__ ((target("tune=z10")))
+void
+*clrmem9_z10(void *s)
+{
   return __builtin_memset (s, 0, 1281);
 }

-/* 3 xc with displacement overflow after the first */
+/* 3 xc - displacement overflow after the first */
 void
-*clrmem8(void *s)
+*clrmem10(void *s)
 {
   return __builtin_memset (s + 4000, 0, 700);
 }

-/* { dg-final { scan-assembler-times "mvc" 19 } } */
-/* { dg-final { scan-assembler-times "xc" 15 } } */
+/* { dg-final { scan-assembler-times "mvi\\s" 1 } } */
+/* { dg-final { scan-assembler-times "mvc\\s" 20 } } */
+/* { dg-final { scan-assembler-times "xc\\s" 28 } } */
+/* { dg-final { scan-assembler-times "stc\\s" 21 } } */
+/* { dg-final { scan-assembler-times "stcy\\s" 1 } } */
+/* { dg-final { scan-assembler-times "pfd\\s" 2 } } */
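
The per-function comments in the testcase ("3 stc 2 mvc", "4 xc", and so on) follow a simple chunking rule for the unrolled expansions, which the sketch below tries to capture. It is inferred from those comments rather than taken from the expander source, and the struct and function names are hypothetical. It assumes each setmem chunk covers up to 257 bytes (one store of the first byte, then one mvc propagating it over up to 256 further bytes) and each clrmem chunk is one xc of up to 256 bytes. It counts stcy as a store and does not model the single-byte clear, for which the testcase expects an mvi instead of an xc.

```c
#include <stddef.h>

/* Hypothetical result type: number of byte stores (stc/stcy/mvi)
   and number of mvc instructions in an unrolled setmem.  */
struct unrolled_counts
{
  size_t stores;
  size_t mvc;
};

/* Count instructions for an unrolled setmem of LEN bytes, assuming
   257-byte chunks: one store plus one mvc whenever the chunk is
   longer than a single byte.  */
static struct unrolled_counts
setmem_unrolled_counts (size_t len)
{
  struct unrolled_counts n = { 0, 0 };

  while (len > 0)
    {
      size_t chunk = len < 257 ? len : 257;

      n.stores++;
      if (chunk > 1)
	n.mvc++;
      len -= chunk;
    }
  return n;
}

/* Count xc instructions for an unrolled clrmem of LEN bytes,
   assuming one xc per started 256-byte block.  Assumption: LEN is
   at least 2; the single-byte case is not modelled.  */
static size_t
clrmem_unrolled_xc (size_t len)
{
  return (len + 255) / 256;
}
```

For example, a length of 515 splits into chunks of 257, 257 and 1 bytes, giving three stores and two mvc, as the comment on memset7 states.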