From patchwork Wed Jul 1 22:43:32 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 1320882
To: gcc-patches@gcc.gnu.org, Segher Boessenkool, David Edelsohn, Michael Meissner
Subject: [PATCH] PowerPC: Optimize DImode -> vector store.
Date: Wed, 1 Jul 2020 18:43:32 -0400
Message-Id: <1593643412-21873-1-git-send-email-meissner@linux.ibm.com>
X-Mailer: git-send-email 1.8.3.1
List-Id: Gcc-patches mailing list
From: Michael Meissner
Reply-To: Michael Meissner
Cc: Bill Schmidt

This patch fixes a PR that I noticed several years ago during power8
development.  I noticed that the compiler would often create a two element
vector and store the vector.  Particularly for DImode on power8, this could
involve two direct moves and an XXPERMDI to glue the two parts together.  On
power9, there is a single direct move instruction that combines the two
elements.

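As an illustration only (this sketch is not part of the patch): in memory, a
two-element 64-bit vector is laid out like a two-element array, so storing the
vector writes the same bytes as two 8-byte scalar stores at offsets 0 and 8.
The helper names below are hypothetical, and the sketch assumes the GCC/Clang
generic `vector_size` extension rather than the Altivec `vector` keyword, so
that it builds on any target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Generic GCC/Clang vector of two 64-bit elements.  This is an assumption
   made for portability; the patch's test uses the Altivec 'vector' keyword.  */
typedef uint64_t v2di __attribute__ ((vector_size (16)));

/* Store { a, b } as a single 16-byte vector.  */
static void
store_as_vector (unsigned char *dst, uint64_t a, uint64_t b)
{
  v2di v = { a, b };
  memcpy (dst, &v, sizeof v);
}

/* Store a and b as two 8-byte scalars at byte offsets 0 and 8.  */
static void
store_as_scalars (unsigned char *dst, uint64_t a, uint64_t b)
{
  memcpy (dst, &a, sizeof a);
  memcpy (dst + 8, &b, sizeof b);
}

/* Return true if both ways of storing leave identical bytes in memory.  */
static bool
stores_match (uint64_t a, uint64_t b)
{
  unsigned char x[16], y[16];

  store_as_vector (x, a, b);
  store_as_scalars (y, a, b);
  return memcmp (x, y, sizeof x) == 0;
}

/* Self-check over a few hypothetical sample values.  */
static void
check_layout (void)
{
  assert (stores_match (1, 2));
  assert (stores_match (0x0123456789abcdefULL, 0xfedcba9876543210ULL));
}
```

`stores_match` should hold for any pair of values on either endianness, which
is why two scalar stores from GPRs can stand in for the vector store.
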
Originally I had the optimization for DFmode as well as DImode.  I found that
if the values were already in vector registers, it was generally faster to do
the XXPERMDI and vector store.  So I rewrote this patch to only optimize
DImode, where the assumption is that the DImode values will be in GPRs.

I have done bootstraps with/without the patch, and there were no regressions.
I did the builds on a little endian power9 linux system and a big endian power8
system (both 32/64-bit support on big endian).  Can I check this change into
the master branch?

gcc/
2020-06-30  Michael Meissner

        PR target/81594
        * config/rs6000/predicates.md (ds_form_memory): New predicate.
        * config/rs6000/vsx.md (concatv2di_store): New insn.
        (dupv2di_store): New insn.

gcc/testsuite/
2020-06-30  Michael Meissner

        PR target/81594
        * gcc.target/powerpc/pr81594.c: New test.
---
 gcc/config/rs6000/predicates.md            | 42 +++++++++++++++
 gcc/config/rs6000/vsx.md                   | 84 ++++++++++++++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/pr81594.c | 61 ++++++++++++++++++++++
 3 files changed, 187 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr81594.c

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 9762855..4f7e313 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1856,3 +1856,45 @@ (define_predicate "prefixed_memory"
 {
   return address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
 })
+
+;; Return true if the operand is a valid memory operand with an offsettable
+;; address that can be split into 2 sub-addresses, each of which is a valid
+;; DS-form (bottom 2 bits of the offset are 0).  This is used to optimize
+;; creating a vector of two DImode elements and then storing the vector.  We
+;; want to eliminate the direct moves from GPRs to form the vector and do the
+;; store directly from the GPRs.
+
+(define_predicate "ds_form_memory"
+  (match_code "mem")
+{
+  if (!memory_operand (op, mode))
+    return false;
+
+  rtx addr = XEXP (op, 0);
+
+  if (REG_P (addr) || SUBREG_P (addr))
+    return true;
+
+  if (GET_CODE (addr) != PLUS)
+    return false;
+
+  if (!base_reg_operand (XEXP (addr, 0), Pmode))
+    return false;
+
+  rtx offset = XEXP (addr, 1);
+  if (!CONST_INT_P (offset))
+    return false;
+
+  HOST_WIDE_INT value = INTVAL (offset);
+
+  if (TARGET_PREFIXED)
+    return SIGNED_34BIT_OFFSET_EXTRA_P (value, GET_MODE_SIZE (DImode));
+
+  /* If we don't support prefixed addressing, ensure that the two addresses
+     created would each be valid for doing a STD instruction (which is a
+     DS-form instruction that requires the bottom 2 bits to be 0).  */
+  if ((value & 0x3) != 0)
+    return false;
+
+  return SIGNED_16BIT_OFFSET_EXTRA_P (value, GET_MODE_SIZE (DImode));
+})
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 732a548..a9ebd24 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2896,6 +2896,90 @@ (define_insn "*vsx_concat_<mode>_3"
 }
   [(set_attr "type" "vecperm")])
 
+;; If the only use for a VEC_CONCAT is to store 2 64-bit values, replace it
+;; with two stores.  Only do this on DImode, since it saves doing 1 direct move
+;; on power9, and 2 direct moves + XXPERMDI on power8 to form the vector so we
+;; can do a vector store.  This typically shows up with -O3 where two stores
+;; are combined into a vector.
+;;
+;; Typically DFmode would generate XXPERMDI and a vector store.  Benchmarks
+;; like Spec show that is typically the same speed or faster than doing the two
+;; scalar DFmode stores.
+(define_insn_and_split "*concatv2di_store"
+  [(set (match_operand:V2DI 0 "memory_operand" "=m,m,m,m")
+        (vec_concat:V2DI
+         (match_operand:DI 1 "gpc_reg_operand" "r,wa,r,wa")
+         (match_operand:DI 2 "gpc_reg_operand" "r,wa,wa,r")))
+   (clobber (match_scratch:DI 3 "=&b,&b,&b,&b"))]
+  "TARGET_DIRECT_MOVE_64BIT"
+  "#"
+  "&& 1"
+  [(set (match_dup 4)
+        (match_dup 5))
+   (set (match_dup 6)
+        (match_dup 7))]
+{
+  rtx mem = operands[0];
+
+  /* If the address can't be used directly for both stores, copy it to the
+     temporary base register.  */
+  if (!ds_form_memory (mem, V2DImode))
+    {
+      rtx old_addr = XEXP (mem, 0);
+      rtx new_addr = operands[3];
+      if (GET_CODE (new_addr) == SCRATCH)
+        new_addr = gen_reg_rtx (Pmode);
+
+      emit_move_insn (new_addr, old_addr);
+      mem = change_address (mem, VOIDmode, new_addr);
+    }
+
+  /* Because we are creating scalar stores, we don't have to swap the order
+     of the elements and then swap the stores to get the right order on
+     little endian systems.  */
+  operands[4] = adjust_address (mem, DImode, 0);
+  operands[5] = operands[1];
+  operands[6] = adjust_address (mem, DImode, 8);
+  operands[7] = operands[2];
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "store,fpstore,fpstore,store")])
+
+;; Optimize creating a vector with 2 duplicate DImode elements and storing it.
+(define_insn_and_split "*dupv2di_store"
+  [(set (match_operand:V2DI 0 "memory_operand" "=m,m")
+        (vec_duplicate:V2DI
+         (match_operand:DI 1 "gpc_reg_operand" "r,wa")))
+   (clobber (match_scratch:DI 2 "=&b,&b"))]
+  "TARGET_DIRECT_MOVE_64BIT"
+  "#"
+  "&& 1"
+  [(set (match_dup 3)
+        (match_dup 1))
+   (set (match_dup 4)
+        (match_dup 1))]
+{
+  rtx mem = operands[0];
+
+  /* If the address can't be used directly for both stores, copy it to the
+     temporary base register.  */
+  if (!ds_form_memory (mem, V2DImode))
+    {
+      rtx old_addr = XEXP (mem, 0);
+      rtx new_addr = operands[2];
+      if (GET_CODE (new_addr) == SCRATCH)
+        new_addr = gen_reg_rtx (Pmode);
+
+      emit_move_insn (new_addr, old_addr);
+      mem = change_address (mem, VOIDmode, new_addr);
+    }
+
+  operands[3] = adjust_address (mem, DImode, 0);
+  operands[4] = adjust_address (mem, DImode, 8);
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "store,fpstore")])
+
 ;; Special purpose concat using xxpermdi to glue two single precision values
 ;; together, relying on the fact that internally scalar floats are represented
 ;; as doubles.  This is used to initialize a V4SF vector with 4 floats
diff --git a/gcc/testsuite/gcc.target/powerpc/pr81594.c b/gcc/testsuite/gcc.target/powerpc/pr81594.c
new file mode 100644
index 0000000..35a9749
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr81594.c
@@ -0,0 +1,61 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mdejagnu-cpu=power8 -O2" } */
+
+/* PR target/81594.  Optimize creating a vector of 2 64-bit elements and then
+   storing the vector into separate stores.  */
+
+void
+store_v2di_0 (vector unsigned long long *p,
+              unsigned long long a,
+              unsigned long long b)
+{
+  *p = (vector unsigned long long) { a, b };
+}
+
+void
+store_v2di_4 (vector unsigned long long *p,
+              unsigned long long a,
+              unsigned long long b)
+{
+  p[4] = (vector unsigned long long) { a, b };
+}
+
+void
+store_v2di_splat_0 (vector unsigned long long *p, unsigned long long a)
+{
+  *p = (vector unsigned long long) { a, a };
+}
+
+void
+store_v2di_splat_8 (vector unsigned long long *p, unsigned long long a)
+{
+  p[8] = (vector unsigned long long) { a, a };
+}
+
+/* 2047 is the largest index that can be used with DS-form instructions.  */
+void
+store_v2di_2047 (vector unsigned long long *p,
+                 unsigned long long a,
+                 unsigned long long b)
+{
+  p[2047] = (vector unsigned long long) { a, b };
+}
+
+/* 2048 will require the constant to be loaded because we can't use a pair of
+   DS-form instructions.  If we have prefixed addressing, a prefixed form will
+   be generated instead.  Two separate stores should still be issued.  */
+void
+store_v2di_2048 (vector unsigned long long *p,
+                 unsigned long long a,
+                 unsigned long long b)
+{
+  p[2048] = (vector unsigned long long) { a, b };
+}
+
+/* { dg-final { scan-assembler-not {\mstxv\M} } } */
+/* { dg-final { scan-assembler-not {\mstxvx\M} } } */
+/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */
+/* { dg-final { scan-assembler-not {\mmtvsrd\M} } } */
+/* { dg-final { scan-assembler-not {\mmtvsrdd\M} } } */
+/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
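
The non-prefixed offset check in ds_form_memory, and the 2047/2048 boundary
used in the test, can be sketched in plain C.  This is a simplified model and
not GCC code: `fits_signed_16bit`, `ds_pair_ok` and `check_test_indices` are
hypothetical names, and the range test is an assumption about what
SIGNED_16BIT_OFFSET_EXTRA_P checks (that both the offset and the offset plus
8 fit in a signed 16-bit displacement).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does VALUE fit in a signed 16-bit displacement?  */
static bool
fits_signed_16bit (int64_t value)
{
  return value >= -32768 && value <= 32767;
}

/* Simplified model of the non-prefixed path: the byte offset must have its
   bottom 2 bits clear (DS-form), and both OFFSET and OFFSET + 8 (the second
   DImode store) must fit in a signed 16-bit displacement.  */
static bool
ds_pair_ok (int64_t offset)
{
  if ((offset & 0x3) != 0)
    return false;

  return fits_signed_16bit (offset) && fits_signed_16bit (offset + 8);
}

/* Each V2DI element is 16 bytes, so index N is byte offset N * 16.  */
static void
check_test_indices (void)
{
  assert (ds_pair_ok (0));           /* *p  */
  assert (ds_pair_ok (4 * 16));      /* p[4]  */
  assert (ds_pair_ok (2047 * 16));   /* 32752 and 32760 both fit  */
  assert (!ds_pair_ok (2048 * 16));  /* 32768 is out of range  */
  assert (!ds_pair_ok (6));          /* bottom 2 bits are not 0  */
}
```

Index 2047 gives byte offsets 32752 and 32760, both within the signed 16-bit
range.  Index 2048 gives 32768, which is not, so without prefixed addressing
the address has to be copied to the scratch base register before the two
stores are emitted.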