To: Segher Boessenkool
Subject: rs6000: Generate an lxvp instead of two adjacent lxv instructions
Date: Thu, 8 Jul 2021 17:01:05 -0500
From: Peter Bergner
Cc: GCC Patches

The MMA build built-ins currently use individual lxv instructions to load
the registers of a __vector_pair or __vector_quad.  If the memory operands
of the built-in refer to adjacent locations, then in some cases (an
even-numbered VSX register whose operand is adjacent in memory to the next
register's operand) we can use a single lxvp to load two registers at once.
The patch below adds support for checking whether memory addresses are
adjacent and emitting an lxvp instead of two lxv instructions.

This passed bootstrap and regtesting on powerpc64le-linux with no
regressions.  Ok for trunk?

This seems simple enough that I'd like to backport it to GCC 11 after some
burn-in on trunk, if that is ok?
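A minimal standalone C sketch of the adjacency test described above, assuming each memory operand can be reduced to a base register, a constant offset and an access size (the parts the patch obtains from get_memref_parts). The struct memref type and the adjacent_memrefs/consecutive_memrefs functions are hypothetical stand-ins for GCC's rtx-based code, not part of GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a MEM rtx: the access covers SIZE bytes
   starting at BASE_REGNO + OFFSET.  */
struct memref
{
  int base_regno;
  long offset;
  long size;
};

/* Models the patched adjacent_mem_locations: if A and B share a base
   register and their byte ranges touch, return the lower-addressed one;
   otherwise return NULL.  */
static const struct memref *
adjacent_memrefs (const struct memref *a, const struct memref *b)
{
  assert (a != NULL && b != NULL);
  if (a->base_regno != b->base_regno)
    return NULL;
  if (a->offset + a->size == b->offset)
    return a;
  if (b->offset + b->size == a->offset)
    return b;
  return NULL;
}

/* Models consecutive_mem_locations: true only if A and B are adjacent
   and A has the lower address.  */
static bool
consecutive_memrefs (const struct memref *a, const struct memref *b)
{
  return adjacent_memrefs (a, b) == a;
}
```

Returning the lower-addressed operand instead of a plain bool is what lets a caller tell the ordered case that lxvp needs apart from mere adjacency, while callers that only care about adjacency can still compare the result against NULL.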
Given the MMA redesign from GCC 10 to GCC 11, I have no plans to backport
this to GCC 10.

Peter


gcc/
	* config/rs6000/rs6000.c (consecutive_mem_locations): New function.
	(rs6000_split_multireg_move): Handle MMA build built-ins with operands
	in consecutive memory locations.
	(adjacent_mem_locations): Return the lower addressed memory rtx, if any.
	(power6_sched_reorder2): Update for adjacent_mem_locations change.

gcc/testsuite/
	* gcc.target/powerpc/mma-builtin-9.c: New test.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 9a5db63d0ef..de36c5ecd91 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -293,6 +293,8 @@ bool cpu_builtin_p = false;
    don't link in rs6000-c.c, so we can't call it directly.  */
 void (*rs6000_target_modify_macros_ptr) (bool, HOST_WIDE_INT, HOST_WIDE_INT);
 
+static bool consecutive_mem_locations (rtx, rtx);
+
 /* Simplfy register classes into simpler classifications.  We assume
    GPR_REG_TYPE - FPR_REG_TYPE are ordered so that we can use a simple range
    check for standard register classes (gpr/floating/altivec/vsx) and
@@ -16841,8 +16843,35 @@ rs6000_split_multireg_move (rtx dst, rtx src)
           for (int i = 0; i < nvecs; i++)
             {
               int index = WORDS_BIG_ENDIAN ? i : nvecs - 1 - i;
-              rtx dst_i = gen_rtx_REG (reg_mode, reg + index);
-              emit_insn (gen_rtx_SET (dst_i, XVECEXP (src, 0, i)));
+              int index_next = WORDS_BIG_ENDIAN ? index + 1 : index - 1;
+              rtx dst_i;
+              int regno = reg + i;
+
+              /* If we are loading an even VSX register and our memory location
+                 is adjacent to the next register's memory location (if any),
+                 then we can load them both with one LXVP instruction.  */
+              if ((regno & 1) == 0
+                  && VSX_REGNO_P (regno)
+                  && MEM_P (XVECEXP (src, 0, index))
+                  && MEM_P (XVECEXP (src, 0, index_next)))
+                {
+                  rtx base = WORDS_BIG_ENDIAN ? XVECEXP (src, 0, index)
+                                              : XVECEXP (src, 0, index_next);
+                  rtx next = WORDS_BIG_ENDIAN ? XVECEXP (src, 0, index_next)
+                                              : XVECEXP (src, 0, index);
+
+                  if (consecutive_mem_locations (base, next))
+                    {
+                      dst_i = gen_rtx_REG (OOmode, regno);
+                      emit_move_insn (dst_i, adjust_address (base, OOmode, 0));
+                      /* Skip the next register, since we just loaded it.  */
+                      i++;
+                      continue;
+                    }
+                }
+
+              dst_i = gen_rtx_REG (reg_mode, reg + i);
+              emit_insn (gen_rtx_SET (dst_i, XVECEXP (src, 0, index)));
             }
 
           /* We are writing an accumulator register, so we have to
@@ -18427,23 +18456,37 @@ get_memref_parts (rtx mem, rtx *base, HOST_WIDE_INT *offset,
   return true;
 }
 
-/* The function returns true if the target storage location of
-   mem1 is adjacent to the target storage location of mem2 */
-/* Return 1 if memory locations are adjacent.  */
+/* If the target storage locations of arguments MEM1 and MEM2 are
+   adjacent, then return the argument that has the lower address.
+   Otherwise, return NULL_RTX.  */
 
-static bool
+static rtx
 adjacent_mem_locations (rtx mem1, rtx mem2)
 {
   rtx reg1, reg2;
   HOST_WIDE_INT off1, size1, off2, size2;
 
   if (get_memref_parts (mem1, &reg1, &off1, &size1)
-      && get_memref_parts (mem2, &reg2, &off2, &size2))
-    return ((REGNO (reg1) == REGNO (reg2))
-            && ((off1 + size1 == off2)
-                || (off2 + size2 == off1)));
+      && get_memref_parts (mem2, &reg2, &off2, &size2)
+      && REGNO (reg1) == REGNO (reg2))
+    {
+      if (off1 + size1 == off2)
+        return mem1;
+      else if (off2 + size2 == off1)
+        return mem2;
+    }
 
-  return false;
+  return NULL_RTX;
+}
+
+/* The function returns true if the target storage location of
+   MEM1 is adjacent to the target storage location of MEM2 and
+   MEM1 has a lower address than MEM2.  */
+
+static bool
+consecutive_mem_locations (rtx mem1, rtx mem2)
+{
+  return adjacent_mem_locations (mem1, mem2) == mem1;
 }
 
 /* This function returns true if it can be determined that the two MEM
@@ -19009,7 +19052,7 @@ power6_sched_reorder2 (rtx_insn **ready, int lastpos)
                     first_store_pos = pos;
 
                   if (is_store_insn (last_scheduled_insn, &str_mem2)
-                      && adjacent_mem_locations (str_mem, str_mem2))
+                      && adjacent_mem_locations (str_mem, str_mem2) != NULL_RTX)
                     {
                       /* Found an adjacent store.  Move it to the head of the
                          ready list, and adjust it's priority so that it is
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-9.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-9.c
new file mode 100644
index 00000000000..397d0f1db35
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-9.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+typedef unsigned char vec_t __attribute__((vector_size(16)));
+
+void
+foo (__vector_pair *dst, vec_t *src)
+{
+  __vector_pair pair;
+  /* Adjacent loads should be combined into one lxvp instruction.  */
+  __builtin_vsx_build_pair (&pair, src[0], src[1]);
+  *dst = pair;
+}
+
+void
+bar (__vector_quad *dst, vec_t *src)
+{
+  __vector_quad quad;
+  /* Adjacent loads should be combined into two lxvp instructions.  */
+  __builtin_mma_build_acc (&quad, src[0], src[1], src[2], src[3]);
+  *dst = quad;
+}
+
+/* { dg-final { scan-assembler-not {\mlxv\M} } } */
+/* { dg-final { scan-assembler-not {\mstxv\M} } } */
+/* { dg-final { scan-assembler-times {\mlxvp\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 3 } } */
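The register-pairing loop added to rs6000_split_multireg_move can be modelled in plain C to check the lxvp count the new test expects. This is a sketch under simplifying assumptions: every operand is a 16-byte memory reference off one shared base register (so adjacency reduces to comparing offsets), the first destination register is even, and count_loads and struct load_counts are hypothetical names, not GCC code. It follows the patch's index/index_next mapping for both endiannesses.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of each load kind the modelled loop would emit.  */
struct load_counts
{
  int lxvp;
  int lxv;
};

/* Hypothetical model of the patched loop in rs6000_split_multireg_move.
   OFFSETS[k] is the byte offset, from one shared base register, of the
   k-th 16-byte operand passed to the build built-in.  Register
   FIRST_REG + I receives operand INDEX, as in the patch.  */
static struct load_counts
count_loads (int first_reg, const long *offsets, int nvecs, bool big_endian)
{
  struct load_counts c = { 0, 0 };

  assert (nvecs > 0 && offsets != 0);
  for (int i = 0; i < nvecs; i++)
    {
      int index = big_endian ? i : nvecs - 1 - i;
      int index_next = big_endian ? index + 1 : index - 1;
      int regno = first_reg + i;

      /* Only an even register can start a pair.  The bounds test is an
         extra safeguard: with an even FIRST_REG and even NVECS, the
         parity test already keeps INDEX_NEXT in range.  */
      if ((regno & 1) == 0 && i + 1 < nvecs)
        {
          /* BASE must be the lower-addressed operand, as required by
             consecutive_mem_locations.  */
          long base = big_endian ? offsets[index] : offsets[index_next];
          long next = big_endian ? offsets[index_next] : offsets[index];

          if (base + 16 == next)
            {
              c.lxvp++;
              i++;  /* The pair load also filled register REGNO + 1.  */
              continue;
            }
        }
      c.lxv++;
    }
  return c;
}
```

For the operands in mma-builtin-9.c (offsets 0 and 16 for foo; 0, 16, 32 and 48 for bar) the model yields one lxvp for foo and two for bar, which matches the test's expected lxvp count of 3. Swapping the operands (offsets 16 and 0) falls back to two lxv loads, because consecutive_mem_locations only accepts the pair when BASE is the lower-addressed operand.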