From patchwork Thu Nov 9 01:31:52 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: HAO CHEN GUI
X-Patchwork-Id: 1861825
Message-ID: <7ce9bdb2-7603-4ab5-af7c-0f3deb1f75fa@linux.ibm.com>
Date: Thu, 9 Nov 2023 09:31:52 +0800
To: gcc-patches
Cc: Segher Boessenkool, David, "Kewen.Lin", Peter Bergner
From: HAO CHEN GUI
Subject: [PATCH-2v2, rs6000] Enable vector mode for by pieces equality compare [PR111449]
List-Id: Gcc-patches mailing list

Hi,
  This patch enables vector mode for by pieces equality compare. It
adds a new expand pattern, cbranchv16qi4, and sets MOVE_MAX_PIECES
and COMPARE_MAX_PIECES to 16 bytes. The compare relies on both move
and compare instructions, so both macros are changed. As the vector
load/store might be unaligned, the 16-byte move and compare are only
enabled when VSX and EFFICIENT_UNALIGNED_VSX are both enabled.

  This patch also enables 16-byte by pieces move. As vector mode is not
enabled for by pieces move, TImode is used for the move. This caused
two regressions. The root cause is that a 16-byte array can now be
constructed by a single load instruction and is no longer put into LC0,
so the SRA optimization is not performed.

  Compared to the previous version, the main changes are to the guard
of the expand pattern and to the compile options of the test case. The
fix for the two regressions caused by enabling 16-byte move is also
moved into this patch.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Enable vector mode for by pieces equality compare

This patch adds a new expand pattern, cbranchv16qi4, to enable vector
mode by pieces equality compare on rs6000. The macro MOVE_MAX_PIECES
(COMPARE_MAX_PIECES) is set to 16 bytes when VSX and
EFFICIENT_UNALIGNED_VSX are both enabled; otherwise it is left
unchanged. By default STORE_MAX_PIECES takes the same value as
MOVE_MAX_PIECES, so it is now defined explicitly to keep its value
unchanged.

gcc/
	PR target/111449
	* config/rs6000/altivec.md (cbranchv16qi4): New expand pattern.
	* config/rs6000/rs6000.cc (rs6000_generate_compare): Generate
	insn sequence for V16QImode equality compare.
	* config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define.
	(STORE_MAX_PIECES): Define.

gcc/testsuite/
	PR target/111449
	* gcc.target/powerpc/pr111449-1.c: New.
	* gcc.dg/tree-ssa/sra-17.c: Add additional options for 32-bit powerpc.
	* gcc.dg/tree-ssa/sra-18.c: Likewise.

patch.diff
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index e8a596fb7e9..a1423c76451 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2605,6 +2605,48 @@ (define_insn "altivec_vupklpx"
 }
   [(set_attr "type" "vecperm")])
 
+/* The cbranch_optabs doesn't allow FAIL, so old cpus which are
+   inefficient on unaligned vsx are disabled as the cost is high
+   for unaligned load/store.  */
+(define_expand "cbranchv16qi4"
+  [(use (match_operator 0 "equality_operator"
+        [(match_operand:V16QI 1 "reg_or_mem_operand")
+         (match_operand:V16QI 2 "reg_or_mem_operand")]))
+   (use (match_operand 3))]
+  "VECTOR_MEM_VSX_P (V16QImode)
+   && TARGET_EFFICIENT_UNALIGNED_VSX"
+{
+  /* Use direct move for P8 LE to skip double-word swap, as the byte
+     order doesn't matter for equality compare.  If any operands are
+     altivec indexed or indirect operands, the load can be implemented
+     directly by altivec aligned load instruction and swap is no
+     need.  */
+  if (!TARGET_P9_VECTOR
+      && !BYTES_BIG_ENDIAN
+      && MEM_P (operands[1])
+      && !altivec_indexed_or_indirect_operand (operands[1], V16QImode)
+      && MEM_P (operands[2])
+      && !altivec_indexed_or_indirect_operand (operands[2], V16QImode))
+    {
+      rtx reg_op1 = gen_reg_rtx (V16QImode);
+      rtx reg_op2 = gen_reg_rtx (V16QImode);
+      rs6000_emit_le_vsx_permute (reg_op1, operands[1], V16QImode);
+      rs6000_emit_le_vsx_permute (reg_op2, operands[2], V16QImode);
+      operands[1] = reg_op1;
+      operands[2] = reg_op2;
+    }
+  else
+    {
+      operands[1] = force_reg (V16QImode, operands[1]);
+      operands[2] = force_reg (V16QImode, operands[2]);
+    }
+
+  rtx_code code = GET_CODE (operands[0]);
+  operands[0] = gen_rtx_fmt_ee (code, V16QImode, operands[1], operands[2]);
+  rs6000_emit_cbranch (V16QImode, operands);
+  DONE;
+})
+
 ;; Compare vectors producing a vector result and a predicate, setting CR6 to
 ;; indicate a combined status
 (define_insn "altivec_vcmpequ_p"
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index cc24dd5301e..10279052636 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15472,6 +15472,18 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)
           else
             emit_insn (gen_stack_protect_testsi (compare_result, op0, op1b));
         }
+      else if (mode == V16QImode)
+        {
+          gcc_assert (code == EQ || code == NE);
+
+          rtx result_vector = gen_reg_rtx (V16QImode);
+          rtx cc_bit = gen_reg_rtx (SImode);
+          emit_insn (gen_altivec_vcmpequb_p (result_vector, op0, op1));
+          emit_insn (gen_cr6_test_for_lt (cc_bit));
+          emit_insn (gen_rtx_SET (compare_result,
+                                  gen_rtx_COMPARE (comp_mode, cc_bit,
+                                                   const1_rtx)));
+        }
       else
         emit_insn (gen_rtx_SET (compare_result,
                                 gen_rtx_COMPARE (comp_mode, op0, op1)));
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 22595f6ebd7..aed58e5c4e7 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1730,6 +1730,9 @@ typedef struct rs6000_args
    in one reasonably fast instruction.  */
 #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
 #define MAX_MOVE_MAX 8
+#define MOVE_MAX_PIECES ((TARGET_VSX && TARGET_EFFICIENT_UNALIGNED_VSX) \
+                         ? 16 : (TARGET_POWERPC64 ? 8 : 4))
+#define STORE_MAX_PIECES (TARGET_POWERPC64 ? 8 : 4)
 
 /* Nonzero if access to memory by bytes is no faster than for words.
    Also nonzero if doing byte operations (specifically shifts) in registers
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
index 221d96b6cd9..b0d4811e77b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
@@ -1,6 +1,7 @@
 /* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-* powerpc*-*-* s390*-*-* } } } */
 /* { dg-options "-O2 -fdump-tree-esra --param sra-max-scalarization-size-Ospeed=32" } */
 /* { dg-additional-options "-mcpu=ev4" { target alpha*-*-* } } */
+/* { dg-additional-options "-mno-vsx" { target { powerpc*-*-* && ilp32 } } } */
 
 extern void abort (void);
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
index f5e6a21c2ae..2cdeae6e9e7 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
@@ -1,6 +1,7 @@
 /* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-* powerpc*-*-* s390*-*-* } } } */
 /* { dg-options "-O2 -fdump-tree-esra --param sra-max-scalarization-size-Ospeed=32" } */
 /* { dg-additional-options "-mcpu=ev4" { target alpha*-*-* } } */
+/* { dg-additional-options "-mno-vsx" { target { powerpc*-*-* && ilp32 } } } */
 
 extern void abort (void);
 struct foo { long x; };
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449-1.c b/gcc/testsuite/gcc.target/powerpc/pr111449-1.c
new file mode 100644
index 00000000000..0c9e176d288
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111449-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mdejagnu-cpu=power8 -mvsx -O2" } */
+
+/* Ensure vector mode is used for 16-byte by pieces equality compare.  */
+
+int compare1 (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 16) == 0;
+}
+
+int compare2 (const char* s1)
+{
+  return __builtin_memcmp (s1, "0123456789012345", 16) == 0;
+}
+
+/* { dg-final { scan-assembler-times {\mvcmpequb\.} 2 } } */
+/* { dg-final { scan-assembler-not {\mcmpd\M} } } */
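
Note for readers: the comment in cbranchv16qi4 says the double-word swap
can be skipped on P8 LE because byte order doesn't matter for an equality
compare. A rough portable C sketch of that reasoning follows. It is only
an illustration under assumptions: load16 and equal16 are hypothetical
helper names, not GCC code, and the swap_doublewords flag merely models a
load that delivers the two 64-bit halves in swapped order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not taken from the patch: load 16 bytes as two
   64-bit doublewords.  memcpy keeps the load valid for unaligned
   pointers.  SWAP_DOUBLEWORDS models a load that returns the halves
   in swapped order.  */
static void
load16 (uint64_t dw[2], const void *p, int swap_doublewords)
{
  memcpy (dw, p, 16);
  if (swap_doublewords)
    {
      uint64_t tmp = dw[0];
      dw[0] = dw[1];
      dw[1] = tmp;
    }
}

/* Return 1 if the two 16-byte blocks are equal, 0 otherwise.  The same
   permutation is applied to both operands, so it cannot change the
   result of an equality test.  */
static int
equal16 (const void *p, const void *q, int swap_doublewords)
{
  uint64_t a[2], b[2];

  load16 (a, p, swap_doublewords);
  load16 (b, q, swap_doublewords);

  /* XOR is zero exactly where the inputs match; OR-reduce both halves.  */
  return ((a[0] ^ b[0]) | (a[1] ^ b[1])) == 0;
}
```

Calling equal16 (s1, s2, 0) and equal16 (s1, s2, 1) on any pair of
16-byte buffers gives the same answer, which is why the expander can
load both operands with the same permuted load and compare them
directly.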