From patchwork Wed Apr 29 15:48:08 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andreas Krebbel
X-Patchwork-Id: 1279480
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 1/1] IBM Z: vec_store_len_r/vec_load_len_r fix
Date: Wed, 29 Apr 2020 17:48:08 +0200
Message-Id: <20200429154808.29713-1-krebbel@linux.ibm.com>
X-Mailer: git-send-email 2.17.1
List-Id: Gcc-patches mailing list
From: Andreas Krebbel
Reply-To: Andreas Krebbel

This fixes a problem with the vec_store_len_r intrinsic.  The macros
mapping the intrinsic to a GCC builtin had the wrong signature.

With the patch an immediate length operand of vlrl/vstrl is handled
the same way as if it was passed in a register to vlrlr/vstrlr.
Values bigger than 15 always load the full vector.  If it can be
recognized that it is in effect a full vector register load or store
it is now implemented with vl/vst instead.

I'll commit it to mainline after successful bootstrap and regtest.

gcc/ChangeLog:

2020-04-29  Andreas Krebbel

	* config/s390/constraints.md ("j>f", "jb4"): New constraints.
	* config/s390/vecintrin.h (vec_load_len_r, vec_store_len_r): Fix
	macro definitions.
	* config/s390/vx-builtins.md ("vlrlrv16qi", "vstrlrv16qi"): Add a
	separate expander.
	("*vlrlrv16qi", "*vstrlrv16qi"): Add alternative for vl/vst.
	Change constraint for vlrl/vstrl to jb4.

gcc/testsuite/ChangeLog:

2020-04-29  Andreas Krebbel

	* gcc.target/s390/zvector/vec_load_len_r.c: New test.
	* gcc.target/s390/zvector/vec_store_len_r.c: New test.
---
 gcc/config/s390/constraints.md                | 14 ++-
 gcc/config/s390/vecintrin.h                   |  6 +-
 gcc/config/s390/vx-builtins.md                | 58 +++++++++---
 .../gcc.target/s390/zvector/vec_load_len_r.c  | 94 +++++++++++++++++++
 .../gcc.target/s390/zvector/vec_store_len_r.c | 94 +++++++++++++++++++
 5 files changed, 251 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec_load_len_r.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec_store_len_r.c

diff --git a/gcc/config/s390/constraints.md b/gcc/config/s390/constraints.md
index 91e3db7a146..0b05c5ca729 100644
--- a/gcc/config/s390/constraints.md
+++ b/gcc/config/s390/constraints.md
@@ -38,6 +38,8 @@
 ;;         matching K constraint
 ;;    jm6: An integer operand with the lowest order 6 bits all ones.
 ;;    jdd: A constant operand that fits into the data section.
+;;    j>f: An integer operand whose lower 32 bits are greater than or equal to 15
+;;    jb4: An unsigned constant 4 bit operand.
 ;;    t -- Access registers 36 and 37.
 ;;    v -- Vector registers v0-v31.
 ;;    C -- A signed 8-bit constant (-128..127)
@@ -425,7 +427,7 @@
 
 
 ;;
-;; Vector constraints follow.
+;; Vector and scalar constraints for constant values follow.
 ;;
 
 (define_constraint "j00"
@@ -462,6 +464,16 @@
   "@internal An integer operand with the lowest order 6 bits all ones."
   (match_operand 0 "const_int_6bitset_operand"))
 
+(define_constraint "j>f"
+  "@internal An integer operand whose lower 32 bits are greater than or equal to 15."
+  (and (match_code "const_int")
+       (match_test "(unsigned int)(ival & 0xffffffff) >= 15")))
+
+(define_constraint "jb4"
+  "@internal Constant unsigned integer 4 bit value"
+  (and (match_code "const_int")
+       (match_test "ival >= 0 && ival <= 15")))
+
 ;;
 ;; Memory constraints follow.
 ;;
diff --git a/gcc/config/s390/vecintrin.h b/gcc/config/s390/vecintrin.h
index bd9355ec9d0..8ef4f44bb34 100644
--- a/gcc/config/s390/vecintrin.h
+++ b/gcc/config/s390/vecintrin.h
@@ -111,8 +111,10 @@ __lcbb(const void *ptr, int bndry)
 #define vec_round(X) __builtin_s390_vfi((X), 4, 4)
 #define vec_doublee(X) __builtin_s390_vfll((X))
 #define vec_floate(X) __builtin_s390_vflr((X), 0, 0)
-#define vec_load_len_r(X,Y) __builtin_s390_vlrl((Y),(X))
-#define vec_store_len_r(X,Y) __builtin_s390_vstrl((Y),(X))
+#define vec_load_len_r(X,L)                          \
+  (__vector unsigned char)__builtin_s390_vlrlr((L),(X))
+#define vec_store_len_r(X,Y,L)                       \
+  __builtin_s390_vstrlr((__vector signed char)(X),(L),(Y))
 
 #define vec_all_nan(a)                               \
   __extension__ ({                                   \
diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index 0eed31923c5..6f1add02d0b 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -202,16 +202,34 @@
   "vlbb\t%v0,%1,%2"
   [(set_attr "op_type" "VRX")])
 
-(define_insn "vlrlrv16qi"
-  [(set (match_operand:V16QI 0 "register_operand" "=v,v")
-        (unspec:V16QI [(match_operand:BLK 2 "memory_operand" "Q,Q")
-                       (match_operand:SI 1 "nonmemory_operand" "d,C")]
+; Vector load rightmost with length
+
+(define_expand "vlrlrv16qi"
+  [(set (match_operand:V16QI 0 "register_operand" "")
+        (unspec:V16QI [(match_operand:BLK 2 "memory_operand" "")
+                       (match_operand:SI 1 "nonmemory_operand" "")]
+                      UNSPEC_VEC_LOAD_LEN_R))]
+  "TARGET_VXE"
+{
+  /* vlrlr sets all length values beyond 15 to 15.  Emulate the same
+     behavior for immediate length operands.  vlrl would trigger a
+     SIGILL for too large immediate operands.  */
+  if (CONST_INT_P (operands[1])
+      && (UINTVAL (operands[1]) & 0xffffffff) > 15)
+    operands[1] = GEN_INT (15);
+})
+
+(define_insn "*vlrlrv16qi"
+  [(set (match_operand:V16QI 0 "register_operand" "=v, v, v")
+        (unspec:V16QI [(match_operand:BLK 2 "memory_operand" "Q, R, Q")
+                       (match_operand:SI 1 "nonmemory_operand" "d,j>f,jb4")]
                       UNSPEC_VEC_LOAD_LEN_R))]
   "TARGET_VXE"
   "@
    vlrlr\t%v0,%1,%2
+   vl\t%v0,%2%A2
    vlrl\t%v0,%2,%1"
-  [(set_attr "op_type" "VRS,VSI")])
+  [(set_attr "op_type" "VRS,VRX,VSI")])
 
 
 ; FIXME: The following two patterns might using vec_merge. But what is
@@ -545,16 +563,32 @@
 
 ; Vector store rightmost with length
 
-(define_insn "vstrlrv16qi"
-  [(set (match_operand:BLK 2 "memory_operand" "=Q,Q")
-        (unspec:BLK [(match_operand:V16QI 0 "register_operand" "v,v")
-                     (match_operand:SI 1 "nonmemory_operand" "d,C")]
+(define_expand "vstrlrv16qi"
+  [(set (match_operand:BLK 2 "memory_operand" "")
+        (unspec:BLK [(match_operand:V16QI 0 "register_operand" "")
+                     (match_operand:SI 1 "nonmemory_operand" "")]
+                    UNSPEC_VEC_STORE_LEN_R))]
+  "TARGET_VXE"
+{
+  /* vstrlr sets all length values beyond 15 to 15.  Emulate the same
+     behavior for immediate length operands.  vstrl would trigger a
+     SIGILL for too large immediate operands.  */
+  if (CONST_INT_P (operands[1])
+      && (UINTVAL (operands[1]) & 0xffffffff) > 15)
+    operands[1] = GEN_INT (15);
+})
+
+(define_insn "*vstrlrv16qi"
+  [(set (match_operand:BLK 2 "memory_operand" "=Q, R, Q")
+        (unspec:BLK [(match_operand:V16QI 0 "register_operand" "v, v, v")
+                     (match_operand:SI 1 "nonmemory_operand" "d,j>f,jb4")]
                     UNSPEC_VEC_STORE_LEN_R))]
   "TARGET_VXE"
   "@
-   vstrlr\t%v0,%2,%1
-   vstrl\t%v0,%1,%2"
-  [(set_attr "op_type" "VRS,VSI")])
+   vstrlr\t%v0,%1,%2
+   vst\t%v0,%2%A2
+   vstrl\t%v0,%2,%1"
+  [(set_attr "op_type" "VRS,VRX,VSI")])
 
 
 
diff --git a/gcc/testsuite/gcc.target/s390/zvector/vec_load_len_r.c b/gcc/testsuite/gcc.target/s390/zvector/vec_load_len_r.c
new file mode 100644
index 00000000000..5d22bf61c7c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/vec_load_len_r.c
@@ -0,0 +1,94 @@
+/* { dg-do run } */
+/* { dg-require-effective-target s390_vxe2 } */
+/* { dg-options "-O3 -mzarch -march=arch13 -mzvector --save-temps" } */
+
+#include <string.h>
+#include <vecintrin.h>
+
+typedef vector unsigned char uv16qi;
+
+const unsigned char test_vec[16] = { 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16 };
+
+#define NUM_TEST_LENGTHS 3
+
+unsigned int test_len[NUM_TEST_LENGTHS] = { 0, 12, 18 };
+
+
+/* Proceeding from left to right, the specified number (LEN+1) of
+   bytes from SOURCE are stored right-justified in TARGET.  */
+void __attribute__((noinline, noclone, target ("arch=zEC12")))
+emul (const unsigned char *source, unsigned char *target, unsigned int len)
+{
+  int start = 15 - len;
+  if (start < 0)
+    start = 0;
+  for (int s = 0, t = start; t < 16; s++, t++)
+    target[t] = source[s];
+}
+
+uv16qi __attribute__((noinline, noclone))
+vec_load_len_r_reg (const unsigned char *s, unsigned int len)
+{
+  return vec_load_len_r (s, len);
+}
+
+void __attribute__((noinline, noclone))
+vec_load_len_r_mem (const unsigned char *s, uv16qi *t, unsigned int *len)
+{
+  *t = vec_load_len_r (s, *len);
+}
+
+#define GEN_CONST_FUNC(CONST)                            \
+  static uv16qi inline                                   \
+  vec_load_len_r_const##CONST (const unsigned char *s)   \
+  {                                                      \
+    return vec_load_len_r (s, CONST);                    \
+  }
+
+#define GEN_CONST_TEST(CONST)                            \
+  memset (exp_result, 0, 16);                            \
+  emul (test_vec, exp_result, CONST);                    \
+  result = (uv16qi) { 0 };                               \
+  result = vec_load_len_r_const##CONST (test_vec);       \
+  if (memcmp ((char*)&result, exp_result, 16) != 0)      \
+    __builtin_abort ();
+
+GEN_CONST_FUNC(0)
+GEN_CONST_FUNC(12)
+GEN_CONST_FUNC(18)
+
+int
+main ()
+{
+  unsigned char exp_result[16];
+  uv16qi result;
+
+  for (int i = 0; i < NUM_TEST_LENGTHS; i++)
+    {
+      memset (exp_result, 0, 16);
+
+      emul (test_vec, exp_result, test_len[i]);
+
+      result = (uv16qi) { 0 };
+      result = vec_load_len_r_reg (test_vec, test_len[i]);
+      if (memcmp ((char*)&result, exp_result, 16) != 0)
+        __builtin_abort ();
+
+      result = (uv16qi) { 0 };
+      vec_load_len_r_mem (test_vec, &result, &test_len[i]);
+      if (memcmp ((char*)&result, exp_result, 16) != 0)
+        __builtin_abort ();
+    }
+
+  GEN_CONST_TEST(0)
+  GEN_CONST_TEST(12)
+  GEN_CONST_TEST(18)
+
+  return 0;
+}
+
+/* vec_load_len_r_reg and vec_load_len_r_mem */
+/* { dg-final { scan-assembler-times "vlrlr\t" 2 } } */
+
+/* For the 2 constants.  The 3. should be implemented with vl.  */
+/* { dg-final { scan-assembler-times "vlrl\t" 2 } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/vec_store_len_r.c b/gcc/testsuite/gcc.target/s390/zvector/vec_store_len_r.c
new file mode 100644
index 00000000000..83ef90a2b10
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/vec_store_len_r.c
@@ -0,0 +1,94 @@
+/* { dg-do run } */
+/* { dg-require-effective-target s390_vxe2 } */
+/* { dg-options "-O3 -mzarch -march=arch13 -mzvector --save-temps" } */
+
+#include <string.h>
+#include <vecintrin.h>
+
+typedef vector unsigned char uv16qi;
+
+uv16qi test_vec = (uv16qi){ 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16 };
+
+#define NUM_TEST_LENGTHS 3
+
+unsigned int test_len[NUM_TEST_LENGTHS] = { 0, 12, 18 };
+
+
+/* Proceeding from left to right, the specified number (LEN+1) of
+   rightmost bytes from SOURCE are stored in TARGET.  */
+void __attribute__((noinline, noclone, target ("arch=zEC12")))
+emul (unsigned char *source, unsigned char *target, unsigned int len)
+{
+  int start = 15 - len;
+  if (start < 0)
+    start = 0;
+  for (int s = start, t = 0; s < 16; s++, t++)
+    target[t] = source[s];
+}
+
+void __attribute__((noinline, noclone))
+vec_store_len_r_reg (uv16qi s, unsigned char *t, unsigned int len)
+{
+  vec_store_len_r (s, t, len);
+}
+
+void __attribute__((noinline, noclone))
+vec_store_len_r_mem (uv16qi *s, unsigned char *t, unsigned int *len)
+{
+  vec_store_len_r (*s, t, *len);
+}
+
+#define GEN_CONST_FUNC(CONST)                                \
+  static void inline                                         \
+  vec_store_len_r_const##CONST (uv16qi s, unsigned char *t)  \
+  {                                                          \
+    vec_store_len_r (s, t, CONST);                           \
+  }
+
+#define GEN_CONST_TEST(CONST)                                \
+  memset (exp_result, 0, 16);                                \
+  emul ((unsigned char*)&test_vec, exp_result, CONST);       \
+  memset (result, 0, 16);                                    \
+  vec_store_len_r_const##CONST (test_vec, result);           \
+  if (memcmp (result, exp_result, 16) != 0)                  \
+    __builtin_abort ();
+
+GEN_CONST_FUNC(0)
+GEN_CONST_FUNC(12)
+GEN_CONST_FUNC(18)
+
+int
+main ()
+{
+  unsigned char exp_result[16];
+  unsigned char result[16];
+
+  for (int i = 0; i < NUM_TEST_LENGTHS; i++)
+    {
+      memset (exp_result, 0, 16);
+
+      emul ((unsigned char*)&test_vec, exp_result, test_len[i]);
+
+      memset (result, 0, 16);
+      vec_store_len_r_reg (test_vec, result, test_len[i]);
+      if (memcmp (result, exp_result, 16) != 0)
+        __builtin_abort ();
+
+      memset (result, 0, 16);
+      vec_store_len_r_mem (&test_vec, result, &test_len[i]);
+      if (memcmp (result, exp_result, 16) != 0)
+        __builtin_abort ();
+    }
+
+  GEN_CONST_TEST(0)
+  GEN_CONST_TEST(12)
+  GEN_CONST_TEST(18)
+
+  return 0;
+}
+
+/* vec_store_len_r_reg and vec_store_len_r_mem */
+/* { dg-final { scan-assembler-times "vstrlr\t" 2 } } */
+
+/* For the 2 constants.  The 3. should be implemented with vst.  */
+/* { dg-final { scan-assembler-times "vstrl\t" 2 } } */
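
For readers without arch13 hardware, the length handling that both tests
check can be modelled in portable C.  The sketch below is only an
illustration and not part of the patch: the names model_clamp_len,
model_load_len_r and model_store_len_r are hypothetical, and the byte
placement is inferred from the emul helpers in the tests above rather
than from the instruction definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not part of the patch.  Only the low 32 bits of
   the length are considered, and anything above 15 behaves like 15,
   mirroring the check in the new expanders.  */
unsigned int
model_clamp_len (unsigned long long len)
{
  unsigned int low = (unsigned int) (len & 0xffffffffULL);
  return low > 15 ? 15 : low;
}

/* Model of vec_load_len_r as exercised by the load test: the first
   LEN+1 bytes of SRC land right-justified in the 16-byte DST and the
   remaining leftmost bytes are zero.  */
void
model_load_len_r (const unsigned char *src, unsigned int len,
                  unsigned char dst[16])
{
  assert (src != NULL && dst != NULL);
  size_t n = (size_t) model_clamp_len (len) + 1;
  memset (dst, 0, 16);
  memcpy (dst + 16 - n, src, n);
}

/* Model of vec_store_len_r as exercised by the store test: the
   rightmost LEN+1 bytes of the 16-byte VEC are written to the start
   of DST; bytes of DST beyond that are left untouched.  */
void
model_store_len_r (const unsigned char vec[16], unsigned char *dst,
                   unsigned int len)
{
  assert (vec != NULL && dst != NULL);
  size_t n = (size_t) model_clamp_len (len) + 1;
  memcpy (dst, vec + 16 - n, n);
}
```

With the 16-byte input 1..16 used by the tests, a length of 12 puts
bytes 1..13 into positions 3..15 of the load result, while lengths of
15, 18, or anything larger all behave like a full 16-byte copy.  That
last case is the one the patch now emits as vl/vst.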