From patchwork Mon May 15 07:17:36 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stefan Schulze Frielinghaus
X-Patchwork-Id: 1781139
To: krebbel@linux.ibm.com, gcc-patches@gcc.gnu.org
Cc: Stefan Schulze Frielinghaus
Subject: [PATCH 1/3] s390: Refactor block operation cpymem
Date: Mon, 15 May 2023 09:17:36 +0200
Message-Id: <20230515071738.563660-2-stefansf@linux.ibm.com>
In-Reply-To: <20230515071738.563660-1-stefansf@linux.ibm.com>
References: <20230515071738.563660-1-stefansf@linux.ibm.com>
From: Stefan Schulze Frielinghaus

Do not perform a libc function call into memcpy in case the size is not
a compile-time constant but bounded and the upper bound is less than or
equal to 256 bytes.

gcc/ChangeLog:

	* config/s390/s390-protos.h (s390_expand_cpymem): Change
	function signature.
	* config/s390/s390.cc (s390_expand_cpymem): For memcpy's less
	than or equal to 256 byte do not perform a libc call.
	(s390_expand_insv): Adapt new function signature of
	s390_expand_cpymem.
	* config/s390/s390.md: Change expander into a version which
	takes 8 operands.
---
 gcc/config/s390/s390-protos.h |  2 +-
 gcc/config/s390/s390.cc       | 84 +++++++++++++++++++++++++++--------
 gcc/config/s390/s390.md       | 10 +++--
 3 files changed, 74 insertions(+), 22 deletions(-)

diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 67fe09e732d..2c7495ca247 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -107,7 +107,7 @@ extern void s390_reload_symref_address (rtx , rtx , rtx , bool);
 extern void s390_expand_plus_operand (rtx, rtx, rtx);
 extern void emit_symbolic_move (rtx *);
 extern void s390_load_address (rtx, rtx);
-extern bool s390_expand_cpymem (rtx, rtx, rtx);
+extern bool s390_expand_cpymem (rtx, rtx, rtx, rtx, rtx);
 extern void s390_expand_setmem (rtx, rtx, rtx);
 extern bool s390_expand_cmpmem (rtx, rtx, rtx, rtx);
 extern void s390_expand_vec_strlen (rtx, rtx, rtx);
diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 505de995da8..95ea5e8d009 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -5650,27 +5650,27 @@ legitimize_reload_address (rtx ad, machine_mode mode ATTRIBUTE_UNUSED,
   return NULL_RTX;
 }
 
-/* Emit code to move LEN bytes from DST to SRC.  */
+/* Emit code to move LEN bytes from SRC to DST.  */
 
 bool
-s390_expand_cpymem (rtx dst, rtx src, rtx len)
+s390_expand_cpymem (rtx dst, rtx src, rtx len, rtx min_len_rtx, rtx max_len_rtx)
 {
-  /* When tuning for z10 or higher we rely on the Glibc functions to
-     do the right thing. Only for constant lengths below 64k we will
-     generate inline code.  */
-  if (s390_tune >= PROCESSOR_2097_Z10
-      && (GET_CODE (len) != CONST_INT || INTVAL (len) > (1<<16)))
-    return false;
+  /* Exit early in case nothing has to be done.  */
+  if (CONST_INT_P (len) && UINTVAL (len) == 0)
+    return true;
+
+  unsigned HOST_WIDE_INT min_len = UINTVAL (min_len_rtx);
+  unsigned HOST_WIDE_INT max_len
+    = max_len_rtx ? UINTVAL (max_len_rtx) : HOST_WIDE_INT_M1U;
 
   /* Expand memcpy for constant length operands without a loop if it
      is shorter that way.
 
      With a constant length argument a
      memcpy loop (without pfd) is 36 bytes -> 6 * mvc  */
-  if (GET_CODE (len) == CONST_INT
-      && INTVAL (len) >= 0
-      && INTVAL (len) <= 256 * 6
-      && (!TARGET_MVCLE || INTVAL (len) <= 256))
+  if (CONST_INT_P (len)
+      && UINTVAL (len) <= 6 * 256
+      && (!TARGET_MVCLE || UINTVAL (len) <= 256))
     {
       HOST_WIDE_INT o, l;
 
@@ -5681,14 +5681,57 @@ s390_expand_cpymem (rtx dst, rtx src, rtx len)
           emit_insn (gen_cpymem_short (newdst, newsrc,
                                        GEN_INT (l > 256 ? 255 : l - 1)));
         }
+
+      return true;
     }
 
-  else if (TARGET_MVCLE)
+  else if (TARGET_MVCLE
+           && (s390_tune < PROCESSOR_2097_Z10
+               || (CONST_INT_P (len) && UINTVAL (len) <= (1 << 16))))
     {
       emit_insn (gen_cpymem_long (dst, src, convert_to_mode (Pmode, len, 1)));
+      return true;
     }
 
-  else
+  /* Non-constant length and no loop required.  */
+  else if (!CONST_INT_P (len) && max_len <= 256)
+    {
+      rtx_code_label *end_label;
+
+      if (min_len == 0)
+        {
+          end_label = gen_label_rtx ();
+          emit_cmp_and_jump_insns (len, const0_rtx, EQ, NULL_RTX,
+                                   GET_MODE (len), 1, end_label,
+                                   profile_probability::very_unlikely ());
+        }
+
+      rtx lenm1 = expand_binop (GET_MODE (len), add_optab, len, constm1_rtx,
+                                NULL_RTX, 1, OPTAB_DIRECT);
+
+      /* Prefer a vectorized implementation over one which makes use of an
+         execute instruction since it is faster (although it increases register
+         pressure).  */
+      if (max_len <= 16 && TARGET_VX)
+        {
+          rtx tmp = gen_reg_rtx (V16QImode);
+          lenm1 = convert_to_mode (SImode, lenm1, 1);
+          emit_insn (gen_vllv16qi (tmp, lenm1, src));
+          emit_insn (gen_vstlv16qi (tmp, lenm1, dst));
+        }
+      else if (TARGET_Z15)
+        emit_insn (gen_mvcrl (dst, src, convert_to_mode (SImode, lenm1, 1)));
+      else
+        emit_insn (
+          gen_cpymem_short (dst, src, convert_to_mode (Pmode, lenm1, 1)));
+
+      if (min_len == 0)
+        emit_label (end_label);
+
+      return true;
+    }
+
+  else if (s390_tune < PROCESSOR_2097_Z10 || (CONST_INT_P (len) && UINTVAL (len) <= (1 << 16)))
     {
       rtx dst_addr, src_addr, count, blocks, temp;
       rtx_code_label *loop_start_label = gen_label_rtx ();
@@ -5706,8 +5749,9 @@ s390_expand_cpymem (rtx dst, rtx src, rtx len)
       blocks = gen_reg_rtx (mode);
 
       convert_move (count, len, 1);
-      emit_cmp_and_jump_insns (count, const0_rtx,
-                               EQ, NULL_RTX, mode, 1, end_label);
+      if (min_len == 0)
+        emit_cmp_and_jump_insns (count, const0_rtx, EQ, NULL_RTX, mode, 1,
+                                 end_label);
 
       emit_move_insn (dst_addr, force_operand (XEXP (dst, 0), NULL_RTX));
       emit_move_insn (src_addr, force_operand (XEXP (src, 0), NULL_RTX));
@@ -5767,8 +5811,11 @@ s390_expand_cpymem (rtx dst, rtx src, rtx len)
       emit_insn (gen_cpymem_short (dst, src,
                                    convert_to_mode (Pmode, count, 1)));
       emit_label (end_label);
+
+      return true;
     }
-  return true;
+
+  return false;
 }
 
 /* Emit code to set LEN bytes at DST to VAL.
@@ -6599,7 +6646,8 @@ s390_expand_insv (rtx dest, rtx op1, rtx op2, rtx src)
 
           dest = adjust_address (dest, BLKmode, 0);
           set_mem_size (dest, size);
-          s390_expand_cpymem (dest, src_mem, GEN_INT (size));
+          rtx size_rtx = GEN_INT (size);
+          s390_expand_cpymem (dest, src_mem, size_rtx, size_rtx, size_rtx);
           return true;
         }
 
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 00d39608e1d..d9ce287ab85 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -3341,11 +3341,15 @@
 (define_expand "cpymem"
   [(set (match_operand:BLK 0 "memory_operand" "") ; destination
         (match_operand:BLK 1 "memory_operand" "")) ; source
-   (use (match_operand:GPR 2 "general_operand" "")) ; count
-   (match_operand 3 "" "")]
+   (use (match_operand:GPR 2 "general_operand" "")) ; size
+   (match_operand 3 "") ; align
+   (match_operand 4 "") ; expected align
+   (match_operand 5 "") ; expected size
+   (match_operand 6 "") ; minimal size
+   (match_operand 7 "")] ; maximal size
   ""
 {
-  if (s390_expand_cpymem (operands[0], operands[1], operands[2]))
+  if (s390_expand_cpymem (operands[0], operands[1], operands[2], operands[6], operands[7]))
     DONE;
   else
     FAIL;

From patchwork Mon May 15 07:17:37 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stefan Schulze Frielinghaus
X-Patchwork-Id: 1781137
To: krebbel@linux.ibm.com, gcc-patches@gcc.gnu.org
Cc: Stefan Schulze Frielinghaus
Subject: [PATCH 2/3] s390: Add block operation movmem
Date: Mon, 15 May 2023 09:17:37 +0200
Message-Id: <20230515071738.563660-3-stefansf@linux.ibm.com>
In-Reply-To: <20230515071738.563660-1-stefansf@linux.ibm.com>
References: <20230515071738.563660-1-stefansf@linux.ibm.com>
From: Stefan Schulze Frielinghaus

gcc/ChangeLog:

	* config/s390/s390-protos.h (s390_expand_movmem): New.
	* config/s390/s390.cc (s390_expand_movmem): New.
	* config/s390/s390.md (movmem): New.
	(*mvcrl): New.
	(mvcrl): New.
---
 gcc/config/s390/s390-protos.h |  1 +
 gcc/config/s390/s390.cc       | 88 +++++++++++++++++++++++++++++++++++
 gcc/config/s390/s390.md       | 35 ++++++++++++++
 3 files changed, 124 insertions(+)

diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 2c7495ca247..65e4f97b41e 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -108,6 +108,7 @@ extern void s390_expand_plus_operand (rtx, rtx, rtx);
 extern void emit_symbolic_move (rtx *);
 extern void s390_load_address (rtx, rtx);
 extern bool s390_expand_cpymem (rtx, rtx, rtx, rtx, rtx);
+extern bool s390_expand_movmem (rtx, rtx, rtx, rtx, rtx);
 extern void s390_expand_setmem (rtx, rtx, rtx);
 extern bool s390_expand_cmpmem (rtx, rtx, rtx, rtx);
 extern void s390_expand_vec_strlen (rtx, rtx, rtx);
diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 95ea5e8d009..553273f23ff 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -5818,6 +5818,94 @@ s390_expand_cpymem (rtx dst, rtx src, rtx len, rtx min_len_rtx, rtx max_len_rtx)
   return false;
 }
 
+bool
+s390_expand_movmem (rtx dst, rtx src, rtx len, rtx min_len_rtx, rtx max_len_rtx)
+{
+  /* Exit early in case nothing has to be done.  */
+  if (CONST_INT_P (len) && UINTVAL (len) == 0)
+    return true;
+  /* Exit early in case length is not upper bounded.  */
+  else if (max_len_rtx == NULL)
+    return false;
+
+  unsigned HOST_WIDE_INT min_len = UINTVAL (min_len_rtx);
+  unsigned HOST_WIDE_INT max_len = UINTVAL (max_len_rtx);
+
+  /* At most 16 bytes.  */
+  if (max_len <= 16 && TARGET_VX)
+    {
+      rtx_code_label *end_label;
+
+      if (min_len == 0)
+        {
+          end_label = gen_label_rtx ();
+          emit_cmp_and_jump_insns (len, const0_rtx, EQ, NULL_RTX,
+                                   GET_MODE (len), 1, end_label,
+                                   profile_probability::very_unlikely ());
+        }
+
+      rtx lenm1;
+      if (CONST_INT_P (len))
+        {
+          lenm1 = gen_reg_rtx (SImode);
+          emit_move_insn (lenm1, GEN_INT (UINTVAL (len) - 1));
+        }
+      else
+        lenm1
+          = expand_binop (SImode, add_optab, convert_to_mode (SImode, len, 1),
+                          constm1_rtx, NULL_RTX, 1, OPTAB_DIRECT);
+
+      rtx tmp = gen_reg_rtx (V16QImode);
+      emit_insn (gen_vllv16qi (tmp, lenm1, src));
+      emit_insn (gen_vstlv16qi (tmp, lenm1, dst));
+
+      if (min_len == 0)
+        emit_label (end_label);
+
+      return true;
+    }
+
+  /* At most 256 bytes.  */
+  else if (max_len <= 256 && TARGET_Z15)
+    {
+      rtx_code_label *end_label = gen_label_rtx ();
+
+      if (min_len == 0)
+        emit_cmp_and_jump_insns (len, const0_rtx, EQ, NULL_RTX, GET_MODE (len),
+                                 1, end_label,
+                                 profile_probability::very_unlikely ());
+
+      rtx dst_addr = gen_reg_rtx (Pmode);
+      rtx src_addr = gen_reg_rtx (Pmode);
+      emit_move_insn (dst_addr, force_operand (XEXP (dst, 0), NULL_RTX));
+      emit_move_insn (src_addr, force_operand (XEXP (src, 0), NULL_RTX));
+
+      rtx lenm1 = CONST_INT_P (len)
+                    ? GEN_INT (UINTVAL (len) - 1)
+                    : expand_binop (GET_MODE (len), add_optab, len, constm1_rtx,
+                                    NULL_RTX, 1, OPTAB_DIRECT);
+
+      rtx_code_label *right_to_left_label = gen_label_rtx ();
+      emit_cmp_and_jump_insns (src_addr, dst_addr, LT, NULL_RTX, GET_MODE (len),
+                               1, right_to_left_label);
+
+      // MVC
+      emit_insn (
+        gen_cpymem_short (dst, src, convert_to_mode (Pmode, lenm1, 1)));
+      emit_jump (end_label);
+
+      // MVCRL
+      emit_label (right_to_left_label);
+      emit_insn (gen_mvcrl (dst, src, convert_to_mode (SImode, lenm1, 1)));
+
+      emit_label (end_label);
+
+      return true;
+    }
+
+  return false;
+}
+
 /* Emit code to set LEN bytes at DST to VAL.
    Make use of clrmem if VAL is zero.  */
 
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index d9ce287ab85..abe3bbc5cd9 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -61,6 +61,7 @@
    UNSPEC_ROUND
    UNSPEC_ICM
    UNSPEC_TIE
+   UNSPEC_MVCRL
 
    ; Convert CC into a str comparison result and copy it into an
    ; integer register
@@ -3496,6 +3497,40 @@
   [(set_attr "length" "8")
    (set_attr "type" "vs")])
 
+(define_expand "movmem"
+  [(set (match_operand:BLK 0 "memory_operand") ; destination
+        (match_operand:BLK 1 "memory_operand")) ; source
+   (use (match_operand:GPR 2 "general_operand")) ; size
+   (match_operand 3 "") ; align
+   (match_operand 4 "") ; expected align
+   (match_operand 5 "") ; expected size
+   (match_operand 6 "") ; minimal size
+   (match_operand 7 "")] ; maximal size
+  ""
+{
+  if (s390_expand_movmem (operands[0], operands[1], operands[2], operands[6], operands[7]))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_insn "*mvcrl"
+  [(set (match_operand:BLK 0 "memory_operand" "=Q")
+        (unspec:BLK [(match_operand:BLK 1 "memory_operand" "Q")
+                     (reg:SI GPR0_REGNUM)]
+                    UNSPEC_MVCRL))]
+  "TARGET_Z15"
+  "mvcrl\t%0,%1"
+  [(set_attr "op_type" "SSE")])
+
+(define_expand "mvcrl"
+  [(set (reg:SI GPR0_REGNUM) (match_operand:SI 2 "general_operand"))
+   (set (match_operand:BLK 0 "memory_operand" "=Q")
+        (unspec:BLK [(match_operand:BLK 1 "memory_operand" "Q")
+                     (reg:SI GPR0_REGNUM)]
+                    UNSPEC_MVCRL))]
+  "TARGET_Z15"
+  "")
 
 ;
 ; Test data class.

From patchwork Mon May 15 07:17:38 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stefan Schulze Frielinghaus
X-Patchwork-Id: 1781140
To: krebbel@linux.ibm.com, gcc-patches@gcc.gnu.org
Cc: Stefan Schulze Frielinghaus
Subject: [PATCH 3/3] s390: Refactor block operation setmem
Date: Mon, 15 May 2023 09:17:38 +0200
Message-Id: <20230515071738.563660-4-stefansf@linux.ibm.com>
In-Reply-To: <20230515071738.563660-1-stefansf@linux.ibm.com>
References: <20230515071738.563660-1-stefansf@linux.ibm.com>
From: Stefan Schulze Frielinghaus

Vectorize memset with a constant length of less than or equal to 64
bytes.  Do not perform a libc function call into memset in case the size
is not a compile-time constant but bounded and the upper bound is less
than or equal to 256 bytes.

gcc/ChangeLog:

	* config/s390/s390-protos.h (s390_expand_setmem): Change
	function signature.
	* config/s390/s390.cc (s390_expand_setmem): For memset's less
	than or equal to 256 byte do not perform a libc call.
	* config/s390/s390.md: Change expander into a version which
	takes 8 operands.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/memset-1.c: Test case memset1 makes use of
	vst, now.
---
 gcc/config/s390/s390-protos.h            |   2 +-
 gcc/config/s390/s390.cc                  | 129 +++++++++++++++++++++--
 gcc/config/s390/s390.md                  |  14 ++-
 gcc/testsuite/gcc.target/s390/memset-1.c |   7 +-
 4 files changed, 132 insertions(+), 20 deletions(-)

diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 65e4f97b41e..4a5263fccec 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -109,7 +109,7 @@ extern void emit_symbolic_move (rtx *);
 extern void s390_load_address (rtx, rtx);
 extern bool s390_expand_cpymem (rtx, rtx, rtx, rtx, rtx);
 extern bool s390_expand_movmem (rtx, rtx, rtx, rtx, rtx);
-extern void s390_expand_setmem (rtx, rtx, rtx);
+extern void s390_expand_setmem (rtx, rtx, rtx, rtx, rtx);
 extern bool s390_expand_cmpmem (rtx, rtx, rtx, rtx);
 extern void s390_expand_vec_strlen (rtx, rtx, rtx);
 extern void s390_expand_vec_movstr (rtx, rtx, rtx);
diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 553273f23ff..b1cb54612b8 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -5910,20 +5910,62 @@ s390_expand_movmem (rtx dst, rtx src, rtx len, rtx min_len_rtx, rtx max_len_rtx)
    Make use of clrmem if VAL is zero.  */
 
 void
-s390_expand_setmem (rtx dst, rtx len, rtx val)
+s390_expand_setmem (rtx dst, rtx len, rtx val, rtx min_len_rtx, rtx max_len_rtx)
 {
-  if (GET_CODE (len) == CONST_INT && INTVAL (len) <= 0)
+  /* Exit early in case nothing has to be done.  */
+  if (CONST_INT_P (len) && UINTVAL (len) == 0)
     return;
 
   gcc_assert (GET_CODE (val) == CONST_INT || GET_MODE (val) == QImode);
 
+  unsigned HOST_WIDE_INT min_len = UINTVAL (min_len_rtx);
+  unsigned HOST_WIDE_INT max_len
+    = max_len_rtx ? UINTVAL (max_len_rtx) : HOST_WIDE_INT_M1U;
+
+  /* Vectorize memset with a constant length
+     - if 0 < LEN < 16, then emit a vstl based solution;
+     - if 16 <= LEN <= 64, then emit a vst based solution
+       where the last two vector stores may overlap in case LEN%16!=0.  Paying
+       the price for an overlap is negligible compared to an extra GPR which is
+       required for vstl.  */
+  if (CONST_INT_P (len) && UINTVAL (len) <= 64 && val != const0_rtx
+      && TARGET_VX)
+    {
+      rtx val_vec = gen_reg_rtx (V16QImode);
+      emit_move_insn (val_vec, gen_rtx_VEC_DUPLICATE (V16QImode, val));
+
+      if (UINTVAL (len) < 16)
+        {
+          rtx len_reg = gen_reg_rtx (SImode);
+          emit_move_insn (len_reg, GEN_INT (UINTVAL (len) - 1));
+          emit_insn (gen_vstlv16qi (val_vec, len_reg, dst));
+        }
+      else
+        {
+          unsigned HOST_WIDE_INT l = UINTVAL (len) / 16;
+          unsigned HOST_WIDE_INT r = UINTVAL (len) % 16;
+          unsigned HOST_WIDE_INT o = 0;
+          for (unsigned HOST_WIDE_INT i = 0; i < l; ++i)
+            {
+              rtx newdst = adjust_address (dst, V16QImode, o);
+              emit_move_insn (newdst, val_vec);
+              o += 16;
+            }
+          if (r != 0)
+            {
+              rtx newdst = adjust_address (dst, V16QImode, (o - 16) + r);
+              emit_move_insn (newdst, val_vec);
+            }
+        }
+    }
+
   /* Expand setmem/clrmem for a constant length operand without a
      loop if it will be shorter that way.
      clrmem loop (with PFD) is 30 bytes -> 5 * xc
      clrmem loop (without PFD) is 24 bytes -> 4 * xc
      setmem loop (with PFD) is 38 bytes -> ~4 * (mvi/stc + mvc)
      setmem loop (without PFD) is 32 bytes -> ~4 * (mvi/stc + mvc) */
-  if (GET_CODE (len) == CONST_INT
+  else if (GET_CODE (len) == CONST_INT
       && ((val == const0_rtx
            && (INTVAL (len) <= 256 * 4
                || (INTVAL (len) <= 256 * 5 && TARGET_SETMEM_PFD(val,len))))
@@ -5968,6 +6010,70 @@ s390_expand_setmem (rtx dst, rtx len, rtx val)
                                                                        val));
     }
 
+  /* Non-constant length and no loop required.  */
+  else if (!CONST_INT_P (len) && max_len <= 256)
+    {
+      rtx_code_label *end_label;
+
+      if (min_len == 0)
+        {
+          end_label = gen_label_rtx ();
+          emit_cmp_and_jump_insns (len, const0_rtx, EQ, NULL_RTX,
+                                   GET_MODE (len), 1, end_label,
+                                   profile_probability::very_unlikely ());
+        }
+
+      rtx lenm1 = expand_binop (GET_MODE (len), add_optab, len, constm1_rtx,
+                                NULL_RTX, 1, OPTAB_DIRECT);
+
+      /* Prefer a vectorized implementation over one which makes use of an
+         execute instruction since it is faster (although it increases register
+         pressure).  */
+      if (max_len <= 16 && TARGET_VX)
+        {
+          rtx val_vec = gen_reg_rtx (V16QImode);
+          if (val == const0_rtx)
+            emit_move_insn (val_vec, CONST0_RTX (V16QImode));
+          else
+            emit_move_insn (val_vec, gen_rtx_VEC_DUPLICATE (V16QImode, val));
+
+          lenm1 = convert_to_mode (SImode, lenm1, 1);
+          emit_insn (gen_vstlv16qi (val_vec, lenm1, dst));
+        }
+      else
+        {
+          if (val == const0_rtx)
+            emit_insn (
+              gen_clrmem_short (dst, convert_to_mode (Pmode, lenm1, 1)));
+          else
+            {
+              emit_move_insn (adjust_address (dst, QImode, 0), val);
+
+              rtx_code_label *onebyte_end_label;
+              if (min_len <= 1)
+                {
+                  onebyte_end_label = gen_label_rtx ();
+                  emit_cmp_and_jump_insns (
+                    len, const1_rtx, EQ, NULL_RTX, GET_MODE (len), 1,
+                    onebyte_end_label, profile_probability::very_unlikely ());
+                }
+
+              rtx dstp1 = adjust_address (dst, VOIDmode, 1);
+              rtx lenm2
+                = expand_binop (GET_MODE (len), add_optab, len, GEN_INT (-2),
+                                NULL_RTX, 1, OPTAB_DIRECT);
+              lenm2 = convert_to_mode (Pmode, lenm2, 1);
+              emit_insn (gen_cpymem_short (dstp1, dst, lenm2));
+
+              if (min_len <= 1)
+                emit_label (onebyte_end_label);
+            }
+        }
+
+      if (min_len == 0)
+        emit_label (end_label);
+    }
+
   else
     {
       rtx dst_addr, count, blocks, temp, dstp1 = NULL_RTX;
@@ -5986,9 +6092,10 @@ s390_expand_setmem (rtx dst, rtx len, rtx val)
       blocks = gen_reg_rtx (mode);
 
       convert_move (count, len, 1);
-      emit_cmp_and_jump_insns (count, const0_rtx,
-                               EQ, NULL_RTX, mode, 1, zerobyte_end_label,
-                               profile_probability::very_unlikely ());
+      if (min_len == 0)
+        emit_cmp_and_jump_insns (count, const0_rtx, EQ, NULL_RTX, mode, 1,
+                                 zerobyte_end_label,
+                                 profile_probability::very_unlikely ());
 
       /* We need to make a copy of the target address since memset is
          supposed to return it unmodified.  We have to make it here
@@ -6003,10 +6110,10 @@ s390_expand_setmem (rtx dst, rtx len, rtx val)
              the mvc reading this value).  */
           set_mem_size (dst, 1);
           dstp1 = adjust_address (dst, VOIDmode, 1);
-          emit_cmp_and_jump_insns (count,
-                                   const1_rtx, EQ, NULL_RTX, mode, 1,
-                                   onebyte_end_label,
-                                   profile_probability::very_unlikely ());
+          if (min_len <= 1)
+            emit_cmp_and_jump_insns (count, const1_rtx, EQ, NULL_RTX, mode, 1,
+                                     onebyte_end_label,
+                                     profile_probability::very_unlikely ());
         }
 
       /* There is one unconditional (mvi+mvc)/xc after the loop
@@ -6029,7 +6136,7 @@ s390_expand_setmem (rtx dst, rtx len, rtx val)
 
       emit_jump (loop_start_label);
 
-      if (val != const0_rtx)
+      if (val != const0_rtx && min_len <= 1)
         {
           /* The 1 byte != 0 special case.  Not handled efficiently
              since we require two jumps for that.
However, this diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index abe3bbc5cd9..9631b2a8c60 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -3595,12 +3595,16 @@ ; (define_expand "setmem" - [(set (match_operand:BLK 0 "memory_operand" "") - (match_operand:QI 2 "general_operand" "")) - (use (match_operand:GPR 1 "general_operand" "")) - (match_operand 3 "" "")] + [(set (match_operand:BLK 0 "memory_operand" "") ; destination + (match_operand:QI 2 "general_operand" "")) ; value + (use (match_operand:GPR 1 "general_operand" "")) ; size + (match_operand 3 "") ; align + (match_operand 4 "") ; expected align + (match_operand 5 "") ; expected size + (match_operand 6 "") ; minimal size + (match_operand 7 "")] ; maximal size "" - "s390_expand_setmem (operands[0], operands[1], operands[2]); DONE;") + "s390_expand_setmem (operands[0], operands[1], operands[2], operands[6], operands[7]); DONE;") ; Clear a block that is up to 256 bytes in length. ; The block length is taken as (operands[1] % 256) + 1. diff --git a/gcc/testsuite/gcc.target/s390/memset-1.c b/gcc/testsuite/gcc.target/s390/memset-1.c index 9463a77208b..5eb96112f13 100644 --- a/gcc/testsuite/gcc.target/s390/memset-1.c +++ b/gcc/testsuite/gcc.target/s390/memset-1.c @@ -11,7 +11,7 @@ void return __builtin_memset (s, c, 1); } -/* 1 stc 1 mvc */ +/* 3 vst */ void *memset1(void *s, int c) { @@ -170,8 +170,9 @@ void } /* { dg-final { scan-assembler-times "mvi\\s" 1 } } */ -/* { dg-final { scan-assembler-times "mvc\\s" 20 } } */ +/* { dg-final { scan-assembler-times "mvc\\s" 19 } } */ /* { dg-final { scan-assembler-times "xc\\s" 28 } } */ -/* { dg-final { scan-assembler-times "stc\\s" 22 } } */ +/* { dg-final { scan-assembler-times "stc\\s" 21 } } */ /* { dg-final { scan-assembler-times "stcy\\s" 0 } } */ /* { dg-final { scan-assembler-times "pfd\\s" 2 } } */ +/* { dg-final { scan-assembler-times "vst\\s" 3 } } */
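
Illustration only, not part of the patch: the offset arithmetic of the
constant-length vst path (16 <= LEN <= 64) can be modelled in plain C. The
helper name memset_overlap16 is hypothetical, and memcpy of a 16-byte buffer
merely stands in for a vector store; this is a sketch of the l/r/o computation
under those assumptions, not of the RTL that the expander emits.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the vst based path: fill LEN bytes
   (16 <= LEN <= 64) at DST with VAL using only full 16-byte stores.  */
static void
memset_overlap16 (unsigned char *dst, unsigned char val, size_t len)
{
  unsigned char vec[16];
  memset (vec, val, sizeof vec);  /* stands in for the vector replicate */

  size_t l = len / 16;  /* number of full 16-byte blocks */
  size_t r = len % 16;  /* leftover bytes after the full blocks */
  size_t o = 0;         /* offset of the next store */

  for (size_t i = 0; i < l; ++i)
    {
      memcpy (dst + o, vec, 16);
      o += 16;
    }

  /* The final store starts at (o - 16) + r, i.e. at LEN - 16, so it ends
     exactly at LEN and overlaps the previous store by 16 - r bytes.  */
  if (r != 0)
    memcpy (dst + (o - 16) + r, vec, 16);
}
```

For LEN = 40 this performs stores at offsets 0, 16 and 24; the last two
overlap by 8 bytes, which is the trade-off the comment in the patch describes
as cheaper than spending an extra GPR on vstl.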