From patchwork Thu Jun 14 14:29:42 2018
X-Patchwork-Submitter: Aaron Sawdey
X-Patchwork-Id: 929485
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Subject: [PATCH, rs6000] cleanup/refactor in rs6000-string.c
From: Aaron Sawdey
To: GCC Patches
Cc: Segher Boessenkool, David Edelsohn, Bill Schmidt
Date: Thu, 14 Jun 2018 09:29:42 -0500
Message-Id: <4293e454215febc08c2851160758589227cbb096.camel@linux.ibm.com>

This patch cleans up and refactors some code in rs6000-string.c before I
start working on adding vec/vsx support to str[n]cmp inline expansion.  It
also removes the * from vsx_mov<mode>_64bit in vsx.md because I'll be using
that pattern to generate lxvd2x.

Bootstrap/regtest passes on ppc64le power8 -- ok for trunk?

Thanks!
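As background for the patch below: the GPR final-comparison sequence that the new emit_final_str_compare_gpr helper emits (cmpb/cmpb, orc, cntlzd, addi, rldcl, subf) can be modeled in plain C. This is an illustrative sketch, not GCC code; the cmpb model, pack8 helper, and test values are my own.

```c
#include <stdint.h>

/* Model of the POWER cmpb instruction: each byte of the result is
   0xff where the corresponding bytes of a and b are equal, else 0. */
static uint64_t cmpb (uint64_t a, uint64_t b)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    {
      uint64_t mask = 0xffull << (8 * i);
      if ((a & mask) == (b & mask))
	r |= mask;
    }
  return r;
}

static uint64_t rotl64 (uint64_t x, unsigned n)
{
  n &= 63;
  return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Pack 8 bytes with the first string byte in the most significant
   position, as after a big-endian doubleword load of the string.  */
static uint64_t pack8 (const char *s)
{
  uint64_t v = 0;
  for (int i = 0; i < 8; i++)
    v = (v << 8) | (uint8_t) s[i];
  return v;
}

/* Final comparison step: find the first byte that differs or is a
   zero byte of s1, extract it from both words, and subtract.  */
static int final_str_compare (uint64_t s1, uint64_t s2)
{
  uint64_t diff = ~cmpb (s1, s2) | cmpb (s1, 0);   /* cmpb/cmpb + orc  */
  if (diff == 0)
    return 0;				/* equal, no zero byte seen  */
  unsigned rot = __builtin_clzll (diff) + 8;	   /* cntlzd; addi 8   */
  unsigned b1 = rotl64 (s1, rot) & 0xff;	   /* rldcl; mask byte */
  unsigned b2 = rotl64 (s2, rot) & 0xff;
  return (int) b1 - (int) b2;			   /* subf	       */
}
```

The rotate amount works because each flagged byte of diff is 0xff, so cntlzd lands on a multiple of 8; adding 8 rotates the interesting byte into the least significant position.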
  Aaron

2018-06-14  Aaron Sawdey

	* config/rs6000/rs6000-string.c (select_block_compare_mode): Check
	TARGET_EFFICIENT_OVERLAPPING_UNALIGNED here instead of in caller.
	(do_and3, do_and3_mask, do_cmpb3, do_rotl3): New functions.
	(expand_block_compare): Change select_block_compare_mode call.
	(expand_strncmp_align_check): Use new functions, fix comment.
	(emit_final_str_compare_gpr): New function.
	(expand_strn_compare): Refactor and clean up code.
	* config/rs6000/vsx.md (vsx_mov<mode>_64bit): Remove *.

Index: rs6000-string.c
===================================================================
--- rs6000-string.c	(revision 261573)
+++ rs6000-string.c	(working copy)
@@ -264,6 +264,7 @@
   else if (bytes == GET_MODE_SIZE (QImode))
     return QImode;
   else if (bytes < GET_MODE_SIZE (SImode)
+	   && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
	   && offset >= GET_MODE_SIZE (SImode) - bytes)
     /* This matches the case were we have SImode and 3 bytes
        and offset >= 1 and permits us to move back one and overlap
@@ -271,6 +272,7 @@
        unwanted bytes off of the input.  */
     return SImode;
   else if (word_mode_ok && bytes < UNITS_PER_WORD
+	   && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
	   && offset >= UNITS_PER_WORD-bytes)
     /* Similarly, if we can use DImode it will get matched here
        and can do an overlapping read that ends at the end of the block.  */
@@ -406,6 +408,70 @@
     emit_insn (gen_addsi3 (dest, src1, src2));
 }
 
+/* Emit an and of the proper mode for DEST.
+
+   DEST is the destination register for the and.
+   SRC1 is the first and input.
+   SRC2 is the second and input.
+
+   Computes DEST = SRC1&SRC2.  */
+static void
+do_and3 (rtx dest, rtx src1, rtx src2)
+{
+  if (GET_MODE (dest) == DImode)
+    emit_insn (gen_anddi3 (dest, src1, src2));
+  else
+    emit_insn (gen_andsi3 (dest, src1, src2));
+}
+
+/* Emit an and-mask of the proper mode for DEST.
+
+   DEST is the destination register for the and.
+   SRC1 is the first and input.
+   SRC2 is the mask input.
+
+   Computes DEST = SRC1&SRC2.  */
+static void
+do_and3_mask (rtx dest, rtx src1, rtx src2)
+{
+  if (GET_MODE (dest) == DImode)
+    emit_insn (gen_anddi3_mask (dest, src1, src2));
+  else
+    emit_insn (gen_andsi3_mask (dest, src1, src2));
+}
+
+/* Emit a cmpb of the proper mode for DEST.
+
+   DEST is the destination register for the cmpb.
+   SRC1 is the first input.
+   SRC2 is the second input.
+
+   Computes cmpb of SRC1, SRC2.  */
+static void
+do_cmpb3 (rtx dest, rtx src1, rtx src2)
+{
+  if (GET_MODE (dest) == DImode)
+    emit_insn (gen_cmpbdi3 (dest, src1, src2));
+  else
+    emit_insn (gen_cmpbsi3 (dest, src1, src2));
+}
+
+/* Emit a rotl of the proper mode for DEST.
+
+   DEST is the destination register for the rotate.
+   SRC1 is the first rotate input.
+   SRC2 is the second rotate input.
+
+   Computes DEST = SRC1 rotated left by SRC2.  */
+static void
+do_rotl3 (rtx dest, rtx src1, rtx src2)
+{
+  if (GET_MODE (dest) == DImode)
+    emit_insn (gen_rotldi3 (dest, src1, src2));
+  else
+    emit_insn (gen_rotlsi3 (dest, src1, src2));
+}
+
 /* Generate rtl for a load, shift, and compare of less than a full word.
 
    LOAD_MODE is the machine mode for the loads.
@@ -1393,11 +1459,8 @@
   while (bytes > 0)
     {
       unsigned int align = compute_current_alignment (base_align, offset);
-      if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
-	load_mode = select_block_compare_mode (offset, bytes, align,
-					       word_mode_ok);
-      else
-	load_mode = select_block_compare_mode (0, bytes, align, word_mode_ok);
+      load_mode = select_block_compare_mode (offset, bytes,
+					     align, word_mode_ok);
       load_mode_size = GET_MODE_SIZE (load_mode);
       if (bytes >= load_mode_size)
	cmp_bytes = load_mode_size;
@@ -1625,22 +1688,19 @@
   return true;
 }
 
-/* Generate alignment check and branch code to set up for
+/* Generate page crossing check and branch code to set up for
    strncmp when we don't have DI alignment.
    STRNCMP_LABEL is the label to branch if there is a page crossing.
-   SRC is the string pointer to be examined.
+   SRC_ADDR is the string address to be examined.
    BYTES is the max number of bytes to compare.  */
 static void
-expand_strncmp_align_check (rtx strncmp_label, rtx src, HOST_WIDE_INT bytes)
+expand_strncmp_align_check (rtx strncmp_label, rtx src_addr, HOST_WIDE_INT bytes)
 {
   rtx lab_ref = gen_rtx_LABEL_REF (VOIDmode, strncmp_label);
-  rtx src_check = copy_addr_to_reg (XEXP (src, 0));
-  if (GET_MODE (src_check) == SImode)
-    emit_insn (gen_andsi3 (src_check, src_check, GEN_INT (0xfff)));
-  else
-    emit_insn (gen_anddi3 (src_check, src_check, GEN_INT (0xfff)));
+  rtx src_pgoff = gen_reg_rtx (GET_MODE (src_addr));
+  do_and3 (src_pgoff, src_addr, GEN_INT (0xfff));
   rtx cond = gen_reg_rtx (CCmode);
-  emit_move_insn (cond, gen_rtx_COMPARE (CCmode, src_check,
+  emit_move_insn (cond, gen_rtx_COMPARE (CCmode, src_pgoff,
					 GEN_INT (4096 - bytes)));
 
   rtx cmp_rtx = gen_rtx_GE (VOIDmode, cond, const0_rtx);
@@ -1652,6 +1712,76 @@
   LABEL_NUSES (strncmp_label) += 1;
 }
 
+/* Generate the final sequence that identifies the differing
+   byte and generates the final result, taking into account
+   zero bytes:
+
+   cmpb		  cmpb_result1, src1, src2
+   cmpb		  cmpb_result2, src1, zero
+   orc		  cmpb_result1, cmpb_result1, cmpb_result2
+   cntlzd	  get bit of first zero/diff byte
+   addi		  convert for rldcl use
+   rldcl rldcl	  extract diff/zero byte
+   subf		  subtract for final result
+
+   STR1 is the reg rtx for data from string 1.
+   STR2 is the reg rtx for data from string 2.
+   RESULT is the reg rtx for the comparison result.  */
+
+static void
+emit_final_str_compare_gpr (rtx str1, rtx str2, rtx result)
+{
+  machine_mode m = GET_MODE (str1);
+  rtx cmpb_diff = gen_reg_rtx (m);
+  rtx cmpb_zero = gen_reg_rtx (m);
+  rtx rot_amt = gen_reg_rtx (m);
+  rtx zero_reg = gen_reg_rtx (m);
+
+  rtx rot1_1 = gen_reg_rtx (m);
+  rtx rot1_2 = gen_reg_rtx (m);
+  rtx rot2_1 = gen_reg_rtx (m);
+  rtx rot2_2 = gen_reg_rtx (m);
+
+  if (m == SImode)
+    {
+      emit_insn (gen_cmpbsi3 (cmpb_diff, str1, str2));
+      emit_insn (gen_movsi (zero_reg, GEN_INT (0)));
+      emit_insn (gen_cmpbsi3 (cmpb_zero, str1, zero_reg));
+      emit_insn (gen_one_cmplsi2 (cmpb_diff, cmpb_diff));
+      emit_insn (gen_iorsi3 (cmpb_diff, cmpb_diff, cmpb_zero));
+      emit_insn (gen_clzsi2 (rot_amt, cmpb_diff));
+      emit_insn (gen_addsi3 (rot_amt, rot_amt, GEN_INT (8)));
+      emit_insn (gen_rotlsi3 (rot1_1, str1,
+			      gen_lowpart (SImode, rot_amt)));
+      emit_insn (gen_andsi3_mask (rot1_2, rot1_1, GEN_INT (0xff)));
+      emit_insn (gen_rotlsi3 (rot2_1, str2,
+			      gen_lowpart (SImode, rot_amt)));
+      emit_insn (gen_andsi3_mask (rot2_2, rot2_1, GEN_INT (0xff)));
+      emit_insn (gen_subsi3 (result, rot1_2, rot2_2));
+    }
+  else if (m == DImode)
+    {
+      emit_insn (gen_cmpbdi3 (cmpb_diff, str1, str2));
+      emit_insn (gen_movdi (zero_reg, GEN_INT (0)));
+      emit_insn (gen_cmpbdi3 (cmpb_zero, str1, zero_reg));
+      emit_insn (gen_one_cmpldi2 (cmpb_diff, cmpb_diff));
+      emit_insn (gen_iordi3 (cmpb_diff, cmpb_diff, cmpb_zero));
+      emit_insn (gen_clzdi2 (rot_amt, cmpb_diff));
+      emit_insn (gen_adddi3 (rot_amt, rot_amt, GEN_INT (8)));
+      emit_insn (gen_rotldi3 (rot1_1, str1,
+			      gen_lowpart (SImode, rot_amt)));
+      emit_insn (gen_anddi3_mask (rot1_2, rot1_1, GEN_INT (0xff)));
+      emit_insn (gen_rotldi3 (rot2_1, str2,
+			      gen_lowpart (SImode, rot_amt)));
+      emit_insn (gen_anddi3_mask (rot2_2, rot2_1, GEN_INT (0xff)));
+      emit_insn (gen_subdi3 (result, rot1_2, rot2_2));
+    }
+  else
+    gcc_unreachable ();
+
+  return;
+}
+
 /* Expand a string compare operation with length, and return
    true if successful.  Return false if we should let the
    compiler generate normal code, probably a strncmp call.
@@ -1682,8 +1812,8 @@
       align_rtx = operands[4];
     }
   unsigned HOST_WIDE_INT cmp_bytes = 0;
-  rtx src1 = orig_src1;
-  rtx src2 = orig_src2;
+  rtx src1_addr = force_reg (Pmode, XEXP (orig_src1, 0));
+  rtx src2_addr = force_reg (Pmode, XEXP (orig_src2, 0));
 
   /* If we have a length, it must be constant.  This simplifies things
      a bit as we don't have to generate code to check if we've exceeded
@@ -1696,8 +1826,8 @@
     return false;
 
   unsigned int base_align = UINTVAL (align_rtx);
-  int align1 = MEM_ALIGN (orig_src1) / BITS_PER_UNIT;
-  int align2 = MEM_ALIGN (orig_src2) / BITS_PER_UNIT;
+  unsigned int align1 = MEM_ALIGN (orig_src1) / BITS_PER_UNIT;
+  unsigned int align2 = MEM_ALIGN (orig_src2) / BITS_PER_UNIT;
 
   /* targetm.slow_unaligned_access -- don't do unaligned stuff.  */
   if (targetm.slow_unaligned_access (word_mode, align1)
@@ -1749,8 +1879,9 @@
   rtx final_move_label = gen_label_rtx ();
   rtx final_label = gen_label_rtx ();
   rtx begin_compare_label = NULL;
+  unsigned int required_align = 8;
 
-  if (base_align < 8)
+  if (base_align < required_align)
     {
       /* Generate code that checks distance to 4k boundary for this case.  */
       begin_compare_label = gen_label_rtx ();
@@ -1773,14 +1904,14 @@
	}
       else
	{
-	  align_test = ROUND_UP (align_test, 8);
-	  base_align = 8;
+	  align_test = ROUND_UP (align_test, required_align);
+	  base_align = required_align;
	}
 
-      if (align1 < 8)
-	expand_strncmp_align_check (strncmp_label, src1, align_test);
-      if (align2 < 8)
-	expand_strncmp_align_check (strncmp_label, src2, align_test);
+      if (align1 < required_align)
+	expand_strncmp_align_check (strncmp_label, src1_addr, align_test);
+      if (align2 < required_align)
+	expand_strncmp_align_check (strncmp_label, src2_addr, align_test);
 
       /* Now generate the following sequence:
	 - branch to begin_compare
@@ -1797,25 +1928,13 @@
 
       emit_label (strncmp_label);
 
-      if (!REG_P (XEXP (src1, 0)))
-	{
-	  rtx src1_reg = copy_addr_to_reg (XEXP (src1, 0));
-	  src1 = replace_equiv_address (src1, src1_reg);
-	}
-
-      if (!REG_P (XEXP (src2, 0)))
-	{
-	  rtx src2_reg = copy_addr_to_reg (XEXP (src2, 0));
-	  src2 = replace_equiv_address (src2, src2_reg);
-	}
-
       if (no_length)
	{
	  tree fun = builtin_decl_explicit (BUILT_IN_STRCMP);
	  emit_library_call_value (XEXP (DECL_RTL (fun), 0),
				   target, LCT_NORMAL, GET_MODE (target),
-				   force_reg (Pmode, XEXP (src1, 0)), Pmode,
-				   force_reg (Pmode, XEXP (src2, 0)), Pmode);
+				   force_reg (Pmode, src1_addr), Pmode,
+				   force_reg (Pmode, src2_addr), Pmode);
	}
       else
	{
@@ -1833,8 +1952,8 @@
	  tree fun = builtin_decl_explicit (BUILT_IN_STRNCMP);
	  emit_library_call_value (XEXP (DECL_RTL (fun), 0),
				   target, LCT_NORMAL, GET_MODE (target),
-				   force_reg (Pmode, XEXP (src1, 0)), Pmode,
-				   force_reg (Pmode, XEXP (src2, 0)), Pmode,
+				   force_reg (Pmode, src1_addr), Pmode,
+				   force_reg (Pmode, src2_addr), Pmode,
				   len_rtx, GET_MODE (len_rtx));
	}
 
@@ -1850,12 +1969,12 @@
   rtx tmp_reg_src1 = gen_reg_rtx (word_mode);
   rtx tmp_reg_src2 = gen_reg_rtx (word_mode);
 
-  /* Generate sequence of ld/ldbrx, cmpb to compare out
+  /* Generate a sequence of GPR or VEC/VSX instructions to compare out
      to the length specified.  */
   unsigned HOST_WIDE_INT bytes_to_compare = compare_length;
   while (bytes_to_compare > 0)
     {
-      /* Compare sequence:
+      /* GPR compare sequence:
	 check each 8B with: ld/ld cmpd bne
	 If equal, use rldicr/cmpb to check for zero byte.
	 cleanup code at end:
@@ -1869,13 +1988,10 @@
	 The last compare can branch around the cleanup code if the
	 result is zero because the strings are exactly equal.  */
+
       unsigned int align = compute_current_alignment (base_align, offset);
-      if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
-	load_mode = select_block_compare_mode (offset, bytes_to_compare, align,
-					       word_mode_ok);
-      else
-	load_mode = select_block_compare_mode (0, bytes_to_compare, align,
-					       word_mode_ok);
+      load_mode = select_block_compare_mode (offset, bytes_to_compare,
+					     align, word_mode_ok);
       load_mode_size = GET_MODE_SIZE (load_mode);
       if (bytes_to_compare >= load_mode_size)
	cmp_bytes = load_mode_size;
@@ -1898,26 +2014,11 @@
	   rid of the extra bytes.  */
	cmp_bytes = bytes_to_compare;
 
-      src1 = adjust_address (orig_src1, load_mode, offset);
-      src2 = adjust_address (orig_src2, load_mode, offset);
+      rtx addr1 = gen_rtx_PLUS (Pmode, src1_addr, GEN_INT (offset));
+      do_load_for_compare_from_addr (load_mode, tmp_reg_src1, addr1, orig_src1);
+      rtx addr2 = gen_rtx_PLUS (Pmode, src2_addr, GEN_INT (offset));
+      do_load_for_compare_from_addr (load_mode, tmp_reg_src2, addr2, orig_src2);
 
-      if (!REG_P (XEXP (src1, 0)))
-	{
-	  rtx src1_reg = copy_addr_to_reg (XEXP (src1, 0));
-	  src1 = replace_equiv_address (src1, src1_reg);
-	}
-      set_mem_size (src1, load_mode_size);
-
-      if (!REG_P (XEXP (src2, 0)))
-	{
-	  rtx src2_reg = copy_addr_to_reg (XEXP (src2, 0));
-	  src2 = replace_equiv_address (src2, src2_reg);
-	}
-      set_mem_size (src2, load_mode_size);
-
-      do_load_for_compare (tmp_reg_src1, src1, load_mode);
-      do_load_for_compare (tmp_reg_src2, src2, load_mode);
-
       /* We must always left-align the data we read, and
	 clear any bytes to the right that are beyond the string.
	 Otherwise the cmpb sequence won't produce the correct
@@ -1929,16 +2030,8 @@
	{
	  /* Rotate left first.  */
	  rtx sh = GEN_INT (BITS_PER_UNIT * (word_mode_size - load_mode_size));
-	  if (word_mode == DImode)
-	    {
-	      emit_insn (gen_rotldi3 (tmp_reg_src1, tmp_reg_src1, sh));
-	      emit_insn (gen_rotldi3 (tmp_reg_src2, tmp_reg_src2, sh));
-	    }
-	  else
-	    {
-	      emit_insn (gen_rotlsi3 (tmp_reg_src1, tmp_reg_src1, sh));
-	      emit_insn (gen_rotlsi3 (tmp_reg_src2, tmp_reg_src2, sh));
-	    }
+	  do_rotl3 (tmp_reg_src1, tmp_reg_src1, sh);
+	  do_rotl3 (tmp_reg_src2, tmp_reg_src2, sh);
	}
 
       if (cmp_bytes < word_mode_size)
@@ -1947,16 +2040,8 @@
	     turned into a rldicr instruction.  */
	  HOST_WIDE_INT mb = BITS_PER_UNIT * (word_mode_size - cmp_bytes);
	  rtx mask = GEN_INT (HOST_WIDE_INT_M1U << mb);
-	  if (word_mode == DImode)
-	    {
-	      emit_insn (gen_anddi3_mask (tmp_reg_src1, tmp_reg_src1, mask));
-	      emit_insn (gen_anddi3_mask (tmp_reg_src2, tmp_reg_src2, mask));
-	    }
-	  else
-	    {
-	      emit_insn (gen_andsi3_mask (tmp_reg_src1, tmp_reg_src1, mask));
-	      emit_insn (gen_andsi3_mask (tmp_reg_src2, tmp_reg_src2, mask));
-	    }
+	  do_and3_mask (tmp_reg_src1, tmp_reg_src1, mask);
+	  do_and3_mask (tmp_reg_src2, tmp_reg_src2, mask);
	}
 
       /* Cases to handle.  A and B are chunks of the two strings.
@@ -2013,32 +2098,17 @@
	  rtx lab_ref_fin = gen_rtx_LABEL_REF (VOIDmode, final_move_label);
	  rtx condz = gen_reg_rtx (CCmode);
	  rtx zero_reg = gen_reg_rtx (word_mode);
-	  if (word_mode == SImode)
+	  emit_move_insn (zero_reg, GEN_INT (0));
+	  do_cmpb3 (cmpb_zero, tmp_reg_src1, zero_reg);
+
+	  if (cmp_bytes < word_mode_size)
	    {
-	      emit_insn (gen_movsi (zero_reg, GEN_INT (0)));
-	      emit_insn (gen_cmpbsi3 (cmpb_zero, tmp_reg_src1, zero_reg));
-	      if (cmp_bytes < word_mode_size)
-		{
-		  /* Don't want to look at zero bytes past end.  */
-		  HOST_WIDE_INT mb =
-		    BITS_PER_UNIT * (word_mode_size - cmp_bytes);
-		  rtx mask = GEN_INT (HOST_WIDE_INT_M1U << mb);
-		  emit_insn (gen_andsi3_mask (cmpb_zero, cmpb_zero, mask));
-		}
+	      /* Don't want to look at zero bytes past end.  */
+	      HOST_WIDE_INT mb =
+		BITS_PER_UNIT * (word_mode_size - cmp_bytes);
+	      rtx mask = GEN_INT (HOST_WIDE_INT_M1U << mb);
+	      do_and3_mask (cmpb_zero, cmpb_zero, mask);
	    }
-	  else
-	    {
-	      emit_insn (gen_movdi (zero_reg, GEN_INT (0)));
-	      emit_insn (gen_cmpbdi3 (cmpb_zero, tmp_reg_src1, zero_reg));
-	      if (cmp_bytes < word_mode_size)
-		{
-		  /* Don't want to look at zero bytes past end.  */
-		  HOST_WIDE_INT mb =
-		    BITS_PER_UNIT * (word_mode_size - cmp_bytes);
-		  rtx mask = GEN_INT (HOST_WIDE_INT_M1U << mb);
-		  emit_insn (gen_anddi3_mask (cmpb_zero, cmpb_zero, mask));
-		}
-	    }
 
	  emit_move_insn (condz, gen_rtx_COMPARE (CCmode, cmpb_zero, zero_reg));
	  rtx cmpnz_rtx = gen_rtx_NE (VOIDmode, condz, const0_rtx);
@@ -2057,23 +2127,11 @@
       if (equality_compare_rest)
	{
	  /* Update pointers past what has been compared already.  */
-	  src1 = adjust_address (orig_src1, load_mode, offset);
-	  src2 = adjust_address (orig_src2, load_mode, offset);
+	  rtx src1 = force_reg (Pmode,
+				gen_rtx_PLUS (Pmode, src1_addr, GEN_INT (offset)));
+	  rtx src2 = force_reg (Pmode,
+				gen_rtx_PLUS (Pmode, src2_addr, GEN_INT (offset)));
 
-	  if (!REG_P (XEXP (src1, 0)))
-	    {
-	      rtx src1_reg = copy_addr_to_reg (XEXP (src1, 0));
-	      src1 = replace_equiv_address (src1, src1_reg);
-	    }
-	  set_mem_size (src1, load_mode_size);
-
-	  if (!REG_P (XEXP (src2, 0)))
-	    {
-	      rtx src2_reg = copy_addr_to_reg (XEXP (src2, 0));
-	      src2 = replace_equiv_address (src2, src2_reg);
-	    }
-	  set_mem_size (src2, load_mode_size);
-
	  /* Construct call to strcmp/strncmp to compare the rest of
	     the string.  */
	  if (no_length)
	    {
@@ -2080,8 +2138,7 @@
	      tree fun = builtin_decl_explicit (BUILT_IN_STRCMP);
	      emit_library_call_value (XEXP (DECL_RTL (fun), 0),
				       target, LCT_NORMAL, GET_MODE (target),
-				       force_reg (Pmode, XEXP (src1, 0)), Pmode,
-				       force_reg (Pmode, XEXP (src2, 0)), Pmode);
+				       src1, Pmode, src2, Pmode);
	    }
	  else
	    {
@@ -2095,8 +2152,7 @@
	      tree fun = builtin_decl_explicit (BUILT_IN_STRNCMP);
	      emit_library_call_value (XEXP (DECL_RTL (fun), 0),
				       target, LCT_NORMAL, GET_MODE (target),
-				       force_reg (Pmode, XEXP (src1, 0)), Pmode,
-				       force_reg (Pmode, XEXP (src2, 0)), Pmode,
+				       src1, Pmode, src2, Pmode,
				       len_rtx, GET_MODE (len_rtx));
	    }
 
@@ -2110,64 +2166,8 @@
   if (cleanup_label)
     emit_label (cleanup_label);
 
-  /* Generate the final sequence that identifies the differing
-     byte and generates the final result, taking into account
-     zero bytes:
+  emit_final_str_compare_gpr (tmp_reg_src1, tmp_reg_src2, result_reg);
 
-     cmpb cmpb_result1, src1, src2
-     cmpb cmpb_result2, src1, zero
-     orc cmpb_result1, cmp_result1, cmpb_result2
-     cntlzd get bit of first zero/diff byte
-     addi convert for rldcl use
-     rldcl rldcl extract diff/zero byte
-     subf subtract for final result
-  */
-
-  rtx cmpb_diff = gen_reg_rtx (word_mode);
-  rtx cmpb_zero = gen_reg_rtx (word_mode);
-  rtx rot_amt = gen_reg_rtx (word_mode);
-  rtx zero_reg = gen_reg_rtx (word_mode);
-
-  rtx rot1_1 = gen_reg_rtx (word_mode);
-  rtx rot1_2 = gen_reg_rtx (word_mode);
-  rtx rot2_1 = gen_reg_rtx (word_mode);
-  rtx rot2_2 = gen_reg_rtx (word_mode);
-
-  if (word_mode == SImode)
-    {
-      emit_insn (gen_cmpbsi3 (cmpb_diff, tmp_reg_src1, tmp_reg_src2));
-      emit_insn (gen_movsi (zero_reg, GEN_INT (0)));
-      emit_insn (gen_cmpbsi3 (cmpb_zero, tmp_reg_src1, zero_reg));
-      emit_insn (gen_one_cmplsi2 (cmpb_diff,cmpb_diff));
-      emit_insn (gen_iorsi3 (cmpb_diff, cmpb_diff, cmpb_zero));
-      emit_insn (gen_clzsi2 (rot_amt, cmpb_diff));
-      emit_insn (gen_addsi3 (rot_amt, rot_amt, GEN_INT (8)));
-      emit_insn (gen_rotlsi3 (rot1_1, tmp_reg_src1,
-			      gen_lowpart (SImode, rot_amt)));
-      emit_insn (gen_andsi3_mask (rot1_2, rot1_1, GEN_INT (0xff)));
-      emit_insn (gen_rotlsi3 (rot2_1, tmp_reg_src2,
-			      gen_lowpart (SImode, rot_amt)));
-      emit_insn (gen_andsi3_mask (rot2_2, rot2_1, GEN_INT (0xff)));
-      emit_insn (gen_subsi3 (result_reg, rot1_2, rot2_2));
-    }
-  else
-    {
-      emit_insn (gen_cmpbdi3 (cmpb_diff, tmp_reg_src1, tmp_reg_src2));
-      emit_insn (gen_movdi (zero_reg, GEN_INT (0)));
-      emit_insn (gen_cmpbdi3 (cmpb_zero, tmp_reg_src1, zero_reg));
-      emit_insn (gen_one_cmpldi2 (cmpb_diff,cmpb_diff));
-      emit_insn (gen_iordi3 (cmpb_diff, cmpb_diff, cmpb_zero));
-      emit_insn (gen_clzdi2 (rot_amt, cmpb_diff));
-      emit_insn (gen_adddi3 (rot_amt, rot_amt, GEN_INT (8)));
-      emit_insn (gen_rotldi3 (rot1_1, tmp_reg_src1,
-			      gen_lowpart (SImode, rot_amt)));
-      emit_insn (gen_anddi3_mask (rot1_2, rot1_1, GEN_INT (0xff)));
-      emit_insn (gen_rotldi3 (rot2_1, tmp_reg_src2,
-			      gen_lowpart (SImode, rot_amt)));
-      emit_insn (gen_anddi3_mask (rot2_2, rot2_1, GEN_INT (0xff)));
-      emit_insn (gen_subdi3 (result_reg, rot1_2, rot2_2));
-    }
-
   emit_label (final_move_label);
   emit_insn (gen_movsi (target, gen_lowpart (SImode, result_reg)));
 
Index: vsx.md
===================================================================
--- vsx.md	(revision 261573)
+++ vsx.md	(working copy)
@@ -1194,7 +1194,7 @@
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
 ;;              VSX 0/-1   GPR 0/-1   VMX const GPR const  LVX (VMX)   STVX (VMX)
-(define_insn "*vsx_mov<mode>_64bit"
+(define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
               "=ZwO,      <VSa>,     <VSa>,     r,         we,        ?wQ,
                ?&r,       ??r,       ??Y,       <??r>,     wo,        v,
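[Editor's note: the patch is truncated above.]  As a side note, the page crossing test that expand_strncmp_align_check emits via do_and3 (and with 0xfff, then compare against 4096 - bytes and branch on GE) corresponds to the C predicate below. This is an illustrative model, not GCC code; the function name is my own.

```c
#include <stdint.h>

/* True if the generated code would branch to the strncmp fallback:
   reading BYTES bytes starting at ADDR might cross a 4KB page.
   The GE comparison also branches in the boundary-exact case
   (offset == 4096 - bytes), which is harmlessly conservative.  */
static int might_cross_page (uintptr_t addr, unsigned bytes)
{
  uintptr_t page_off = addr & 0xfff;	/* and src_pgoff, addr, 0xfff  */
  return page_off >= 4096 - bytes;	/* cmp with 4096 - bytes; bge  */
}
```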