From patchwork Wed Mar 26 07:58:57 2014
X-Patchwork-Submitter: Christian Bruel
X-Patchwork-Id: 333797
Message-ID: <533288C1.1080306@st.com>
Date: Wed, 26 Mar 2014 08:58:57 +0100
From: Christian Bruel
To: "gcc-patches@gcc.gnu.org", Oleg Endo, Kaz Kojima
Subject: [PATH, SH] Small builtin_strlen improvement

Hello,

This patch adds a few instructions to the inlined builtin_strlen to
unroll the remaining bytes of the word-at-a-time loop. This gives two
distinct execution paths (no fall-through into the byte-at-a-time loop),
which allows block alignment to be assigned.

This partially addresses the problem reported by Oleg
in [Bug target/0539] New: [SH] builtin string functions ignore loop and
label alignment, whereas the test now expands (-O2 -m4) as

        mov     r4,r0
        tst     #3,r0
        mov     r4,r2
        bf/s    .L12
        mov     r4,r3
        mov     #0,r2
.L4:
        mov.l   @r4+,r1
        cmp/str r2,r1
        bf      .L4
        add     #-4,r4
        mov.b   @r4,r1
        tst     r1,r1
        bt      .L2
        add     #1,r4
        mov.b   @r4,r1
        tst     r1,r1
        bt      .L2
        add     #1,r4
        mov.b   @r4,r1
        tst     r1,r1
        mov     #-1,r1
        negc    r1,r1
        add     r1,r4
.L2:
        mov     r4,r0
        rts
        sub     r3,r0
        .align 1
.L12:
        mov.b   @r4+,r1
        tst     r1,r1
        bf/s    .L12
        mov     r2,r3
        add     #1,r3
        mov     r4,r0
        rts
        sub     r3,r0

The best gain I measured over the "compact" version is ~1% on a C++
regular expression benchmark, but the code looks better this way anyway.

Regression tested for -m2 and -m4.

OK for trunk?

2014-03-20  Christian Bruel

	* config/sh/sh-mem.cc (sh_expand_strlen): Unroll last word.

Index: gcc/config/sh/sh-mem.cc
===================================================================
--- gcc/config/sh/sh-mem.cc	(revision 208745)
+++ gcc/config/sh/sh-mem.cc	(working copy)
@@ -586,9 +586,35 @@ sh_expand_strlen (rtx *operands)
 
   emit_move_insn (current_addr, plus_constant (Pmode, current_addr, -4));
 
-  /* start byte loop.  */
   addr1 = adjust_address (addr1, QImode, 0);
 
+  /* unroll remaining bytes.  */
+  emit_insn (gen_extendqisi2 (tmp1, addr1));
+  emit_insn (gen_cmpeqsi_t (tmp1, const0_rtx));
+  jump = emit_jump_insn (gen_branch_true (L_return));
+  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+
+  emit_move_insn (current_addr, plus_constant (Pmode, current_addr, 1));
+
+  emit_insn (gen_extendqisi2 (tmp1, addr1));
+  emit_insn (gen_cmpeqsi_t (tmp1, const0_rtx));
+  jump = emit_jump_insn (gen_branch_true (L_return));
+  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+
+  emit_move_insn (current_addr, plus_constant (Pmode, current_addr, 1));
+
+  emit_insn (gen_extendqisi2 (tmp1, addr1));
+  emit_insn (gen_cmpeqsi_t (tmp1, const0_rtx));
+  jump = emit_jump_insn (gen_branch_true (L_return));
+  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+
+  emit_move_insn (current_addr, plus_constant (Pmode, current_addr, 1));
+
+  emit_insn (gen_extendqisi2 (tmp1, addr1));
+  jump = emit_jump_insn (gen_jump_compact (L_return));
+  emit_barrier_after (jump);
+
+  /* start byte loop.  */
   emit_label (L_loop_byte);
 
   emit_insn (gen_extendqisi2 (tmp1, addr1));
@@ -600,11 +626,12 @@ sh_expand_strlen (rtx *operands)
 
   /* end loop.  */
 
+  emit_insn (gen_addsi3 (start_addr, start_addr, GEN_INT (1)));
+
   emit_label (L_return);
 
-  emit_insn (gen_addsi3 (start_addr, start_addr, GEN_INT (1)));
-
   emit_insn (gen_subsi3 (operands[0], current_addr, start_addr));
 
   return true;
 }
+
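
As a reading aid, the two execution paths described above can be modelled in portable C. This is only a sketch under stated assumptions, not the code GCC emits: the names strlen_sketch and has_zero_byte are hypothetical, has_zero_byte stands in for the cmp/str test against a zero register, and the word load may read up to three bytes past the terminating NUL, so the buffer is assumed to be padded to a multiple of four bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for "cmp/str" against zero: nonzero if any
   byte of W is zero.  */
static int
has_zero_byte (uint32_t w)
{
  return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

/* Illustrative model of the two paths; assumes the buffer holding S
   is padded to a multiple of 4 bytes.  */
static size_t
strlen_sketch (const char *s)
{
  const char *p = s;

  if (((uintptr_t) p & 3) != 0)
    {
      /* Unaligned start: byte-at-a-time loop (.L12 in the listing).  */
      while (*p != '\0')
        p++;
      return (size_t) (p - s);
    }

  /* Aligned start: word-at-a-time loop (.L4 in the listing).  */
  for (;;)
    {
      uint32_t w;
      memcpy (&w, p, sizeof w);
      if (has_zero_byte (w))
        break;
      p += 4;
    }

  /* Unrolled tail: the NUL is one of the four bytes at P, so three
     tests are enough and control never enters the byte loop.  */
  if (p[0] == '\0')
    return (size_t) (p - s);
  if (p[1] == '\0')
    return (size_t) (p - s) + 1;
  if (p[2] == '\0')
    return (size_t) (p - s) + 2;
  return (size_t) (p - s) + 3;
}
```

Because the aligned path ends in straight-line code and the unaligned path is a separate loop, neither falls through into the other, which is what lets each block carry its own alignment.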