From patchwork Wed Mar 26 10:22:22 2014
X-Patchwork-Submitter: Christian Bruel
X-Patchwork-Id: 333810
Message-ID: <5332AA5E.1090701@st.com>
Date: Wed, 26 Mar 2014 11:22:22 +0100
From: Christian Bruel
To: "gcc-patches@gcc.gnu.org", Kaz Kojima
Subject: [PATCH, SH] inline builtin_memset

Hello,

This patch inlines builtin_memset when the size is a constant with
15 < size < 128. Small sizes are better unrolled with mov_insn sequences.
Large (or non-constant) sizes are better handled by a libc implementation
that does cache-line-aligned copying with unrolling or prefetching.

No new regressions for sh-none-elf and sh-linux-elf.

OK for trunk?

many thanks,

2014-03-20  Christian Bruel

        * config/sh/sh.md (setmemqi): New expand pattern.
        * config/sh/sh.h (CLEAR_RATIO): Define.
        * config/sh/sh-mem.cc (sh_expand_setmem): Define.
        * config/sh/sh-protos.h (sh_expand_setmem): Declare.

2014-01-20  Christian Bruel

        * gcc.target/sh/memset.c: New test.

Index: gcc/config/sh/sh-mem.cc
===================================================================
--- gcc/config/sh/sh-mem.cc (revision 208745)
+++ gcc/config/sh/sh-mem.cc (working copy)
@@ -608,3 +608,106 @@ sh_expand_strlen (rtx *operands)
 
   return true;
 }
+
+/* Emit code to perform a memset
+
+   OPERANDS[0] is the destination.
+   OPERANDS[1] is the size;
+   OPERANDS[2] is the byte value to set.
+   OPERANDS[3] is the alignment.  */
+void
+sh_expand_setmem (rtx *operands)
+{
+  rtx L_loop_byte = gen_label_rtx ();
+  rtx L_loop_word = gen_label_rtx ();
+  rtx L_return = gen_label_rtx ();
+  rtx jump;
+  rtx dest = copy_rtx (operands[0]);
+  rtx dest_addr = copy_addr_to_reg (XEXP (dest, 0));
+  rtx val = force_reg (SImode, operands[2]);
+  int align = INTVAL (operands[3]);
+  int count = 0;
+  rtx len = force_reg (SImode, operands[1]);
+
+  if (! CONST_INT_P (operands[1]))
+    return;
+
+  count = INTVAL (operands[1]);
+
+  if (CONST_INT_P (operands[2])
+      && (INTVAL (operands[2]) == 0 || INTVAL (operands[2]) == -1) && count > 8)
+    {
+      rtx lenw = gen_reg_rtx (SImode);
+
+      if (align < 4)
+        {
+          emit_insn (gen_tstsi_t (GEN_INT (3), dest_addr));
+          jump = emit_jump_insn (gen_branch_false (L_loop_byte));
+          add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+        }
+
+      /* word count. Do we have iterations ?  */
+      emit_insn (gen_lshrsi3 (lenw, len, GEN_INT (2)));
+
+      dest = adjust_automodify_address (dest, SImode, dest_addr, 0);
+
+      /* start loop.  */
+      emit_label (L_loop_word);
+
+      if (TARGET_SH2)
+        emit_insn (gen_dect (lenw, lenw));
+      else
+        {
+          emit_insn (gen_addsi3 (lenw, lenw, GEN_INT (-1)));
+          emit_insn (gen_tstsi_t (lenw, lenw));
+        }
+
+      emit_move_insn (dest, val);
+      emit_move_insn (dest_addr, plus_constant (Pmode, dest_addr,
+                                                GET_MODE_SIZE (SImode)));
+
+
+      jump = emit_jump_insn (gen_branch_false (L_loop_word));
+      add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+      count = count % 4;
+
+      dest = adjust_address (dest, QImode, 0);
+
+      val = gen_lowpart (QImode, val);
+
+      while (count--)
+        {
+          emit_move_insn (dest, val);
+          emit_move_insn (dest_addr, plus_constant (Pmode, dest_addr,
+                                                    GET_MODE_SIZE (QImode)));
+        }
+
+      jump = emit_jump_insn (gen_jump_compact (L_return));
+      emit_barrier_after (jump);
+    }
+
+  dest = adjust_automodify_address (dest, QImode, dest_addr, 0);
+
+  /* start loop.  */
+  emit_label (L_loop_byte);
+
+  if (TARGET_SH2)
+    emit_insn (gen_dect (len, len));
+  else
+    {
+      emit_insn (gen_addsi3 (len, len, GEN_INT (-1)));
+      emit_insn (gen_tstsi_t (len, len));
+    }
+
+  val = gen_lowpart (QImode, val);
+  emit_move_insn (dest, val);
+  emit_move_insn (dest_addr, plus_constant (Pmode, dest_addr,
+                                            GET_MODE_SIZE (QImode)));
+
+  jump = emit_jump_insn (gen_branch_false (L_loop_byte));
+  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+
+  emit_label (L_return);
+
+  return;
+}
Index: gcc/config/sh/sh-protos.h
===================================================================
--- gcc/config/sh/sh-protos.h (revision 208745)
+++ gcc/config/sh/sh-protos.h (working copy)
@@ -119,6 +119,7 @@ extern void prepare_move_operands (rtx[], enum mac
 extern bool sh_expand_cmpstr (rtx *);
 extern bool sh_expand_cmpnstr (rtx *);
 extern bool sh_expand_strlen (rtx *);
+extern void sh_expand_setmem (rtx *);
 extern enum rtx_code prepare_cbranch_operands (rtx *, enum machine_mode mode,
                                                enum rtx_code comparison);
 extern void expand_cbranchsi4 (rtx *operands, enum rtx_code comparison, int);
Index: gcc/config/sh/sh.h
===================================================================
--- gcc/config/sh/sh.h (revision 208745)
+++ gcc/config/sh/sh.h (working copy)
@@ -1594,6 +1594,11 @@ struct sh_args {
 
 #define SET_BY_PIECES_P(SIZE, ALIGN) STORE_BY_PIECES_P(SIZE, ALIGN)
 
+/* If a memory clear move would take CLEAR_RATIO or more simple
+   move-instruction pairs, we will do a setmem instead.  */
+
+#define CLEAR_RATIO(speed) ((speed) ? 15 : 3)
+
 /* Macros to check register numbers against specific register classes.  */
 
 /* These assume that REGNO is a hard or pseudo reg number.
Index: gcc/config/sh/sh.md
===================================================================
--- gcc/config/sh/sh.md (revision 208745)
+++ gcc/config/sh/sh.md (working copy)
@@ -12089,6 +12089,20 @@ label:
   FAIL;
 })
 
+(define_expand "setmemqi"
+  [(parallel [(set (match_operand:BLK 0 "memory_operand")
+                   (match_operand 2 "const_int_operand"))
+              (use (match_operand:QI 1 "const_int_operand"))
+              (use (match_operand:QI 3 "const_int_operand"))])]
+  "TARGET_SH1 && optimize"
+  {
+    if (optimize_insn_for_size_p ())
+      FAIL;
+
+    sh_expand_setmem (operands);
+    DONE;
+  })
+
 ;; -------------------------------------------------------------------------
 ;; Floating point instructions.
 
Index: gcc/testsuite/gcc.target/sh/memset.c
===================================================================
--- gcc/testsuite/gcc.target/sh/memset.c (revision 0)
+++ gcc/testsuite/gcc.target/sh/memset.c (working copy)
@@ -0,0 +1,13 @@
+/* Check that the __builtin_memset function is inlined when
+   optimizing for speed.  */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-skip-if "" { "sh*-*-*" } { "-m5*" } { "" } }  */
+/* { dg-final { scan-assembler-not "jmp" } } */
+
+void
+test00(char *dstb)
+{
+  __builtin_memset (dstb, 0, 15);
+}
+
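
For anyone reviewing the control flow of sh_expand_setmem, the sequence it emits can be modelled roughly in plain C as below. This is only an illustrative sketch and not part of the patch: the name model_setmem and its parameters are hypothetical, the run-time loops stand in for the emitted RTL, and it assumes count >= 1 because both loops are bottom-tested (decrement, store, then branch).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the code emitted by sh_expand_setmem.
   DEST is the destination, VAL the fill value, COUNT the constant size
   (must be >= 1), ALIGN the alignment in bytes known at compile time.  */
static void
model_setmem (unsigned char *dest, int val, unsigned count, unsigned align)
{
  unsigned len = count;
  unsigned char byte = (unsigned char) val;

  /* Word path: only for 0 or -1, where every byte of the 32-bit value is
     already identical, and only when more than 8 bytes are stored.  */
  if ((val == 0 || val == -1) && count > 8)
    {
      /* With unknown alignment the emitted code tests (dest & 3) at run
         time and falls back to the byte loop when the test fails.  */
      if (align >= 4 || ((uintptr_t) dest & 3) == 0)
        {
          unsigned lenw = len >> 2;      /* number of full words */
          uint32_t word = (uint32_t) val;

          do
            {
              lenw--;
              memcpy (dest, &word, sizeof word);   /* one 32-bit store */
              dest += sizeof word;
            }
          while (lenw != 0);

          /* The remaining count % 4 bytes are unrolled in the patch.  */
          for (unsigned tail = count % 4; tail != 0; tail--)
            *dest++ = byte;
          return;
        }
    }

  /* Byte loop: any other value, short sizes, or unaligned destination.  */
  do
    {
      len--;
      *dest++ = byte;
    }
  while (len != 0);
}
```

Calling model_setmem (buf, 0, 15, 4) on a 4-byte-aligned buffer takes the word path: three 32-bit stores followed by three single-byte stores. That is the case the new test exercises with __builtin_memset (dstb, 0, 15), except that there the alignment is unknown, so the run-time test is emitted as well.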