From patchwork Fri Aug 10 13:38:44 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anton Youdkevitch X-Patchwork-Id: 956256 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-483502-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=bell-sw.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="txvIovn7"; dkim=pass (1024-bit key; unprotected) header.d=bell-sw.com header.i=@bell-sw.com header.b="Y4VLQMnn"; dkim=pass (1024-bit key) header.d=bell-sw.com header.i=@bell-sw.com header.b="UtQbbXRa"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41n5nN5wZPz9s8k for ; Fri, 10 Aug 2018 23:39:01 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:mime-version:content-type; q=dns; s=default; b=UifiD9diYTVF+SBivjXNni1MD/UtWPv15Pn6Iq7fEcA992WERr RkUP7fV9k/ZPKsD4DYhaN7DW6TSKexeRrcE5TNul5vR4zpuQ0Fe56n9BsNkkyz6G lrV+4I62wDoW7ZTE7P3TA4Q3pY6mRhoo3Mg5m+LXkAIjb/MPsQCb0njQw= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:mime-version:content-type; s= default; bh=0CEwzSjoJONFyrWjnb3gTVfmWRA=; b=txvIovn7i+u64MSl60LV B6E+gFETklXq4crpAZ78SyKyL/y5r5xV68hZXuq7Nn4nDscx+C0KMp1akKBlw1Po QJdZik58XErjbZgTnMWJMbGeWfsNDODK5sYdoP3qx7tSc6c35G7FTTtIdcVooflx hIbFpn3F7DlWWO1gm0XLpGI= Received: (qmail 106172 invoked by alias); 10 Aug 2018 13:38:55 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 106141 invoked by uid 89); 10 Aug 2018 13:38:52 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-27.6 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 spammy=mask1, H*r:Client, aarch64.c, UD:aarch64.c X-HELO: forward104o.mail.yandex.net Received: from forward104o.mail.yandex.net (HELO forward104o.mail.yandex.net) (37.140.190.179) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 10 Aug 2018 13:38:49 +0000 Received: from mxback11g.mail.yandex.net (mxback11g.mail.yandex.net [IPv6:2a02:6b8:0:1472:2741:0:8b7:90]) by forward104o.mail.yandex.net (Yandex) with ESMTP id D92CC3D83489; Fri, 10 Aug 2018 16:38:46 +0300 (MSK) Received: from smtp3o.mail.yandex.net (smtp3o.mail.yandex.net [2a02:6b8:0:1a2d::27]) by mxback11g.mail.yandex.net (nwsmtp/Yandex) with ESMTP id zTmr6X7S7f-ck3uF5qf; Fri, 10 Aug 2018 16:38:46 +0300 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bell-sw.com; s=mail; t=1533908326; bh=XyeTyLi7UfSEMc82ENMpcmpeeoOw2jP4L/oT+kYH1KE=; h=Date:From:To:Cc:Subject:Message-ID; b=Y4VLQMnnr1nsmM6L+ca+piUz/f9gtZAiZ/BmJ5KFJ2lMnLSMml0YpUkEmPPgIoBJu PZkRJFW21Zs3TlO/tFMj7wOKx+IzkwLZ/BQo9u7uz3ARyIztOi6/Cvqi+A9GwglSeA mSN2uTMf/u9C5gTdOKQ6dyWjRdVo1CHeQwqTrBk4= Received: by smtp3o.mail.yandex.net (nwsmtp/Yandex) with ESMTPSA id q1JQUIkuQL-cjTSRFFh; Fri, 10 Aug 2018 16:38:45 +0300 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client certificate not present) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bell-sw.com; s=mail; t=1533908325; bh=XyeTyLi7UfSEMc82ENMpcmpeeoOw2jP4L/oT+kYH1KE=; h=Date:From:To:Cc:Subject:Message-ID; b=UtQbbXRapUTAwBZla2sn98jv4XOFIQTuXJamOdZCUqVZo2cGo0Ll+AHzsy9IKGTaw E0ZPwkNQu2sUSLsEzDXX9ZheoHFdiMA0PPQyfDpoUn17AZ5HrKZ3G8oMk1f/IOLZ45 +8NiB29rpfroitwUIxMDf1oBnJlVYroQSO2b8HrA= Authentication-Results: smtp3o.mail.yandex.net; dkim=pass header.i=@bell-sw.com Date: Fri, 10 Aug 2018 16:38:44 +0300 From: Anton Youdkevitch To: gcc-patches@gcc.gnu.org Cc: Richard Earnshaw Subject: [PATCH][AARCH64] inline strlen for 8-bytes aligned strings Message-ID: <20180810133844.GB8523@bell-sw.com> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.24 (2015-08-30) The patch inlines strlen for 8-byte aligned strings on AARCH64 like it's done on other platforms (power, s390). The implementation falls back to the library call if the string is not aligned. Synthetic testing on Cavium T88 and Cavium T99 showed the following performance gains: T99: up to 8 bytes: +100%, 100+ bytes - +20% T88: up 8 bytes: +100%, 100+ bytes - 0% which seems to be OK as most of the string are short strings. SPEC performance testing on T99 and T88 did not show any statistically significant differences. Bootstrapped and regression-tested on aarch64-linux-gnu. No new failures found. OK for trunk? 2016-08-10 Anton Youdkevitch * gcc/config/aarch64/aarch64.md (strlen) New pattern. (UNSPEC_BUILTIN_STRLEN): Define. * gcc/config/aarch64/aarch64.c (aarch64_expand_strlen): Expand only in 8-byte aligned case, do not attempt to adjust address * gcc/config/aarch64/aarch64-protos.h (aarch64_expand_strlen): Declare. * gcc/testsuite/gcc.target/aarch64/strlen_aligned.c: New diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index cda2895..9beb289 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -358,6 +358,7 @@ bool aarch64_emit_approx_div (rtx, rtx, rtx); bool aarch64_emit_approx_sqrt (rtx, rtx, bool); void aarch64_expand_call (rtx, rtx, bool); bool aarch64_expand_movmem (rtx *); +void aarch64_expand_strlen (rtx *); bool aarch64_float_const_zero_rtx_p (rtx); bool aarch64_float_const_rtx_p (rtx); bool aarch64_function_arg_regno_p (unsigned); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 4b5183b..d12fb6b 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -16107,6 +16107,81 @@ aarch64_expand_movmem (rtx *operands) return true; } +/* Emit code to perform a strlen. + + OPERANDS[0] is the destination. + OPERANDS[1] is the string. + OPERANDS[2] is the char to search. + OPERANDS[3] is the alignment. */ + +void aarch64_expand_strlen (rtx* operands) { + rtx result = operands[0]; + rtx src = operands[1]; + rtx loop_label = gen_label_rtx (); + rtx end_label = gen_label_rtx (); + rtx end_loop_label = gen_label_rtx (); + rtx preloop_label = gen_label_rtx (); + rtx str = gen_reg_rtx (DImode); + rtx addr = force_reg (DImode, XEXP (src, 0)); + rtx start_addr = gen_reg_rtx(DImode); + rtx tmp1 = gen_reg_rtx (DImode); + rtx tmp2 = gen_reg_rtx (DImode); + rtx tmp3 = gen_reg_rtx (DImode); + rtx mask1 = gen_reg_rtx (DImode); + rtx mask2 = gen_reg_rtx (DImode); + rtx x; + rtx mem; + + emit_insn (gen_rtx_SET (start_addr, addr)); + emit_insn (gen_anddi3 (tmp1, addr, GEN_INT (4096 - 1))); + /* if less than 16 bytes left till the end of the page */ + x = gen_rtx_GT (DImode, tmp1, GEN_INT (4096 - 16)); + x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, + gen_rtx_LABEL_REF (Pmode, preloop_label), pc_rtx); + + emit_move_insn (str, gen_rtx_MEM (DImode, addr)); + emit_insn (gen_rtx_SET (mask1, GEN_INT (0x0101010101010101))); + emit_insn (gen_rtx_SET (mask2, GEN_INT (0x7f7f7f7f7f7f7f7f))); + + /* process the chunk */ + emit_insn (gen_subdi3 (tmp1, str, mask1)); + emit_insn (gen_iordi3 (tmp2, str, mask2)); + emit_insn (gen_rtx_SET (tmp2, gen_rtx_NOT (DImode, tmp2))); + emit_insn (gen_anddi3 (tmp3, tmp1, tmp2)); + + + /* if NULL found jump to calculate it's exact position */ + x = gen_rtx_NE (DImode, tmp3, GEN_INT (0)); + x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, + gen_rtx_LABEL_REF (Pmode, end_loop_label), pc_rtx); + emit_jump_insn (gen_rtx_SET (pc_rtx, x)); + + emit_insn (gen_adddi3 (addr, addr, GEN_INT (8))); + emit_label (preloop_label); + mem = gen_rtx_POST_MODIFY (DImode, addr, plus_constant (DImode, addr, 1)); + + /* simple byte loop */ + emit_label (loop_label); + emit_move_insn (str, gen_rtx_ZERO_EXTEND (DImode, gen_rtx_MEM (QImode, mem))); + x = gen_rtx_NE (SImode, str, GEN_INT(0)); + x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, gen_rtx_LABEL_REF (Pmode, loop_label), pc_rtx); + emit_jump_insn (gen_rtx_SET (pc_rtx, x)); + + emit_insn (gen_subdi3 (result, addr, start_addr)); + /* adjusting after the last post-decrement */ + emit_insn (gen_adddi3 (result, result, GEN_INT (-1))); + emit_jump_insn (gen_jump (end_label)); + emit_barrier (); + + emit_label (end_loop_label); + emit_insn (gen_bswapdi2 (tmp3, tmp3)); + emit_insn (gen_clzdi2 (tmp3, tmp3)); + emit_insn (gen_ashrdi3 (tmp3, tmp3, GEN_INT (3))); + emit_move_insn (result, tmp3); + + emit_label(end_label); +} + /* Split a DImode store of a CONST_INT SRC to MEM DST as two SImode stores. Handle the case when the constant has identical bottom and top halves. This is beneficial when the two stores can be diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 10fcde6..7c60b69 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -189,6 +189,7 @@ UNSPEC_CLASTB UNSPEC_FADDA UNSPEC_REV_SUBREG + UNSPEC_BUILTIN_STRLEN ]) (define_c_enum "unspecv" [ @@ -395,6 +396,19 @@ [(set_attr "type" "fccmp")] ) +(define_expand "strlen" + [(set (match_operand:P 0 "register_operand") + (unspec:P [(match_operand:BLK 1 "memory_operand") + (match_operand 2 "immediate_operand") + (match_operand 3 "immediate_operand")] + UNSPEC_BUILTIN_STRLEN))] + "" +{ + aarch64_expand_strlen (operands); + DONE; +}) + + ;; Expansion of signed mod by a power of 2 using CSNEG. ;; For x0 % n where n is a power of 2 produce: ;; negs x1, x0