From patchwork Fri Feb 5 12:56:52 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kaushik Phatak
X-Patchwork-Id: 579475
From: Kaushik Phatak
To: "'gcc-patches@gcc.gnu.org'"
CC: "nick clifton (nickc@redhat.com)"
Subject: [PATCH: RL78] Optimize libgcc routines using clrw and clrb
Date: Fri, 5 Feb 2016 12:56:52 +0000

Hi,

Please find below a simple patch which optimizes the loading of an
immediate value by using the clrw or clrb instruction when 0x00 is
being loaded into a register.
The patch replaces each such movw/mov instruction with the smaller
clrw/clrb instruction.

clrw and clrb generate only 1 byte of opcode, compared to 3 bytes for
movw and 2 bytes for mov.
In total, this patch gives a code size improvement of about 94 bytes
in these libgcc routines.

The following routines have improved code size:

___mulsi3              : 2 bytes
___divsi3              : 20 bytes
___modsi3              : 20 bytes
___divhi3              : 10 bytes
___modhi3              : 10 bytes
___parityqi_internal   : 2 bytes
__int_cmpsf            : 2 bytes
___fixsfsi             : 5 bytes
___fixunssfsi          : 2 bytes
___floatsisf           : 6 bytes
_int_unpack_sf         : 1 byte
___addsf3              : 5 bytes
__rl78_int_pack_a_r8   : 2 bytes
___mulsf3              : 2 bytes
___divsf3              : 3 bytes
__gcc_bcmp             : 2 bytes

I have also attached a draft version of a similar patch
(rl78_libgcc_optimize_draft.patch), which goes further: it removes
movw immediate loads to other saddr registers and replaces each of
them with 2 instructions, for example:

 START_FUNC ___modhi3
 	;; r8 = 4[sp] % 6[sp]
-	movw	de, #0
+	clrw	ax
+	movw	de,ax
 	mov	a, [sp+5]

This patch improves code size by 1 byte for each such substitution,
but it does add an extra clock cycle.
We may consider this patch if we are purely looking for code size
improvement, assuming the libraries are built with -Os.
It shows a total improvement of 134 bytes in code size.

Patch1: rl78_libgcc_optimize_clrw.patch - 94 bytes improvement in code size.
Patch2: rl78_libgcc_optimize_draft.patch - 134 bytes improvement in code size.

Kindly review this patch and let me know what you think.
This is regression tested for rl78 -msim.

Best Regards,
Kaushik

p.s. Kindly ignore any disclaimers at end of this e-mail as they are
auto-inserted. Apologies for the same.

2016-02-05  Kaushik Phatak

	* config/rl78/bit-count.S: Use clrw/clrb where possible.
	* config/rl78/cmpsi2.S: Likewise.
	* config/rl78/divmodhi.S: Likewise.
	* config/rl78/divmodsi.S: Likewise.
	* config/rl78/fpbit-sf.S: Likewise.
	* config/rl78/fpmath-sf.S: Likewise.
	* config/rl78/mulsi3.S: Likewise.

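As a rough cross-check of the figures quoted in this message, the
per-routine savings can be tallied and compared with the stated totals.
The sketch below is only illustrative: the names SAVINGS, total_saving
and extra_substitutions are hypothetical, and the second calculation
assumes that the draft patch contains every change from the first patch
and that each further substitution saves exactly 1 byte, as the message
states.

```python
# Per-routine code size savings in bytes, copied from the list in this message.
SAVINGS = {
    "___mulsi3": 2,
    "___divsi3": 20,
    "___modsi3": 20,
    "___divhi3": 10,
    "___modhi3": 10,
    "___parityqi_internal": 2,
    "__int_cmpsf": 2,
    "___fixsfsi": 5,
    "___fixunssfsi": 2,
    "___floatsisf": 6,
    "_int_unpack_sf": 1,
    "___addsf3": 5,
    "__rl78_int_pack_a_r8": 2,
    "___mulsf3": 2,
    "___divsf3": 3,
    "__gcc_bcmp": 2,
}

# Totals quoted in the message for the two patches.
PATCH1_TOTAL = 94
PATCH2_TOTAL = 134


def total_saving(savings):
    """Sum the per-routine savings (hypothetical helper)."""
    return sum(savings.values())


def extra_substitutions(patch1_total, patch2_total, bytes_per_substitution=1):
    """Substitutions implied by the gap between the two totals.

    Assumption: the draft patch is a superset of the first patch and each
    additional substitution saves bytes_per_substitution bytes.
    """
    return (patch2_total - patch1_total) // bytes_per_substitution


if __name__ == "__main__":
    print("sum of per-routine savings:", total_saving(SAVINGS))
    print("implied extra substitutions:",
          extra_substitutions(PATCH1_TOTAL, PATCH2_TOTAL))
```

Under those assumptions the gap between the two totals corresponds to
the number of extra substitutions in the assembled output, each of which
costs one additional clock cycle whenever it is executed.
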
Index: libgcc/config/rl78/bit-count.S
===================================================================
--- libgcc/config/rl78/bit-count.S	(revision 3174)
+++ libgcc/config/rl78/bit-count.S	(working copy)
@@ -139,7 +139,7 @@
 	xor1	cy, a.5
 	xor1	cy, a.6
 	xor1	cy, a.7
-	movw	ax, #0
+	clrw	ax
 	bnc	$1f
 	incw	ax
 1:
@@ -190,7 +190,7 @@
 	movw	ax, sp
 	addw	ax, #4
 	movw	hl, ax
-	mov	a, #0
+	clrb	a
 1:
 	xch	a, b
 	mov	a, [hl]
@@ -207,7 +207,7 @@
 	bnz	$1b
 	mov	x, a
-	mov	a, #0
+	clrb	a
 	movw	r8, ax
 	ret
 END_FUNC ___popcountqi_internal
Index: libgcc/config/rl78/cmpsi2.S
===================================================================
--- libgcc/config/rl78/cmpsi2.S	(revision 3174)
+++ libgcc/config/rl78/cmpsi2.S	(working copy)
@@ -162,8 +162,8 @@
 	;; They differ. Subtract *S2 from *S1 and return as the result.
 	mov	x, a
-	mov	a, #0
-	mov	r9, #0
+	clrb	a
+	clrb	r9
 	subw	ax, r8
 1:
 	movw	r8, ax
Index: libgcc/config/rl78/divmodhi.S
===================================================================
--- libgcc/config/rl78/divmodhi.S	(revision 3174)
+++ libgcc/config/rl78/divmodhi.S	(working copy)
@@ -576,7 +576,7 @@
 .macro NEG_AX
 	movw	hl, ax
-	movw	ax, #0
+	clrw	ax
 	subw	ax, [hl]
 	movw	[hl], ax
 .endm
Index: libgcc/config/rl78/divmodsi.S
===================================================================
--- libgcc/config/rl78/divmodsi.S	(revision 3174)
+++ libgcc/config/rl78/divmodsi.S	(working copy)
@@ -952,10 +952,10 @@
 .macro NEG_AX
 	movw	hl, ax
-	movw	ax, #0
+	clrw	ax
 	subw	ax, [hl]
 	movw	[hl], ax
-	movw	ax, #0
+	clrw	ax
 	sknc
 	decw	ax
 	subw	ax, [hl+2]
Index: libgcc/config/rl78/fpbit-sf.S
===================================================================
--- libgcc/config/rl78/fpbit-sf.S	(revision 3174)
+++ libgcc/config/rl78/fpbit-sf.S	(working copy)
@@ -117,7 +117,7 @@
 	call	$!__int_iszero
 	bnz	$2f
 	;; At this point, both args are zero.
-	mov	a, #0
+	clrb	a
 	ret
 2:
@@ -151,7 +151,7 @@
 	bc	$ybig_cmpsf	; branch if X < Y
 	bnz	$xbig_cmpsf	; branch if X > Y
-	mov	a, #0
+	clrb	a
 	ret
 xbig_cmpsf:	; |X| > |Y| so return A = 1 if pos, 0xff if neg
@@ -285,7 +285,7 @@
 	movw	r10, #0x7fff
 	ret
 	;; -inf
-2:	mov	r8, #0
+2:	clrb	r8
 	mov	r10, #0x8000
 	ret
@@ -302,10 +302,10 @@
 	clr1	a.7
 	call	$!__int_fixunssfsi
-	movw	ax, #0
+	clrw	ax
 	subw	ax, r8
 	movw	r8, ax
-	movw	ax, #0
+	clrw	ax
 	sknc
 	decw	ax
 	subw	ax, r10
@@ -410,7 +410,7 @@
 	set1	a.7
 	;; Clear B:C:R12:R13
-	movw	bc, #0
+	clrw	bc
 	movw	r12, #0
 	;; Shift bits from the mantissa (A:X:R10) into (B:C:R12:R13),
@@ -482,10 +482,10 @@
 	;; If negative convert to positive ...
 	movw	hl, ax
-	movw	ax, #0
+	clrw	ax
 	subw	ax, bc
 	movw	bc, ax
-	movw	ax, #0
+	clrw	ax
 	sknc
 	decw	ax
 	subw	ax, hl
@@ -533,7 +533,7 @@
 	bnz	$1f
 	movw	ax, bc
 	cmpw	ax, #0
-	movw	ax, #0
+	clrw	ax
 	bnz	$1f
 	;; Return 0.0
Index: libgcc/config/rl78/fpmath-sf.S
===================================================================
--- libgcc/config/rl78/fpmath-sf.S	(revision 3174)
+++ libgcc/config/rl78/fpmath-sf.S	(working copy)
@@ -87,7 +87,7 @@
 	or	a, #0x80
 	mov	A_FRAC_H, a
-	mov	a, #0
+	clrb	a
 	mov	A_FRAC_HH, a
 	;; rounding-bit-shift
@@ -273,7 +273,7 @@
 	;; "zero out" b
 	movw	ax, A_EXP
 	movw	B_EXP, ax
-	movw	ax, #0
+	clrw	ax
 	movw	B_FRAC_L, ax
 	movw	B_FRAC_H, ax
 	br	$5f
@@ -281,7 +281,7 @@
 	;; "zero out" a
 	movw	ax, B_EXP
 	movw	A_EXP, ax
-	movw	ax, #0
+	clrw	ax
 	movw	A_FRAC_L, ax
 	movw	A_FRAC_H, ax
@@ -379,7 +379,7 @@
 	bt	a.7, $.L706
 	;; subtraction was positive
-	mov	a, #0
+	clrb	a
 	mov	A_SIGN, a
 	br	$.L712
@@ -543,7 +543,7 @@
 	or	a, A_FRAC_H
 	or	a, A_FRAC_HH
 	bnz	$1f
-	movw	ax, #0
+	clrw	ax
 	movw	A_EXP, ax
 1:
 	mov	a, A_FRAC_H
@@ -682,7 +682,7 @@
 	movw	ax, B_FRAC_H
 	movw	[sp+10], ax
-	movw	ax, #0
+	clrw	ax
 	movw	[sp+4], ax
 	movw	[sp+6], ax
 	movw	[sp+12], ax
@@ -867,7 +867,7 @@
 	and	a, #0x80
 	mov	r11, a
 	movw	r8, #0
-	mov	r10, #0
+	clrb	r10
 	ret
 1:
@@ -930,7 +930,7 @@
 	movw	ax, B_FRAC_H
 	movw	[sp+10], ax
-	movw	ax, #0
+	clrw	ax
 	movw	[sp+0], ax
 	movw	[sp+2], ax
 	movw	[sp+12], ax
Index: libgcc/config/rl78/mulsi3.S
===================================================================
--- libgcc/config/rl78/mulsi3.S	(revision 3174)
+++ libgcc/config/rl78/mulsi3.S	(working copy)
@@ -148,7 +148,7 @@
 	movw	ax, bc
 .Lmul_hisi_top:
-	movw	bc, #0
+	clrw	bc
 .Lmul_hisi_loop:
 	shrw	ax, 1