From patchwork Mon Apr 14 10:13:41 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Terry Guo X-Patchwork-Id: 338895 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4349C14007D for ; Mon, 14 Apr 2014 20:13:54 +1000 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; q=dns; s=default; b=ETXNhlB62U9Nxm+U5V iA04FoBNIFiuDQOo9crQ5RuG0ew4uZQmGlHQDaMYYuFvOYoKVZA4eAdipDqNtkMx Jvw5qugnRlEYETiJOdPgnrk+doLYCODRPBrySCnhH+FL6bNTQNjMCkGGyu3Jsmgd /wotXFKNES60b7nHXiFDRIKfI= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; s=default; bh=OyN+y02WOsYYVVulOp5dAfS6 S/c=; b=gveuWgP6fED0QRiQrQ/f669ce/xikx7NRBNMTsVknNtm4LpBkfM1ACAX YLyHv3clgsyLKb1e/oQIffRWh3NZLN25BsQahDmI5mEsxPv87yEEw7EHnjpDZXDY jV8m/hYYbuTaz2IS1bfo0MfgiOcsOhdCjzMiViAjLbej1HJWC+w= Received: (qmail 27828 invoked by alias); 14 Apr 2014 10:13:46 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 27815 invoked by uid 89); 14 Apr 2014 10:13:45 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-ve0-f177.google.com Received: from mail-ve0-f177.google.com (HELO mail-ve0-f177.google.com) (209.85.128.177) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Mon, 14 Apr 2014 10:13:44 +0000 Received: by mail-ve0-f177.google.com with SMTP id sa20so7067636veb.22 for ; Mon, 14 Apr 2014 03:13:42 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.220.147.16 with SMTP id j16mr36081873vcv.14.1397470421909; Mon, 14 Apr 2014 03:13:41 -0700 (PDT) Received: by 10.221.50.202 with HTTP; Mon, 14 Apr 2014 03:13:41 -0700 (PDT) In-Reply-To: <5139634.FXfFVqhjGC@polaris> References: <000001cf4a5a$0cc1c010$26454030$@arm.com> <5139634.FXfFVqhjGC@polaris> Date: Mon, 14 Apr 2014 18:13:41 +0800 Message-ID: Subject: Re: [Patch]Simplify SUBREG with operand whose target bits are cleared by AND operation From: Terry Guo To: Eric Botcazou Cc: gcc-patches@gcc.gnu.org X-IsSubscribed: yes On Thu, Apr 3, 2014 at 10:11 PM, Eric Botcazou wrote: >> I find the GCC function simplify_subreg fails to simplify rtx (subreg:SI >> (and:DI (reg/v:DI 115 [ a ]) (const_int 4294967295 [0xffffffff])) 4) to zero >> during the fwprop1 pass, considering the fact that the high 32-bit part of >> (a & 0xFFFFFFFF) is zero. This leads to some unnecessary multiplications >> for high 32-bit part of the result of AND operation. The attached patch is >> trying to improve simplify_rtx to handle such case. Other target like x86 >> seems hasn't such issue because it generates different RTX to handle 64bit >> multiplication on a 32bit machine. > > See http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00073.html for another try, > which led to the simplification in combine.c:combine_simplify_rtx line 5448. > > Your variant is both more general, because it isn't restricted to the lowpart, > and less general, because it is artificially restricted to AND. > > Some remarks: > - this needs to be restricted to non-paradoxical subregs, > - you need to test HWI_COMPUTABLE_MODE_P (innermode), > - you need to test !side_effects_p (op). > > I think we need to find a common ground between Jakub's patch and yours and > put a single transformation in simplify_subreg. > > -- > Eric Botcazou Thanks for your review. Even without Jakub's patch, the combine pass can simplify such subreg to zero. But the live range of destination register causes combine pass to undo all attempts. Here is detailed explanation, please note that the AND operation (and:DI (reg/v:DI 115 [ a ]) (const_int 4294967295 [0xffffffff])) from fwprop1 pass is turned into ZERO_EXTEND operation (zero_extend:DI (reg:SI 0 r0 [ a ])) in combine pass. The variable a is function arguments and occupies register r0 and r1. Right before try_combine function we have below three instructions: (insn 8 4 10 2 (set (reg:DI 118 [ D.4091 ]) (zero_extend:DI (reg:SI 0 r0 [ a ]))) 000.c:4 170 {zero_extendsidi2} (expr_list:REG_DEAD (reg:SI 0 r0) (nil))) (insn 10 8 13 2 (set (reg:SI 121) (mult:SI (reg/v:SI 116 [ b ]) (subreg:SI (reg:DI 118 [ D.4091 ]) 4))) 000.c:4 41 {*arm_mulsi3_v6} (nil)) (insn 13 10 14 2 (set (reg:DI 117 [ D.4091 ]) (mult:DI (zero_extend:DI (subreg:SI (reg:DI 118 [ D.4091 ]) 0)) (zero_extend:DI (reg/v:SI 116 [ b ])))) 000.c:4 60 {*umulsidi3_v6} (expr_list:REG_DEAD (reg:DI 118 [ D.4091 ]) (expr_list:REG_DEAD (reg/v:SI 116 [ b ]) (nil)))) Without any changes, current gcc can simplify the insn 10 to (insn 10 8 13 2 (set (reg:SI 121) (const_int 0 [0])) 000.c:4 41 {*arm_mulsi3_v6} (nil)) This is what we wanted. Unfortunately the live range of register (reg:DI 118) is extended to insn 13. Thus unable to remove insn 8. The combine pass has to generate PARALLEL rtx to handle such case: (parallel [ (set (reg:SI 121) (const_int 0 [0])) (set (reg:DI 118 [ D.4091 ]) (zero_extend:DI (reg:SI 0 r0 [ a ]))) ]) There is no instruction patter to match this parallel rtx. This causes combine pass to undo all attempts and roll back simplification to subreg operand. Without such live range extension, everything works fine. That's why such optimization can only happen in fwprop1 pass. I made a typo in my previous changelog. My code are indeed for simplify_subreg function. I updated my patch per your suggestions to handle more general cases. Please review again. Thanks. BR, Terry diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index 04af01e..1949edc 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -6099,6 +6099,22 @@ simplify_subreg (enum machine_mode outermode, rtx op, return CONST0_RTX (outermode); } + /* If we can make sure the required bits of subreg are zero + in subreg operand, we can return a zero. Here is an example: + (subreg:SI (and:DI (reg:DI X) (const_int 0xFFFFFFFF)) 4). */ + if (HWI_COMPUTABLE_MODE_P (innermode) + && SCALAR_INT_MODE_P (innermode) + && GET_MODE_SIZE (innermode) > GET_MODE_SIZE (outermode) + && !side_effects_p (op)) + { + unsigned int bitpos = subreg_lsb_1 (outermode, innermode, byte); + unsigned HOST_WIDE_INT nzmask = nonzero_bits (op, innermode); + unsigned HOST_WIDE_INT smask = GET_MODE_MASK (outermode); + + if (((smask << bitpos) & nzmask) == 0) + return CONST0_RTX (outermode); + } + if (SCALAR_INT_MODE_P (outermode) && SCALAR_INT_MODE_P (innermode) && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode)