From patchwork Tue Jan 17 19:24:57 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ulrich Weigand X-Patchwork-Id: 136518 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 99D99B6EE8 for ; Wed, 18 Jan 2012 06:25:36 +1100 (EST) Comment: DKIM? See http://www.dkim.org DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=gcc.gnu.org; s=default; x=1327433138; h=Comment: DomainKey-Signature:Received:Received:Received:Received:Received: Received:Received:Received:Received:Received:Subject:To:Date: From:Cc:In-Reply-To:MIME-Version:Message-ID: Content-Transfer-Encoding:Content-Type:Mailing-List:Precedence: List-Id:List-Unsubscribe:List-Archive:List-Post:List-Help:Sender: Delivered-To; bh=jhIF3IsRQNVgop868qQEsSwwHao=; b=WUJ9sy+XEZCK+Se tXz9+wOZ+EFccCMbZk0e2qYJTyyaKJoc7h0gTwAuJS7/5TlZ7+vDHdmJ67R3dRrA 54J6EmJmw2HV+06u1zEjVJq6kfpMjzUATFDFw7udYuE6iAtRAPAUToKo5Gq7WQwd 2JCaWkH1uO5v5u/VyaADfg8DCSWA= Comment: DomainKeys? See http://antispam.yahoo.com/domainkeys DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=default; d=gcc.gnu.org; h=Received:Received:X-SWARE-Spam-Status:X-Spam-Check-By:Received:Received:Received:Received:Received:Received:Received:Received:Subject:To:Date:From:Cc:In-Reply-To:MIME-Version:Message-ID:Content-Transfer-Encoding:Content-Type:x-cbid:Mailing-List:Precedence:List-Id:List-Unsubscribe:List-Archive:List-Post:List-Help:Sender:Delivered-To; b=dSzF5KX4jwGgmHSvyE0dbhN/64JEkUQ0yd3MdfxpbImTtrVwOxaeAM7nABTvi9 n4e+m6yXKAXxApdkgXUF8Jfp12Gx/tLDRIQZieRVu88tHsP2QyXd1KyAGCh0/sJP TIhS6Av0lPJ8jKl/XZ231B++cUgQobMPJ9+DKjZjDdxos=; Received: (qmail 15479 invoked by alias); 17 Jan 2012 19:25:28 -0000 Received: (qmail 15446 invoked by uid 22791); 17 Jan 2012 19:25:23 -0000 X-SWARE-Spam-Status: No, hits=-1.7 required=5.0 tests=AWL, BAYES_00, TW_FW, T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from e06smtp12.uk.ibm.com (HELO e06smtp12.uk.ibm.com) (195.75.94.108) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 17 Jan 2012 19:25:06 +0000 Received: from /spool/local by e06smtp12.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 17 Jan 2012 19:25:04 -0000 Received: from d06nrmr1307.portsmouth.uk.ibm.com ([9.149.38.129]) by e06smtp12.uk.ibm.com ([192.168.101.142]) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Tue, 17 Jan 2012 19:25:01 -0000 Received: from d06av01.portsmouth.uk.ibm.com (d06av01.portsmouth.uk.ibm.com [9.149.37.212]) by d06nrmr1307.portsmouth.uk.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q0HJP0mG2576452 for ; Tue, 17 Jan 2012 19:25:00 GMT Received: from d06av01.portsmouth.uk.ibm.com (loopback [127.0.0.1]) by d06av01.portsmouth.uk.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q0HJP0cc025635 for ; Tue, 17 Jan 2012 12:25:00 -0700 Received: from d06ml032.portsmouth.uk.ibm.com (d06ml032.portsmouth.uk.ibm.com [9.149.76.137]) by d06av01.portsmouth.uk.ibm.com (8.14.4/8.13.1/NCO v10.0 AVin) with ESMTP id q0HJP0h6025625; Tue, 17 Jan 2012 12:25:00 -0700 Received: from tuxmaker.boeblingen.de.ibm.com ([9.152.85.9]) by d06ml032.portsmouth.uk.ibm.com (Lotus Domino Release 8.5.2FP3) with SMTP id 2012011720245245-40590 ; Tue, 17 Jan 2012 20:24:52 +0100 Received: by tuxmaker.boeblingen.de.ibm.com (sSMTP sendmail emulation); Tue, 17 Jan 2012 20:24:57 +0100 Subject: Re: RFC: allowing fwprop to propagate subregs To: kenner@vlsi1.ultra.nyu.edu (Richard Kenner) Date: Tue, 17 Jan 2012 20:24:57 +0100 (CET) From: Ulrich Weigand Cc: bonzini@gnu.org, gcc-patches@gcc.gnu.org, richard.sandiford@linaro.org In-Reply-To: <11201161434.AA18637@vlsi1.ultra.nyu.edu> from "Richard Kenner" at Jan 16, 2012 09:34:32 AM MIME-Version: 1.0 Message-ID: x-cbid: 12011719-8372-0000-0000-0000016CFD4E Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Richard Kenner wrote: > > Maybe the best solution would be to remove the SUBREG case from the generic > > apply_distributive_law subroutine, and instead add a special check for the > > distributed subreg case right at the above place in simplify_set; i.e. to > > perform the inverse distribution only if it is already guaranteed that we > > will also be able to move the subreg to the LHS ... > > That could indeed work. I tried to implement that suggestion, but interestingly enough I cannot really test it since I was unable to find any single case where that SUBREG case in apply_distributive_law actually causes any difference whatsoever in generated code. As test case I used the whole of libstdc++.so on the following set of platforms: - i686-pc-linux - s390x-ibm-linux - powerpc-ibm-linux - arm-linux-gnueabi and built the compiler and libstdc++.so for each of: - current mainline - current mainline plus the first patch below - current mainline plus both patches below All three resulting object files were identical for every platform. Do you have any further suggestion of how to find a testcase (some particular source code and/or architecture)? Given the current set of results, since I do not have any way to verify whether my simplify_set changes would actually trigger correctly, I'd rather propose to just remove the SUBREG case in apply_distributive_law (i.e. only apply the first patch below). Thoughts? Thanks, Ulrich Patch A: Remove SUBREG case in apply_distributive_law Index: gcc/combine.c =================================================================== --- gcc/combine.c (revision 183240) +++ gcc/combine.c (working copy) @@ -9238,37 +9269,6 @@ /* This is also a multiply, so it distributes over everything. */ break; - case SUBREG: - /* Non-paradoxical SUBREGs distributes over all operations, - provided the inner modes and byte offsets are the same, this - is an extraction of a low-order part, we don't convert an fp - operation to int or vice versa, this is not a vector mode, - and we would not be converting a single-word operation into a - multi-word operation. The latter test is not required, but - it prevents generating unneeded multi-word operations. Some - of the previous tests are redundant given the latter test, - but are retained because they are required for correctness. - - We produce the result slightly differently in this case. */ - - if (GET_MODE (SUBREG_REG (lhs)) != GET_MODE (SUBREG_REG (rhs)) - || SUBREG_BYTE (lhs) != SUBREG_BYTE (rhs) - || ! subreg_lowpart_p (lhs) - || (GET_MODE_CLASS (GET_MODE (lhs)) - != GET_MODE_CLASS (GET_MODE (SUBREG_REG (lhs)))) - || paradoxical_subreg_p (lhs) - || VECTOR_MODE_P (GET_MODE (lhs)) - || GET_MODE_SIZE (GET_MODE (SUBREG_REG (lhs))) > UNITS_PER_WORD - /* Result might need to be truncated. Don't change mode if - explicit truncation is needed. */ - || !TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (x), - GET_MODE (SUBREG_REG (lhs)))) - return x; - - tem = simplify_gen_binary (code, GET_MODE (SUBREG_REG (lhs)), - SUBREG_REG (lhs), SUBREG_REG (rhs)); - return gen_lowpart (GET_MODE (x), tem); - default: return x; } Patch B: Re-implement SUBREG case specifically in simplify_set Index: gcc/combine.c =================================================================== --- gcc/combine.c (revision 183240) +++ gcc/combine.c (working copy) @@ -6299,6 +6299,7 @@ rtx dest = SET_DEST (x); enum machine_mode mode = GET_MODE (src) != VOIDmode ? GET_MODE (src) : GET_MODE (dest); + rtx src_subreg; rtx other_insn; rtx *cc_use; @@ -6496,6 +6497,10 @@ and X being a REG or (subreg (reg)), we may be able to convert this to (set (subreg:m2 x) (op)). + Similarly, if we have (set x (op:m1 (subreg:m2 ...) (subreg:m2 ...))), + we may be able to first distribute the subreg over op, and then apply + the above transformation. + We can always do this if M1 is narrower than M2 because that means that we only care about the low bits of the result. @@ -6504,30 +6509,56 @@ be undefined. On machine where it is defined, this transformation is safe as long as M1 and M2 have the same number of words. */ + src_subreg = NULL_RTX; if (GET_CODE (src) == SUBREG && subreg_lowpart_p (src) - && !OBJECT_P (SUBREG_REG (src)) + && !OBJECT_P (SUBREG_REG (src))) + src_subreg = SUBREG_REG (src); + else if (GET_CODE (src) == IOR || GET_CODE (src) == XOR + || GET_CODE (src) == AND + || GET_CODE (src) == PLUS || GET_CODE (src) == MINUS) + { + rtx lhs = XEXP (x, 0); + rtx rhs = XEXP (x, 1); + + /* We can distribute non-paradoxical lowpart SUBREGs if the + inner modes agree. */ + if (GET_CODE (lhs) == SUBREG && GET_CODE (rhs) == SUBREG + && GET_MODE (SUBREG_REG (lhs)) == GET_MODE (SUBREG_REG (rhs)) + && subreg_lowpart_p (lhs) && !paradoxical_subreg_p (lhs) + && subreg_lowpart_p (rhs) && !paradoxical_subreg_p (rhs) + /* This is safe in general only for integral modes. */ + && INTEGRAL_MODE_P (GET_MODE (lhs)) + && INTEGRAL_MODE_P (GET_MODE (SUBREG_REG (lhs))) + /* Result might need to be truncated. Don't change mode if + explicit truncation is needed. */ + && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (src), + GET_MODE (SUBREG_REG (lhs)))) + src_subreg = simplify_gen_binary (GET_CODE (src), + GET_MODE (SUBREG_REG (lhs)), + SUBREG_REG (lhs), SUBREG_REG (rhs)); + } + + if (src_subreg && (((GET_MODE_SIZE (GET_MODE (src)) + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD) - == ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (src))) - + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)) + == ((GET_MODE_SIZE (GET_MODE (src_subreg))) + + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD) #ifndef WORD_REGISTER_OPERATIONS && (GET_MODE_SIZE (GET_MODE (src)) - < GET_MODE_SIZE (GET_MODE (SUBREG_REG (src)))) + < GET_MODE_SIZE (GET_MODE (src_subreg))) #endif #ifdef CANNOT_CHANGE_MODE_CLASS && ! (REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER && REG_CANNOT_CHANGE_MODE_P (REGNO (dest), - GET_MODE (SUBREG_REG (src)), + GET_MODE (src_subreg), GET_MODE (src))) #endif && (REG_P (dest) || (GET_CODE (dest) == SUBREG && REG_P (SUBREG_REG (dest))))) { - SUBST (SET_DEST (x), - gen_lowpart (GET_MODE (SUBREG_REG (src)), - dest)); - SUBST (SET_SRC (x), SUBREG_REG (src)); + SUBST (SET_DEST (x), gen_lowpart (GET_MODE (src_subreg), dest)); + SUBST (SET_SRC (x), src_subreg); src = SET_SRC (x), dest = SET_DEST (x); } -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE Ulrich.Weigand@de.ibm.com Richard Kenner wrote: > > Maybe the best solution would be to remove the SUBREG case from the generic > > apply_distributive_law subroutine, and instead add a special check for the > > distributed subreg case right at the above place in simplify_set; i.e. to > > perform the inverse distribution only if it is already guaranteed that we > > will also be able to move the subreg to the LHS ... > > That could indeed work. I tried to implement that suggestion, but interestingly enough I cannot really test it since I was unable to find any single case where that SUBREG case in apply_distributive_law actually causes any difference whatsoever in generated code. As test case I used the whole of libstdc++.so on the following set of platforms: - i686-pc-linux - s390x-ibm-linux - powerpc-ibm-linux - arm-linux-gnueabi and built the compiler and libstdc++.so for each of: - current mainline - current mainline plus the first patch below - current mainline plus both patches below All three resulting object files were identical for every platform. Do you have any further suggestion of how to find a testcase (some particular source code and/or architecture)? Given the current set of results, since I do not have any way to verify whether my simplify_set changes would actually trigger correctly, I'd rather propose to just remove the SUBREG case in apply_distributive_law (i.e. only apply the first patch below). Thoughts? Thanks, Ulrich Patch A: Remove SUBREG case in apply_distributive_law Index: gcc/combine.c =================================================================== --- gcc/combine.c (revision 183240) +++ gcc/combine.c (working copy) @@ -9238,37 +9269,6 @@ /* This is also a multiply, so it distributes over everything. */ break; - case SUBREG: - /* Non-paradoxical SUBREGs distributes over all operations, - provided the inner modes and byte offsets are the same, this - is an extraction of a low-order part, we don't convert an fp - operation to int or vice versa, this is not a vector mode, - and we would not be converting a single-word operation into a - multi-word operation. The latter test is not required, but - it prevents generating unneeded multi-word operations. Some - of the previous tests are redundant given the latter test, - but are retained because they are required for correctness. - - We produce the result slightly differently in this case. */ - - if (GET_MODE (SUBREG_REG (lhs)) != GET_MODE (SUBREG_REG (rhs)) - || SUBREG_BYTE (lhs) != SUBREG_BYTE (rhs) - || ! subreg_lowpart_p (lhs) - || (GET_MODE_CLASS (GET_MODE (lhs)) - != GET_MODE_CLASS (GET_MODE (SUBREG_REG (lhs)))) - || paradoxical_subreg_p (lhs) - || VECTOR_MODE_P (GET_MODE (lhs)) - || GET_MODE_SIZE (GET_MODE (SUBREG_REG (lhs))) > UNITS_PER_WORD - /* Result might need to be truncated. Don't change mode if - explicit truncation is needed. */ - || !TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (x), - GET_MODE (SUBREG_REG (lhs)))) - return x; - - tem = simplify_gen_binary (code, GET_MODE (SUBREG_REG (lhs)), - SUBREG_REG (lhs), SUBREG_REG (rhs)); - return gen_lowpart (GET_MODE (x), tem); - default: return x; } Patch B: Re-implement SUBREG case specifically in simplify_set Index: gcc/combine.c =================================================================== --- gcc/combine.c (revision 183240) +++ gcc/combine.c (working copy) @@ -6299,6 +6299,7 @@ rtx dest = SET_DEST (x); enum machine_mode mode = GET_MODE (src) != VOIDmode ? GET_MODE (src) : GET_MODE (dest); + rtx src_subreg; rtx other_insn; rtx *cc_use; @@ -6496,6 +6497,10 @@ and X being a REG or (subreg (reg)), we may be able to convert this to (set (subreg:m2 x) (op)). + Similarly, if we have (set x (op:m1 (subreg:m2 ...) (subreg:m2 ...))), + we may be able to first distribute the subreg over op, and then apply + the above transformation. + We can always do this if M1 is narrower than M2 because that means that we only care about the low bits of the result. @@ -6504,30 +6509,56 @@ be undefined. On machine where it is defined, this transformation is safe as long as M1 and M2 have the same number of words. */ + src_subreg = NULL_RTX; if (GET_CODE (src) == SUBREG && subreg_lowpart_p (src) - && !OBJECT_P (SUBREG_REG (src)) + && !OBJECT_P (SUBREG_REG (src))) + src_subreg = SUBREG_REG (src); + else if (GET_CODE (src) == IOR || GET_CODE (src) == XOR + || GET_CODE (src) == AND + || GET_CODE (src) == PLUS || GET_CODE (src) == MINUS) + { + rtx lhs = XEXP (x, 0); + rtx rhs = XEXP (x, 1); + + /* We can distribute non-paradoxical lowpart SUBREGs if the + inner modes agree. */ + if (GET_CODE (lhs) == SUBREG && GET_CODE (rhs) == SUBREG + && GET_MODE (SUBREG_REG (lhs)) == GET_MODE (SUBREG_REG (rhs)) + && subreg_lowpart_p (lhs) && !paradoxical_subreg_p (lhs) + && subreg_lowpart_p (rhs) && !paradoxical_subreg_p (rhs) + /* This is safe in general only for integral modes. */ + && INTEGRAL_MODE_P (GET_MODE (lhs)) + && INTEGRAL_MODE_P (GET_MODE (SUBREG_REG (lhs))) + /* Result might need to be truncated. Don't change mode if + explicit truncation is needed. */ + && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (src), + GET_MODE (SUBREG_REG (lhs)))) + src_subreg = simplify_gen_binary (GET_CODE (src), + GET_MODE (SUBREG_REG (lhs)), + SUBREG_REG (lhs), SUBREG_REG (rhs)); + } + + if (src_subreg && (((GET_MODE_SIZE (GET_MODE (src)) + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD) - == ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (src))) - + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)) + == ((GET_MODE_SIZE (GET_MODE (src_subreg))) + + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD) #ifndef WORD_REGISTER_OPERATIONS && (GET_MODE_SIZE (GET_MODE (src)) - < GET_MODE_SIZE (GET_MODE (SUBREG_REG (src)))) + < GET_MODE_SIZE (GET_MODE (src_subreg))) #endif #ifdef CANNOT_CHANGE_MODE_CLASS && ! (REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER && REG_CANNOT_CHANGE_MODE_P (REGNO (dest), - GET_MODE (SUBREG_REG (src)), + GET_MODE (src_subreg), GET_MODE (src))) #endif && (REG_P (dest) || (GET_CODE (dest) == SUBREG && REG_P (SUBREG_REG (dest))))) { - SUBST (SET_DEST (x), - gen_lowpart (GET_MODE (SUBREG_REG (src)), - dest)); - SUBST (SET_SRC (x), SUBREG_REG (src)); + SUBST (SET_DEST (x), gen_lowpart (GET_MODE (src_subreg), dest)); + SUBST (SET_SRC (x), src_subreg); src = SET_SRC (x), dest = SET_DEST (x); } -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE Ulrich.Weigand@de.ibm.com Richard Kenner wrote: > > Maybe the best solution would be to remove the SUBREG case from the generic > > apply_distributive_law subroutine, and instead add a special check for the > > distributed subreg case right at the above place in simplify_set; i.e. to > > perform the inverse distribution only if it is already guaranteed that we > > will also be able to move the subreg to the LHS ... > > That could indeed work. I tried to implement that suggestion, but interestingly enough I cannot really test it since I was unable to find any single case where that SUBREG case in apply_distributive_law actually causes any difference whatsoever in generated code. As test case I used the whole of libstdc++.so on the following set of platforms: - i686-pc-linux - s390x-ibm-linux - powerpc-ibm-linux - arm-linux-gnueabi and built the compiler and libstdc++.so for each of: - current mainline - current mainline plus the first patch below - current mainline plus both patches below All three resulting object files were identical for every platform. Do you have any further suggestion of how to find a testcase (some particular source code and/or architecture)? Given the current set of results, since I do not have any way to verify whether my simplify_set changes would actually trigger correctly, I'd rather propose to just remove the SUBREG case in apply_distributive_law (i.e. only apply the first patch below). Thoughts? Thanks, Ulrich Patch A: Remove SUBREG case in apply_distributive_law Index: gcc/combine.c =================================================================== --- gcc/combine.c (revision 183240) +++ gcc/combine.c (working copy) @@ -9238,37 +9269,6 @@ /* This is also a multiply, so it distributes over everything. */ break; - case SUBREG: - /* Non-paradoxical SUBREGs distributes over all operations, - provided the inner modes and byte offsets are the same, this - is an extraction of a low-order part, we don't convert an fp - operation to int or vice versa, this is not a vector mode, - and we would not be converting a single-word operation into a - multi-word operation. The latter test is not required, but - it prevents generating unneeded multi-word operations. Some - of the previous tests are redundant given the latter test, - but are retained because they are required for correctness. - - We produce the result slightly differently in this case. */ - - if (GET_MODE (SUBREG_REG (lhs)) != GET_MODE (SUBREG_REG (rhs)) - || SUBREG_BYTE (lhs) != SUBREG_BYTE (rhs) - || ! subreg_lowpart_p (lhs) - || (GET_MODE_CLASS (GET_MODE (lhs)) - != GET_MODE_CLASS (GET_MODE (SUBREG_REG (lhs)))) - || paradoxical_subreg_p (lhs) - || VECTOR_MODE_P (GET_MODE (lhs)) - || GET_MODE_SIZE (GET_MODE (SUBREG_REG (lhs))) > UNITS_PER_WORD - /* Result might need to be truncated. Don't change mode if - explicit truncation is needed. */ - || !TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (x), - GET_MODE (SUBREG_REG (lhs)))) - return x; - - tem = simplify_gen_binary (code, GET_MODE (SUBREG_REG (lhs)), - SUBREG_REG (lhs), SUBREG_REG (rhs)); - return gen_lowpart (GET_MODE (x), tem); - default: return x; } Patch B: Re-implement SUBREG case specifically in simplify_set Index: gcc/combine.c =================================================================== --- gcc/combine.c (revision 183240) +++ gcc/combine.c (working copy) @@ -6299,6 +6299,7 @@ rtx dest = SET_DEST (x); enum machine_mode mode = GET_MODE (src) != VOIDmode ? GET_MODE (src) : GET_MODE (dest); + rtx src_subreg; rtx other_insn; rtx *cc_use; @@ -6496,6 +6497,10 @@ and X being a REG or (subreg (reg)), we may be able to convert this to (set (subreg:m2 x) (op)). + Similarly, if we have (set x (op:m1 (subreg:m2 ...) (subreg:m2 ...))), + we may be able to first distribute the subreg over op, and then apply + the above transformation. + We can always do this if M1 is narrower than M2 because that means that we only care about the low bits of the result. @@ -6504,30 +6509,56 @@ be undefined. On machine where it is defined, this transformation is safe as long as M1 and M2 have the same number of words. */ + src_subreg = NULL_RTX; if (GET_CODE (src) == SUBREG && subreg_lowpart_p (src) - && !OBJECT_P (SUBREG_REG (src)) + && !OBJECT_P (SUBREG_REG (src))) + src_subreg = SUBREG_REG (src); + else if (GET_CODE (src) == IOR || GET_CODE (src) == XOR + || GET_CODE (src) == AND + || GET_CODE (src) == PLUS || GET_CODE (src) == MINUS) + { + rtx lhs = XEXP (x, 0); + rtx rhs = XEXP (x, 1); + + /* We can distribute non-paradoxical lowpart SUBREGs if the + inner modes agree. */ + if (GET_CODE (lhs) == SUBREG && GET_CODE (rhs) == SUBREG + && GET_MODE (SUBREG_REG (lhs)) == GET_MODE (SUBREG_REG (rhs)) + && subreg_lowpart_p (lhs) && !paradoxical_subreg_p (lhs) + && subreg_lowpart_p (rhs) && !paradoxical_subreg_p (rhs) + /* This is safe in general only for integral modes. */ + && INTEGRAL_MODE_P (GET_MODE (lhs)) + && INTEGRAL_MODE_P (GET_MODE (SUBREG_REG (lhs))) + /* Result might need to be truncated. Don't change mode if + explicit truncation is needed. */ + && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (src), + GET_MODE (SUBREG_REG (lhs)))) + src_subreg = simplify_gen_binary (GET_CODE (src), + GET_MODE (SUBREG_REG (lhs)), + SUBREG_REG (lhs), SUBREG_REG (rhs)); + } + + if (src_subreg && (((GET_MODE_SIZE (GET_MODE (src)) + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD) - == ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (src))) - + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)) + == ((GET_MODE_SIZE (GET_MODE (src_subreg))) + + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD) #ifndef WORD_REGISTER_OPERATIONS && (GET_MODE_SIZE (GET_MODE (src)) - < GET_MODE_SIZE (GET_MODE (SUBREG_REG (src)))) + < GET_MODE_SIZE (GET_MODE (src_subreg))) #endif #ifdef CANNOT_CHANGE_MODE_CLASS && ! (REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER && REG_CANNOT_CHANGE_MODE_P (REGNO (dest), - GET_MODE (SUBREG_REG (src)), + GET_MODE (src_subreg), GET_MODE (src))) #endif && (REG_P (dest) || (GET_CODE (dest) == SUBREG && REG_P (SUBREG_REG (dest))))) { - SUBST (SET_DEST (x), - gen_lowpart (GET_MODE (SUBREG_REG (src)), - dest)); - SUBST (SET_SRC (x), SUBREG_REG (src)); + SUBST (SET_DEST (x), gen_lowpart (GET_MODE (src_subreg), dest)); + SUBST (SET_SRC (x), src_subreg); src = SET_SRC (x), dest = SET_DEST (x); } -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE Ulrich.Weigand@de.ibm.com Richard Kenner wrote: > > Maybe the best solution would be to remove the SUBREG case from the generic > > apply_distributive_law subroutine, and instead add a special check for the > > distributed subreg case right at the above place in simplify_set; i.e. to > > perform the inverse distribution only if it is already guaranteed that we > > will also be able to move the subreg to the LHS ... > > That could indeed work. I tried to implement that suggestion, but interestingly enough I cannot really test it since I was unable to find any single case where that SUBREG case in apply_distributive_law actually causes any difference whatsoever in generated code. As test case I used the whole of libstdc++.so on the following set of platforms: - i686-pc-linux - s390x-ibm-linux - powerpc-ibm-linux - arm-linux-gnueabi and built the compiler and libstdc++.so for each of: - current mainline - current mainline plus the first patch below - current mainline plus both patches below All three resulting object files were identical for every platform. Do you have any further suggestion of how to find a testcase (some particular source code and/or architecture)? Given the current set of results, since I do not have any way to verify whether my simplify_set changes would actually trigger correctly, I'd rather propose to just remove the SUBREG case in apply_distributive_law (i.e. only apply the first patch below). Thoughts? Thanks, Ulrich Patch A: Remove SUBREG case in apply_distributive_law Index: gcc/combine.c =================================================================== --- gcc/combine.c (revision 183240) +++ gcc/combine.c (working copy) @@ -9238,37 +9269,6 @@ /* This is also a multiply, so it distributes over everything. */ break; - case SUBREG: - /* Non-paradoxical SUBREGs distributes over all operations, - provided the inner modes and byte offsets are the same, this - is an extraction of a low-order part, we don't convert an fp - operation to int or vice versa, this is not a vector mode, - and we would not be converting a single-word operation into a - multi-word operation. The latter test is not required, but - it prevents generating unneeded multi-word operations. Some - of the previous tests are redundant given the latter test, - but are retained because they are required for correctness. - - We produce the result slightly differently in this case. */ - - if (GET_MODE (SUBREG_REG (lhs)) != GET_MODE (SUBREG_REG (rhs)) - || SUBREG_BYTE (lhs) != SUBREG_BYTE (rhs) - || ! subreg_lowpart_p (lhs) - || (GET_MODE_CLASS (GET_MODE (lhs)) - != GET_MODE_CLASS (GET_MODE (SUBREG_REG (lhs)))) - || paradoxical_subreg_p (lhs) - || VECTOR_MODE_P (GET_MODE (lhs)) - || GET_MODE_SIZE (GET_MODE (SUBREG_REG (lhs))) > UNITS_PER_WORD - /* Result might need to be truncated. Don't change mode if - explicit truncation is needed. */ - || !TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (x), - GET_MODE (SUBREG_REG (lhs)))) - return x; - - tem = simplify_gen_binary (code, GET_MODE (SUBREG_REG (lhs)), - SUBREG_REG (lhs), SUBREG_REG (rhs)); - return gen_lowpart (GET_MODE (x), tem); - default: return x; } Patch B: Re-implement SUBREG case specifically in simplify_set Index: gcc/combine.c =================================================================== --- gcc/combine.c (revision 183240) +++ gcc/combine.c (working copy) @@ -6299,6 +6299,7 @@ rtx dest = SET_DEST (x); enum machine_mode mode = GET_MODE (src) != VOIDmode ? GET_MODE (src) : GET_MODE (dest); + rtx src_subreg; rtx other_insn; rtx *cc_use; @@ -6496,6 +6497,10 @@ and X being a REG or (subreg (reg)), we may be able to convert this to (set (subreg:m2 x) (op)). + Similarly, if we have (set x (op:m1 (subreg:m2 ...) (subreg:m2 ...))), + we may be able to first distribute the subreg over op, and then apply + the above transformation. + We can always do this if M1 is narrower than M2 because that means that we only care about the low bits of the result. @@ -6504,30 +6509,56 @@ be undefined. On machine where it is defined, this transformation is safe as long as M1 and M2 have the same number of words. */ + src_subreg = NULL_RTX; if (GET_CODE (src) == SUBREG && subreg_lowpart_p (src) - && !OBJECT_P (SUBREG_REG (src)) + && !OBJECT_P (SUBREG_REG (src))) + src_subreg = SUBREG_REG (src); + else if (GET_CODE (src) == IOR || GET_CODE (src) == XOR + || GET_CODE (src) == AND + || GET_CODE (src) == PLUS || GET_CODE (src) == MINUS) + { + rtx lhs = XEXP (x, 0); + rtx rhs = XEXP (x, 1); + + /* We can distribute non-paradoxical lowpart SUBREGs if the + inner modes agree. */ + if (GET_CODE (lhs) == SUBREG && GET_CODE (rhs) == SUBREG + && GET_MODE (SUBREG_REG (lhs)) == GET_MODE (SUBREG_REG (rhs)) + && subreg_lowpart_p (lhs) && !paradoxical_subreg_p (lhs) + && subreg_lowpart_p (rhs) && !paradoxical_subreg_p (rhs) + /* This is safe in general only for integral modes. */ + && INTEGRAL_MODE_P (GET_MODE (lhs)) + && INTEGRAL_MODE_P (GET_MODE (SUBREG_REG (lhs))) + /* Result might need to be truncated. Don't change mode if + explicit truncation is needed. */ + && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (src), + GET_MODE (SUBREG_REG (lhs)))) + src_subreg = simplify_gen_binary (GET_CODE (src), + GET_MODE (SUBREG_REG (lhs)), + SUBREG_REG (lhs), SUBREG_REG (rhs)); + } + + if (src_subreg && (((GET_MODE_SIZE (GET_MODE (src)) + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD) - == ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (src))) - + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)) + == ((GET_MODE_SIZE (GET_MODE (src_subreg))) + + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD) #ifndef WORD_REGISTER_OPERATIONS && (GET_MODE_SIZE (GET_MODE (src)) - < GET_MODE_SIZE (GET_MODE (SUBREG_REG (src)))) + < GET_MODE_SIZE (GET_MODE (src_subreg))) #endif #ifdef CANNOT_CHANGE_MODE_CLASS && ! (REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER && REG_CANNOT_CHANGE_MODE_P (REGNO (dest), - GET_MODE (SUBREG_REG (src)), + GET_MODE (src_subreg), GET_MODE (src))) #endif && (REG_P (dest) || (GET_CODE (dest) == SUBREG && REG_P (SUBREG_REG (dest))))) { - SUBST (SET_DEST (x), - gen_lowpart (GET_MODE (SUBREG_REG (src)), - dest)); - SUBST (SET_SRC (x), SUBREG_REG (src)); + SUBST (SET_DEST (x), gen_lowpart (GET_MODE (src_subreg), dest)); + SUBST (SET_SRC (x), src_subreg); src = SET_SRC (x), dest = SET_DEST (x); }