From patchwork Tue Jan 17 19:24:57 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ulrich Weigand <uweigand@de.ibm.com>
X-Patchwork-Id: 136518
Return-Path: 
 <gcc-patches-return-311558-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	by ozlabs.org (Postfix) with SMTP id 99D99B6EE8
	for <incoming@patchwork.ozlabs.org>;
	Wed, 18 Jan 2012 06:25:36 +1100 (EST)
Comment: DKIM? See http://www.dkim.org
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed;
	d=gcc.gnu.org; s=default; x=1327433138; h=Comment:
	DomainKey-Signature:Received:Received:Received:Received:Received:
	Received:Received:Received:Received:Received:Subject:To:Date:
	From:Cc:In-Reply-To:MIME-Version:Message-ID:
	Content-Transfer-Encoding:Content-Type:Mailing-List:Precedence:
	List-Id:List-Unsubscribe:List-Archive:List-Post:List-Help:Sender:
	Delivered-To; bh=jhIF3IsRQNVgop868qQEsSwwHao=; b=WUJ9sy+XEZCK+Se
	tXz9+wOZ+EFccCMbZk0e2qYJTyyaKJoc7h0gTwAuJS7/5TlZ7+vDHdmJ67R3dRrA
	54J6EmJmw2HV+06u1zEjVJq6kfpMjzUATFDFw7udYuE6iAtRAPAUToKo5Gq7WQwd
	2JCaWkH1uO5v5u/VyaADfg8DCSWA=
Comment: DomainKeys? See http://antispam.yahoo.com/domainkeys
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=default; d=gcc.gnu.org;
	h=Received:Received:X-SWARE-Spam-Status:X-Spam-Check-By:Received:Received:Received:Received:Received:Received:Received:Received:Subject:To:Date:From:Cc:In-Reply-To:MIME-Version:Message-ID:Content-Transfer-Encoding:Content-Type:x-cbid:Mailing-List:Precedence:List-Id:List-Unsubscribe:List-Archive:List-Post:List-Help:Sender:Delivered-To;
	b=dSzF5KX4jwGgmHSvyE0dbhN/64JEkUQ0yd3MdfxpbImTtrVwOxaeAM7nABTvi9
	n4e+m6yXKAXxApdkgXUF8Jfp12Gx/tLDRIQZieRVu88tHsP2QyXd1KyAGCh0/sJP
	TIhS6Av0lPJ8jKl/XZ231B++cUgQobMPJ9+DKjZjDdxos=;
Received: (qmail 15479 invoked by alias); 17 Jan 2012 19:25:28 -0000
Received: (qmail 15446 invoked by uid 22791); 17 Jan 2012 19:25:23 -0000
X-SWARE-Spam-Status: No, hits=-1.7 required=5.0	tests=AWL, BAYES_00, TW_FW,
	T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Received: from e06smtp12.uk.ibm.com (HELO e06smtp12.uk.ibm.com)
	(195.75.94.108) by sourceware.org (qpsmtpd/0.43rc1) with
	ESMTP; Tue, 17 Jan 2012 19:25:06 +0000
Received: from /spool/local	by e06smtp12.uk.ibm.com with IBM ESMTP SMTP
	Gateway: Authorized Use Only! Violators will be
	prosecuted	for <gcc-patches@gcc.gnu.org> from
	<uweigand@de.ibm.com>; Tue, 17 Jan 2012 19:25:04 -0000
Received: from d06nrmr1307.portsmouth.uk.ibm.com ([9.149.38.129])	by
	e06smtp12.uk.ibm.com ([192.168.101.142]) with IBM ESMTP SMTP
	Gateway: Authorized Use Only! Violators will be prosecuted;
	Tue, 17 Jan 2012 19:25:01 -0000
Received: from d06av01.portsmouth.uk.ibm.com (d06av01.portsmouth.uk.ibm.com
	[9.149.37.212])	by d06nrmr1307.portsmouth.uk.ibm.com
	(8.13.8/8.13.8/NCO v10.0) with ESMTP id q0HJP0mG2576452	for
	<gcc-patches@gcc.gnu.org>; Tue, 17 Jan 2012 19:25:00 GMT
Received: from d06av01.portsmouth.uk.ibm.com (loopback [127.0.0.1])	by
	d06av01.portsmouth.uk.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout)
	with ESMTP id q0HJP0cc025635	for <gcc-patches@gcc.gnu.org>;
	Tue, 17 Jan 2012 12:25:00 -0700
Received: from d06ml032.portsmouth.uk.ibm.com
	(d06ml032.portsmouth.uk.ibm.com [9.149.76.137])	by
	d06av01.portsmouth.uk.ibm.com (8.14.4/8.13.1/NCO v10.0 AVin)
	with ESMTP id q0HJP0h6025625; Tue, 17 Jan 2012 12:25:00 -0700
Received: from tuxmaker.boeblingen.de.ibm.com ([9.152.85.9]) by
	d06ml032.portsmouth.uk.ibm.com (Lotus Domino Release
	8.5.2FP3) with SMTP id 2012011720245245-40590 ;
	Tue, 17 Jan 2012 20:24:52 +0100
Received: by tuxmaker.boeblingen.de.ibm.com (sSMTP sendmail emulation);
	Tue, 17 Jan 2012 20:24:57 +0100
Subject: Re: RFC: allowing fwprop to propagate subregs
To: kenner@vlsi1.ultra.nyu.edu (Richard Kenner)
Date: Tue, 17 Jan 2012 20:24:57 +0100 (CET)
From: Ulrich Weigand <uweigand@de.ibm.com>
Cc: bonzini@gnu.org, gcc-patches@gcc.gnu.org, richard.sandiford@linaro.org
In-Reply-To: <11201161434.AA18637@vlsi1.ultra.nyu.edu> from "Richard Kenner"
	at Jan 16, 2012 09:34:32 AM
MIME-Version: 1.0
Message-ID: <OF5CC97609.F2965202-ONC1257988.006AA629@de.ibm.com>
x-cbid: 12011719-8372-0000-0000-0000016CFD4E
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org

Richard Kenner wrote:
> > Maybe the best solution would be to remove the SUBREG case from the generic
> > apply_distributive_law subroutine, and instead add a special check for the
> > distributed subreg case right at the above place in simplify_set; i.e. to
> > perform the inverse distribution only if it is already guaranteed that we
> > will also be able to move the subreg to the LHS ...
> 
> That could indeed work.

I tried to implement that suggestion, but interestingly enough I cannot
really test it, since I was unable to find a single case where the
SUBREG case in apply_distributive_law makes any difference whatsoever
in the generated code.
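
For illustration, the arithmetic fact behind that SUBREG case can be
sketched in plain C: for integers, taking the low-order part commutes
with IOR, XOR, AND, PLUS and MINUS.  This is only a sketch and not GCC
code; lowpart8, lowpart_distributes and check_lowpart_samples are
hypothetical names, and an 8-bit low part of a 32-bit value stands in
for a non-paradoxical lowpart SUBREG.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not GCC code: take the low-order 8 bits of a
   32-bit value, standing in for a non-paradoxical lowpart SUBREG of
   an integer mode.  */
static uint8_t lowpart8 (uint32_t v)
{
  return (uint8_t) (v & 0xffu);
}

/* Return 1 if narrowing distributes over IOR, XOR, AND, PLUS and MINUS
   for this pair of inputs, 0 otherwise.  */
static int lowpart_distributes (uint32_t a, uint32_t b)
{
  uint8_t la = lowpart8 (a);
  uint8_t lb = lowpart8 (b);

  return lowpart8 (a | b) == (uint8_t) (la | lb)
	 && lowpart8 (a ^ b) == (uint8_t) (la ^ lb)
	 && lowpart8 (a & b) == (uint8_t) (la & lb)
	 && lowpart8 (a + b) == (uint8_t) (la + lb)
	 && lowpart8 (a - b) == (uint8_t) (la - lb);
}

/* Check the identity on a few hand-picked values, including ones that
   carry or borrow across the 8-bit boundary.  Returns 1 on success.  */
static int check_lowpart_samples (void)
{
  static const uint32_t samples[] =
    { 0u, 1u, 0xffu, 0x100u, 0x12345678u, 0xffffffffu };
  size_t n = sizeof samples / sizeof samples[0];
  int ok = 1;

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      ok &= lowpart_distributes (samples[i], samples[j]);

  return ok;
}
```

The identity holds because carries and borrows only propagate upward,
so the low bits of the wide result depend only on the low bits of the
operands.  That does not carry over to floating-point modes, which is
why both versions of the code test the mode class.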

As a test case, I used the whole of libstdc++.so on the following
platforms:
  - i686-pc-linux
  - s390x-ibm-linux
  - powerpc-ibm-linux
  - arm-linux-gnueabi
and built the compiler and libstdc++.so for each of:
  - current mainline
  - current mainline plus the first patch below
  - current mainline plus both patches below

All three resulting object files were identical for every platform.

Do you have any further suggestions on how to find a testcase (some
particular source code and/or architecture)?

Given the current set of results, since I do not have any way to verify
whether my simplify_set changes would actually trigger correctly, I'd
rather propose to just remove the SUBREG case in apply_distributive_law
(i.e. only apply the first patch below).
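
For reference, the guard in simplify_set that decides whether the
subreg may move to the LHS compares how many words each mode occupies,
using a round-up division by UNITS_PER_WORD.  A minimal sketch of that
check follows; it is not GCC code, words_for_size and same_word_count
are hypothetical names, and the 4-byte word size is an assumption made
purely for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for the target macro UNITS_PER_WORD; 4 bytes
   is an assumption for illustration only.  */
#define SKETCH_UNITS_PER_WORD 4

/* Number of words needed to hold SIZE bytes, rounding up.  */
static size_t words_for_size (size_t size)
{
  return (size + (SKETCH_UNITS_PER_WORD - 1)) / SKETCH_UNITS_PER_WORD;
}

/* Nonzero if modes of OUTER_SIZE and INNER_SIZE bytes occupy the same
   number of words.  */
static int same_word_count (size_t outer_size, size_t inner_size)
{
  return words_for_size (outer_size) == words_for_size (inner_size);
}
```

Under that assumption, a 1-byte mode and a 4-byte mode both fit in one
word, so same_word_count (1, 4) is nonzero, whereas a 4-byte mode and
an 8-byte mode differ (one word versus two) and the check fails.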

Thoughts?

Thanks,
Ulrich

Patch A: Remove SUBREG case in apply_distributive_law

Index: gcc/combine.c
===================================================================
--- gcc/combine.c	(revision 183240)
+++ gcc/combine.c	(working copy)
@@ -9238,37 +9269,6 @@
       /* This is also a multiply, so it distributes over everything.  */
       break;
 
-    case SUBREG:
-      /* Non-paradoxical SUBREGs distributes over all operations,
-	 provided the inner modes and byte offsets are the same, this
-	 is an extraction of a low-order part, we don't convert an fp
-	 operation to int or vice versa, this is not a vector mode,
-	 and we would not be converting a single-word operation into a
-	 multi-word operation.  The latter test is not required, but
-	 it prevents generating unneeded multi-word operations.  Some
-	 of the previous tests are redundant given the latter test,
-	 but are retained because they are required for correctness.
-
-	 We produce the result slightly differently in this case.  */
-
-      if (GET_MODE (SUBREG_REG (lhs)) != GET_MODE (SUBREG_REG (rhs))
-	  || SUBREG_BYTE (lhs) != SUBREG_BYTE (rhs)
-	  || ! subreg_lowpart_p (lhs)
-	  || (GET_MODE_CLASS (GET_MODE (lhs))
-	      != GET_MODE_CLASS (GET_MODE (SUBREG_REG (lhs))))
-	  || paradoxical_subreg_p (lhs)
-	  || VECTOR_MODE_P (GET_MODE (lhs))
-	  || GET_MODE_SIZE (GET_MODE (SUBREG_REG (lhs))) > UNITS_PER_WORD
-	  /* Result might need to be truncated.  Don't change mode if
-	     explicit truncation is needed.  */
-	  || !TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (x),
-					     GET_MODE (SUBREG_REG (lhs))))
-	return x;
-
-      tem = simplify_gen_binary (code, GET_MODE (SUBREG_REG (lhs)),
-				 SUBREG_REG (lhs), SUBREG_REG (rhs));
-      return gen_lowpart (GET_MODE (x), tem);
-
     default:
       return x;
     }


Patch B: Re-implement SUBREG case specifically in simplify_set


Index: gcc/combine.c
===================================================================
--- gcc/combine.c	(revision 183240)
+++ gcc/combine.c	(working copy)
@@ -6299,6 +6299,7 @@
   rtx dest = SET_DEST (x);
   enum machine_mode mode
     = GET_MODE (src) != VOIDmode ? GET_MODE (src) : GET_MODE (dest);
+  rtx src_subreg;
   rtx other_insn;
   rtx *cc_use;
 
@@ -6496,6 +6497,10 @@
      and X being a REG or (subreg (reg)), we may be able to convert this to
      (set (subreg:m2 x) (op)).
 
+     Similarly, if we have (set x (op:m1 (subreg:m2 ...) (subreg:m2 ...))),
+     we may be able to first distribute the subreg over op, and then apply
+     the above transformation.
+
      We can always do this if M1 is narrower than M2 because that means that
      we only care about the low bits of the result.
 
@@ -6504,30 +6509,56 @@
      be undefined.  On machine where it is defined, this transformation is safe
      as long as M1 and M2 have the same number of words.  */
 
+  src_subreg = NULL_RTX;
   if (GET_CODE (src) == SUBREG && subreg_lowpart_p (src)
-      && !OBJECT_P (SUBREG_REG (src))
+      && !OBJECT_P (SUBREG_REG (src)))
+    src_subreg = SUBREG_REG (src);
+  else if (GET_CODE (src) == IOR || GET_CODE (src) == XOR
+	   || GET_CODE (src) == AND
+	   || GET_CODE (src) == PLUS || GET_CODE (src) == MINUS)
+    {
+      rtx lhs = XEXP (src, 0);
+      rtx rhs = XEXP (src, 1);
+
+      /* We can distribute non-paradoxical lowpart SUBREGs if the
+	 inner modes agree.  */
+      if (GET_CODE (lhs) == SUBREG && GET_CODE (rhs) == SUBREG
+	  && GET_MODE (SUBREG_REG (lhs)) == GET_MODE (SUBREG_REG (rhs))
+	  && subreg_lowpart_p (lhs) && !paradoxical_subreg_p (lhs)
+	  && subreg_lowpart_p (rhs) && !paradoxical_subreg_p (rhs)
+	  /* This is safe in general only for integral modes.  */
+	  && INTEGRAL_MODE_P (GET_MODE (lhs))
+	  && INTEGRAL_MODE_P (GET_MODE (SUBREG_REG (lhs)))
+	  /* Result might need to be truncated.  Don't change mode if
+	     explicit truncation is needed.  */
+	  && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (src),
+					    GET_MODE (SUBREG_REG (lhs))))
+	src_subreg = simplify_gen_binary (GET_CODE (src),
+					  GET_MODE (SUBREG_REG (lhs)),
+					  SUBREG_REG (lhs), SUBREG_REG (rhs));
+    }
+
+  if (src_subreg
       && (((GET_MODE_SIZE (GET_MODE (src)) + (UNITS_PER_WORD - 1))
 	   / UNITS_PER_WORD)
-	  == ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (src)))
-	       + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD))
+	  == ((GET_MODE_SIZE (GET_MODE (src_subreg)))
+	       + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)
 #ifndef WORD_REGISTER_OPERATIONS
       && (GET_MODE_SIZE (GET_MODE (src))
-	< GET_MODE_SIZE (GET_MODE (SUBREG_REG (src))))
+	  < GET_MODE_SIZE (GET_MODE (src_subreg)))
 #endif
 #ifdef CANNOT_CHANGE_MODE_CLASS
       && ! (REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER
 	    && REG_CANNOT_CHANGE_MODE_P (REGNO (dest),
-					 GET_MODE (SUBREG_REG (src)),
+					 GET_MODE (src_subreg),
 					 GET_MODE (src)))
 #endif
       && (REG_P (dest)
 	  || (GET_CODE (dest) == SUBREG
 	      && REG_P (SUBREG_REG (dest)))))
     {
-      SUBST (SET_DEST (x),
-	     gen_lowpart (GET_MODE (SUBREG_REG (src)),
-				      dest));
-      SUBST (SET_SRC (x), SUBREG_REG (src));
+      SUBST (SET_DEST (x), gen_lowpart (GET_MODE (src_subreg), dest));
+      SUBST (SET_SRC (x), src_subreg);
 
       src = SET_SRC (x), dest = SET_DEST (x);
     }


-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com

