From patchwork Tue Aug 13 20:57:48 2013
X-Patchwork-Submitter: Kenneth Zadeck <zadeck@naturalbridge.com>
X-Patchwork-Id: 266919
Message-ID: <520A9DCC.6080609@naturalbridge.com>
Date: Tue, 13 Aug 2013 16:57:48 -0400
From: Kenneth Zadeck <zadeck@naturalbridge.com>
To: rguenther@suse.de, gcc-patches <gcc-patches@gcc.gnu.org>,
	Mike Stump <mikestump@comcast.net>, r.sandiford@uk.ibm.com
Subject: wide-int branch now up for public comment and review

Richi and everyone else who may be interested,

Congrats on your first child.  They are a lot of fun, but are very
high maintenance.

Today we put up the wide-int branch for all to see and play with. See

svn+ssh://gcc.gnu.org/svn/gcc/branches/wide-int

At this point, we have completed testing it on x86-64.  Not only is it
regression free, but for code that uses only 64-bit or smaller data
types, it produces identical machine code (provided a couple of changes
are made to the trunk; see the patches below).  We are currently
working on the PPC and expect to get this platform to the same
position very soon.

From a high-level view, the branch looks somewhat closer to what you
asked for than I would have expected.  There are now three template
instances of wide-int.  The default is the one you saw before; it
takes its precision from the input mode or type.  The other two
template instances have fixed precisions that are large enough to be
treated as infinite (just like your favorite use of double-int).  Both
are used in places where the operands carry no notion of a correct
precision.  One is used for all addressing arithmetic and the other is
used mostly in the vectorizer and loop optimizations.  The bottom line
is that both finite and infinite precision models are really necessary
in the current structure of GCC.
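To make the distinction concrete, here is a minimal sketch with hypothetical helpers (`finite_add`, `infinite_add`), not wide-int functions: the finite model wraps at the operand precision exactly as a machine integer of that width would, while the "infinite" model computes in a precision assumed to be wide enough that nothing wraps. Here `int64_t` stands in for that wide precision, which only holds because the operands are small.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not wide-int functions.  */

/* Finite precision: reduce the sum modulo 2^PREC and sign-extend from
   bit PREC-1, so it wraps exactly as a PREC-bit signed integer would.  */
static int64_t
finite_add (int64_t a, int64_t b, unsigned prec)
{
  assert (prec >= 1 && prec <= 64);
  /* Unsigned arithmetic so that wrapping is well defined.  */
  uint64_t sum = (uint64_t) a + (uint64_t) b;
  if (prec < 64)
    {
      uint64_t mask = (UINT64_C (1) << prec) - 1;
      uint64_t sign = UINT64_C (1) << (prec - 1);
      sum &= mask;
      sum = (sum ^ sign) - sign;   /* sign-extend from bit PREC-1 */
    }
  return (int64_t) sum;
}

/* "Infinite" precision: int64_t stands in for a precision assumed large
   enough that, for the small operands used here, the sum never wraps.  */
static int64_t
infinite_add (int64_t a, int64_t b)
{
  return a + b;
}
```

With 8-bit operands, `finite_add (100, 100, 8)` yields -56 while `infinite_add (100, 100)` yields 200; code written for the second model gives wrong answers if silently handed the first, and vice versa.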

The two infinite precision classes are not exactly the storage classes
that you proposed, because they use the same storage model as the
default template, but they do provide a different view of the math,
which I assume was your primary concern.  You may also decide that
there is no reason to have a separate class for the addressing
arithmetic, since the two work in substantially the same way.  We kept
them separate so that we have the option to let the two
representations diverge in the future.
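A minimal sketch of "same storage, different view", assuming a hypothetical layout (`struct wide_storage`, `from_shwi` and `get_block` are illustrative names, not the branch's declarations): a block array, a count of explicitly stored blocks, and a precision. It further assumes that blocks above the stored count are implicit sign extensions, so a small value occupies one block whatever its precision. The finite and "infinite" flavors would differ only in the precision they pass in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: an array of 64-bit blocks, the number of blocks
   explicitly stored, and a precision in bits.  Blocks at or above LEN
   are assumed to be the sign extension of the topmost stored block.  */
#define MAX_BLOCKS 8

struct wide_storage
{
  int64_t val[MAX_BLOCKS];
  unsigned len;        /* number of explicitly stored blocks */
  unsigned precision;  /* precision of the value, in bits */
};

/* Build a value from a signed 64-bit integer.  A finite view would pass
   the precision of a mode or type; an "infinite" view would always pass
   one large constant.  The storage itself is identical.  */
static struct wide_storage
from_shwi (int64_t v, unsigned precision)
{
  struct wide_storage w = { { v }, 1, precision };
  return w;
}

/* Read block I, materializing implicit sign-extension blocks.  */
static int64_t
get_block (const struct wide_storage *x, unsigned i)
{
  assert (x->len >= 1 && x->len <= MAX_BLOCKS);
  if (i < x->len)
    return x->val[i];
  return x->val[x->len - 1] < 0 ? -1 : 0;
}
```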

The one place where I can see changing which template is used is
tree-ssa-ccp.  This is the only one of GCC's many constant propagators
that does not use the default template.  I did not convert this pass
to the default template because, for testing purposes (at your
suggestion), we tried to minimize the improvements so that wide-int
produces the same code as the trunk.  When I convert it to the default
template, the pass will run slightly faster and will find slightly
more constants: both very desirable features, but not in the context
of getting this large patch into GCC.

As I said earlier, we get the same code as long as the program uses
only 64-bit or smaller types.  For code that uses larger types, we do
not.  The problem actually stems from one of the assertions that you
made when we were arguing about fixed vs. infinite precision.  You had
said that a lot of the code depended on double-ints behaving like
infinite precision.  You were right!  However, what this really meant
is that when that code was given a 128-bit type, it simply produced
bogus results!  All of this has now been fixed on the branch.  The
code that uses the default template works within its precision.  The
code that uses one of the infinite precision templates is guaranteed
to have enough headroom, because we find the largest mode on the
target and multiply its size by 4.  The net result is that programs
that use 128-bit types get better code that is more likely to be
correct.
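The headroom rule above is simple arithmetic. A minimal sketch with hypothetical helper names, assuming a 64-bit HWI (`HWI_BITS` is an illustrative name, not a GCC macro):

```c
#include <assert.h>

/* Assumption: a 64-bit HOST_WIDE_INT.  HWI_BITS is a hypothetical name.  */
#define HWI_BITS 64

/* Hypothetical helper: the fixed precision for the "infinite" flavors,
   taken as 4 times the bit size of the widest mode on the target.  */
static unsigned
infinite_precision (unsigned widest_mode_bits)
{
  assert (widest_mode_bits > 0);
  return 4 * widest_mode_bits;
}

/* Number of HWI-sized blocks needed to hold PRECISION bits (rounded up).  */
static unsigned
blocks_for_precision (unsigned precision)
{
  return (precision + HWI_BITS - 1) / HWI_BITS;
}
```

For a hypothetical target whose widest mode is 128 bits, this gives a 512-bit precision held in 8 blocks. Since the full product of two N-bit values needs at most 2N bits, a factor of 4 leaves room beyond a single widening multiply in the widest mode.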

The vast majority of the patch falls into two types of code:

1) The 4 files that hold the wide-int code itself.  You have seen a
    lot of this code before, except for the infinite precision
    templates.  The classes are also more C++ than C in flavor.  In
    particular, the integration with trees is very tight: an int-cst
    or a regular integer can be the operand of any wide-int
    operation.

2) The code that encapsulates the representation of a TREE_INT_CST.
    I introduced a series of abstractions to hide the access, so that I
    could change the representation of TREE_INT_CST away from having
    exactly two HWIs.  I do not really like these abstractions, but the
    good news is that most of them can/will go away after this branch
    is integrated into the trunk.  They allow the code to perform the
    same function without exposing the change in the data structures.
    However, they also preserve the fact that, for the most part, the
    middle end of the compiler avoids optimizing anything larger than
    a single HWI.  That keeps the basic behavior of the compiler
    unchanged, which is what you asked us to do.

    The abstractions that I have put in to hide the rep of TREE_INT_CST
    are:

    host_integerp (x, 1) -> tree_fits_uhwi_p (x)
    host_integerp (x, 0) -> tree_fits_shwi_p (x)
    host_integerp (x, TYPE_UNSIGNED (y)) -> tree_fits_hwi_p (x, TYPE_SIGN (y))
    host_integerp (x, TYPE_UNSIGNED (x)) -> tree_fits_hwi_p (x)

    TREE_INT_CST_HIGH (x) == 0 || TREE_INT_CST_HIGH (x) == -1 -> cst_fits_shwi_p (x)
    TREE_INT_CST_HIGH (x) + (tree_int_cst_sgn (x) < 0) -> cst_fits_shwi_p (x)
    cst_and_fits_in_hwi (x) -> cst_fits_shwi_p (x)

    TREE_INT_CST_HIGH (x) == 0 -> cst_fits_uhwi_p (x)

    tree_low_cst (x, 1) -> tree_to_uhwi (x)
    tree_low_cst (x, 0) -> tree_to_shwi (x)
    TREE_INT_CST_LOW (x) -> either tree_to_uhwi (x), tree_to_shwi (x) or tree_to_hwi (x)

    Code that used TREE_INT_CST_HIGH in ways beyond checking whether
    it contained 0 or -1 was converted directly to wide-int.
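For readers comparing old and new names, this is a rough sketch of what the "fits" predicates test under the old two-HWI representation. `struct two_hwi_cst`, `fits_uhwi` and `fits_shwi` are hypothetical stand-ins that assume a 64-bit HWI. The real predicates also check the tree code and overflow flag, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the old TREE_INT_CST layout: exactly two
   HWIs (assumed 64-bit), a LOW word and a HIGH word.  */
struct two_hwi_cst
{
  uint64_t low;
  int64_t high;
};

/* Roughly the unsigned test: the value fits in one unsigned HWI
   exactly when the HIGH word is zero.  */
static bool
fits_uhwi (struct two_hwi_cst x)
{
  return x.high == 0;
}

/* Roughly the signed test: the HIGH word must be nothing more than the
   sign extension of the LOW word.  */
static bool
fits_shwi (struct two_hwi_cst x)
{
  bool low_negative = (x.low >> 63) != 0;
  return (x.high == 0 && !low_negative) || (x.high == -1 && low_negative);
}
```

For example, 2^64 - 1 (low all ones, high 0) fits an unsigned HWI but not a signed one, while -1 (low all ones, high -1) fits only a signed one.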


You had proposed that one of the ways that we should/could test the
multi-HWI paths in wide-int was to change the size of the array
elements used to represent values in wide-int.  I believe that there
are better ways to do this testing.  For one, the infinite precision
templates do not use the fast path anyway, because currently the fast
paths are only triggered for precisions that fit in a single HWI.
(Some of the infinite precision functions could possibly use the fast
path, but they currently do not.)  Instead, what we plan to do once the
PPC port is stable is to build a 64-bit compiler for x86 that uses a
32-bit HWI.  This is no longer a supported configuration, but fixing
the bugs it exposes would shake out the remaining places where the
compiler (as well as the wide-int code) gets the wrong answer for
larger types.
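The testing argument rests on where the fast path applies. Here is a minimal sketch, assuming a hypothetical block-array addition (`add_blocks`, `hwi32` and `NBLOCKS` are illustrative names) that uses 32-bit blocks to stand in for a 32-bit HWI. It shows why such a host sends every 64-bit value through the multi-block carry loop that a 64-bit HWI host would only reach for 128-bit types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: values stored as little-endian arrays of 32-bit
   blocks, standing in for a 32-bit HWI.  NBLOCKS is an assumption.  */
typedef uint32_t hwi32;
#define HWI32_BITS 32
#define NBLOCKS 4

/* Add A and B into R at PRECISION bits.  Truncating the top block to
   PRECISION is omitted for brevity.  */
static void
add_blocks (hwi32 *r, const hwi32 *a, const hwi32 *b, unsigned precision)
{
  assert (precision > 0 && precision <= NBLOCKS * HWI32_BITS);
  if (precision <= HWI32_BITS)
    {
      /* Single-HWI fast path: one block, wraps modulo 2^32.  */
      r[0] = a[0] + b[0];
      return;
    }
  /* Multi-block path: ripple the carry from the low block upward.  */
  unsigned nblocks = (precision + HWI32_BITS - 1) / HWI32_BITS;
  uint64_t carry = 0;
  for (unsigned i = 0; i < nblocks; i++)
    {
      uint64_t s = (uint64_t) a[i] + b[i] + carry;
      r[i] = (hwi32) s;
      carry = s >> HWI32_BITS;
    }
}
```

At precision 64, adding 1 to 0xFFFFFFFF carries into the second block; at precision 32 the same addition simply wraps to 0 in the fast path.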

The code still has our tracing in it.  We will remove it before the
branch is committed, but for large-scale debugging, we find it very
useful.

I am not going to close with the typical "ok to commit?" line,
because I know you will have a lot to say.  But I do think that you
will find that this is a lot closer to what you envisioned than what
you saw before.

kenny

=====================================

The two patches below, against the trunk, are necessary to get
identical code between the wide-int branch and the trunk.  The first
patch has been submitted for review and fixes a bug.  The second patch
will not be submitted, as it is just for compatibility: it slightly
changes the hash function that the RTL gcse passes use.  Code is
transformed according to the traversal order of a hash table, so if
the hash functions are not identical, the code differs slightly
between the two branches.
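Why a hash change alters the generated code can be seen with a toy table. This is a minimal sketch under a loudly stated assumption: that the gcse hashing mixes in the numeric RTX code, so adding CONST_WIDE_INT to rtl.def shifts the codes after it. `toy_hash` and `traversal_order` are hypothetical, not GCC functions. Shifting two codes by one is enough to reverse the order in which a bucket-by-bucket walk visits them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy hash: bucket = (code + mode) % NBUCKETS.  Only meant
   to show that shifting numeric code values, as inserting an entry into
   an enum would, changes the order a bucket traversal visits items.  */
#define NBUCKETS 7

static unsigned
toy_hash (unsigned code, unsigned mode)
{
  return (code + mode) % NBUCKETS;
}

/* Fill ORDER with the indices of CODES in bucket-traversal order; items
   in the same bucket keep their insertion order.  */
static void
traversal_order (const unsigned *codes, size_t n, unsigned mode,
                 size_t *order)
{
  size_t k = 0;
  for (unsigned bucket = 0; bucket < NBUCKETS; bucket++)
    for (size_t i = 0; i < n; i++)
      if (toy_hash (codes[i], mode) == bucket)
        order[k++] = i;
}
```

With codes {5, 6} the walk visits item 0 first; with the shifted codes {6, 7}, item 1 lands in bucket 0 and is visited first, so any pass that transforms code in traversal order produces different output.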


=====================================
diff --git a/gcc/expr.c b/gcc/expr.c
index 923f59b..f5744b0 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -4815,7 +4815,8 @@ expand_assignment (tree to, tree from, bool nontemporal)
                    bitregion_start, bitregion_end,
                    mode1, from,
                    get_alias_set (to), nontemporal);
-      else if (bitpos >= mode_bitsize / 2)
+      else if (bitpos >= mode_bitsize / 2
+           && bitpos+bitsize <= mode_bitsize)
          result = store_field (XEXP (to_rtx, 1), bitsize,
                    bitpos - mode_bitsize / 2,
                    bitregion_start, bitregion_end,
@@ -4834,8 +4835,12 @@ expand_assignment (tree to, tree from, bool nontemporal)
          }
        else
          {
+          HOST_WIDE_INT extra = 0;
+          if (bitpos+bitsize > mode_bitsize)
+        extra = bitpos+bitsize - mode_bitsize;
            rtx temp = assign_stack_temp (GET_MODE (to_rtx),
-                        GET_MODE_SIZE (GET_MODE (to_rtx)));
+                        GET_MODE_SIZE (GET_MODE (to_rtx))
+                        + extra);
            write_complex_part (temp, XEXP (to_rtx, 0), false);
            write_complex_part (temp, XEXP (to_rtx, 1), true);
            result = store_field (temp, bitsize, bitpos,
diff --git a/gcc/rtl.def b/gcc/rtl.def
index b4ce1b9..5ed015c 100644
--- a/gcc/rtl.def
+++ b/gcc/rtl.def
@@ -342,6 +342,8 @@ DEF_RTL_EXPR(TRAP_IF, "trap_if", "ee", RTX_EXTRA)
  /* numeric integer constant */
  DEF_RTL_EXPR(CONST_INT, "const_int", "w", RTX_CONST_OBJ)

+DEF_RTL_EXPR(CONST_WIDE_INT, "const_wide_int", "", RTX_CONST_OBJ)
+
  /* fixed-point constant */
  DEF_RTL_EXPR(CONST_FIXED, "const_fixed", "www", RTX_CONST_OBJ)
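The expr.c hunk above stores into the imaginary half, `XEXP (to_rtx, 1)`, only when the field lies entirely inside it. Otherwise it enlarges the stack temporary by `extra`. The sketch below restates those checks with hypothetical helpers. It also makes one point explicit: `extra` counts bits while GET_MODE_SIZE counts bytes, so adding the two as the patch does appears to over-allocate. That is safe, whereas a byte-exact size would round the bit overrun up to whole bytes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical restatement of the conditions in the expand_assignment
   hunk.  BITPOS, BITSIZE and MODE_BITSIZE are in bits; MODE_BITSIZE is
   the size of the whole complex value.  */
static bool
field_within_imag_part (long bitpos, long bitsize, long mode_bitsize)
{
  assert (bitsize > 0 && mode_bitsize > 0);
  return bitpos >= mode_bitsize / 2 && bitpos + bitsize <= mode_bitsize;
}

/* Bits by which the field runs past the end of the complex value; the
   patch calls this EXTRA.  */
static long
overrun_bits (long bitpos, long bitsize, long mode_bitsize)
{
  long end = bitpos + bitsize;
  return end > mode_bitsize ? end - mode_bitsize : 0;
}

/* The same overrun rounded up to whole bytes.  */
static long
overrun_bytes (long bitpos, long bitsize, long mode_bitsize)
{
  return (overrun_bits (bitpos, bitsize, mode_bitsize) + 7) / 8;
}
```

For a 128-bit complex value, a 64-bit field at bit 64 fits the imaginary half exactly, while the same field at bit 96 overruns by 32 bits, which is 4 bytes.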