From patchwork Tue Aug 13 20:57:48 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kenneth Zadeck
X-Patchwork-Id: 266919
Message-ID: <520A9DCC.6080609@naturalbridge.com>
Date: Tue, 13 Aug 2013 16:57:48 -0400
From: Kenneth Zadeck
To: rguenther@suse.de, gcc-patches, Mike Stump, r.sandiford@uk.ibm.com
Subject: wide-int branch now up for public comment and review

Richi and everyone else who may be interested,

Congrats on your first child. They are a lot of fun, but are very high
maintenance.

Today we put up the wide-int branch for all to see and play with. See

svn+ssh://gcc.gnu.org/svn/gcc/branches/wide-int

At this point, we have completed testing it on x86-64.
Not only is it regression free, but for code that uses only 64 bit or
smaller data types, it produces identical machine language (if a couple
of changes are made to the trunk - see the patch below). We are
currently working on the PPC and expect to get this platform to the
same position very soon.

From a high level view, the branch looks somewhat closer to what you
asked for than I would have expected. There are now three
implementations of wide-int as a template. The default is the one you
saw before and takes its precision from the input mode or type. There
are two other template instances which have fixed precisions that are
defined to be large enough to be assumed to be infinite (just like your
favorite use of double-int). Both are used in places where there is no
notion of precision correctness of the operands. One is used for all
addressing arithmetic and the other is used mostly in the vectorizer
and loop optimizations. The bottom line is that both a finite and an
infinite precision model are really necessary in the current structure
of GCC.

The two infinite precision classes are not exactly the storage classes
that you proposed because they are implemented using the same storage
model as the default template, but they do provide a different view of
the math, which I assume was your primary concern. You may also decide
that there is no reason to have a separate class for the addressing
arithmetic since they work substantially the same way. We did it so
that we have the option in the future to allow the two reps to diverge.

The one place where I can see changing which template is used is in
tree-ssa-ccp. This is the only one of the many GCC constant propagators
that does not use the default template. I did not convert this pass to
use the default template because, for testing purposes (at your
suggestion), we tried to minimize the improvements so that we get the
same code out with wide-int.
When I convert it to use the default template, the pass will run
slightly faster and will find slightly more constants: both very
desirable features, but not in the context of getting this large patch
into GCC.

As I said earlier, we get the same code as long as the program uses
only 64 bit or smaller types. For code that uses larger types, we do
not. The problem actually stems from one of the assertions that you
made when we were arguing about fixed vs infinite precision. You had
said that a lot of the code depended on double ints behaving like
infinite precision. You were right!!! However, what this really meant
is that when that code was subjected to a 128 bit type, it just
produced bogus results!!!! All of this has been fixed now on the
branch. The code that uses the default template works within its
precision. The code that uses one of the infinite precision templates
can be guaranteed that there is always enough head room because we
sniff out the largest mode on the target and multiply that by 4. The
net result is that programs that use 128 bit types get better code out
that is more likely to be correct.

The vast majority of the patch falls into two types of code:

1) The 4 files that hold the wide-int code itself. You have seen a lot
   of this code before except for the infinite precision templates.
   Also the classes are more C++ than C in their flavor. In particular,
   the integration with trees is very tight in that an int-cst or
   regular integers can be the operands of any wide-int operation.

2) The code that encapsulates the representation of a TREE_INT_CST.

For the latter, I introduced a series of abstractions to hide the
access so that I could change the representation of TREE_INT_CST away
from having exactly two HWIs. I do not really like these abstractions,
but the good news is that most of them can/will go away after this
branch is integrated into the trunk.
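
As an illustration only, the idea of hiding the two-HWI layout behind
predicates and extractors can be sketched as below. This is not code
from the branch: two_hwi_cst, fits_shwi_p, fits_uhwi_p and to_shwi are
hypothetical stand-ins for a constant stored as a low/high pair of 64
bit words, and they only mirror the spirit of the accessors. The point
is that callers ask "does it fit?" and then extract, without ever
reading the high word themselves.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a constant stored as exactly two host
// words; this is the layout the accessors are meant to hide.
struct two_hwi_cst {
  uint64_t low;   // least significant 64 bits
  int64_t high;   // most significant 64 bits, carries the sign
};

// True if the value is representable as a signed 64 bit integer:
// the high word must be the sign extension of the low word.
bool fits_shwi_p(const two_hwi_cst &c) {
  bool low_negative = (c.low >> 63) != 0;
  return low_negative ? c.high == -1 : c.high == 0;
}

// True if the value is representable as an unsigned 64 bit integer.
bool fits_uhwi_p(const two_hwi_cst &c) { return c.high == 0; }

// Callers check the predicate first and then extract; they never
// touch the low/high fields directly.
int64_t to_shwi(const two_hwi_cst &c) {
  assert(fits_shwi_p(c));
  return static_cast<int64_t>(c.low);
}
```

With accessors of this shape, the storage behind the struct could
change (for example to an array of words) without touching any caller
that only uses the predicates and extractors.
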
These abstractions allow the code to do the same function without
exposing the change in the data structures. However, they preserve the
fact that, for the most part, the middle end of the compiler tries to
do no optimization on anything larger than a single HWI. This keeps
the basic behavior of the compiler, which is what you asked us to do.

The abstractions that I have put in to hide the rep of TREE_INT_CST
are:

host_integerp (x, 1)
    -> tree_fits_uhwi_p (x)
host_integerp (x, 0)
    -> tree_fits_shwi_p (x)
host_integerp (x, TYPE_UNSIGNED (y))
    -> tree_fits_hwi_p (x, TYPE_SIGN (y))
host_integerp (x, TYPE_UNSIGNED (x))
    -> tree_fits_hwi_p (x)
TREE_INT_CST_HIGH (x) == 0 || TREE_INT_CST_HIGH (value) == -1
    -> cst_fits_shwi_p (x)
TREE_INT_CST_HIGH (x) + (tree_int_cst_sgn (x) < 0)
    -> cst_fits_shwi_p (x)
cst_and_fits_in_hwi (x)
    -> cst_fits_shwi_p (x)
TREE_INT_CST_HIGH (x) == 0)
    -> cst_fits_uhwi_p (x)
tree_low_cst (x, 1)
    -> tree_to_uhwi (x)
tree_low_cst (x, 0)
    -> tree_to_shwi (x)
TREE_INT_CST_LOW (x)
    -> to either tree_to_uhwi (x), tree_to_shwi (x) or tree_to_hwi (x)

Code that used the TREE_INT_CST_HIGH in ways beyond checking to see if
it contained 0 or -1 was converted directly to wide-int.

You had proposed that one of the ways that we should/could test the
non single HWI paths in wide-int was to change the size of the element
of the array used to represent the value in wide-int. I believe that
there are better ways to do this testing. For one, the infinite
precision templates do not use the fast pathway anyway because
currently those pathways are only triggered for precisions that fit in
a single HWI. (There is the possibility that some of the infinite
precision functions could use this fast path, but they currently do
not.) However, what we are planning to do when the ppc gets stable is
to build a 64 bit compiler for the x86 that uses a 32 bit HWI.
This is no longer a supported path, but fixing the bugs on it would
shake out the remaining places where the compiler (as well as the
wide-int code) gets the wrong answer for larger types.

The code still has our tracing in it. We will remove it before the
branch is committed, but for large scale debugging, we find this very
useful.

I am not going to close with the typical "ok to commit?" closing
because I know you will have a lot to say. But I do think that you
will find that this is a lot closer to what you envisioned than what
you saw before.

kenny

=====================================

The two patches for the trunk below are necessary to get identical
code between the wide-int branch and the trunk. The first patch has
been submitted for review and fixes a bug. The second patch will not
be submitted as it is just for compatibility. The second patch
slightly changes the hash function that the rtl gcse passes use. Code
is modified based on the traversal order of the hash table, so if the
hash functions are not identical, the code is slightly different
between the two branches.
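
To see why a different hash function alone can change the output,
here is a toy sketch. It is not GCC's hash table or the gcse code:
traversal_order and its chained-bucket layout are hypothetical, chosen
only to show that walking a table bucket by bucket visits the same
keys in a different order once the hash function changes.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical chained hash table: keys are placed in bucket
// hash(key) % nbuckets, and traversal visits buckets in index order.
// The resulting visiting order therefore depends on the hash function.
std::vector<int> traversal_order(
    const std::vector<int> &keys,
    const std::function<std::size_t(int)> &hash,
    std::size_t nbuckets) {
  std::vector<std::vector<int>> buckets(nbuckets);
  for (int k : keys)
    buckets[hash(k) % nbuckets].push_back(k);

  // Walk the table the way a pass would: bucket by bucket.
  std::vector<int> order;
  for (const auto &bucket : buckets)
    for (int k : bucket)
      order.push_back(k);
  return order;
}
```

For the keys 1, 2, 3, 4 and five buckets, hashing with k gives the
order 1, 2, 3, 4, while hashing with 3 * k gives 2, 4, 1, 3. A pass
that transforms code in traversal order would make its changes in a
different sequence in the two cases.
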
=====================================

diff --git a/gcc/expr.c b/gcc/expr.c
index 923f59b..f5744b0 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -4815,7 +4815,8 @@ expand_assignment (tree to, tree from, bool nontemporal)
                                       bitregion_start, bitregion_end,
                                       mode1, from,
                                       get_alias_set (to), nontemporal);
-              else if (bitpos >= mode_bitsize / 2)
+              else if (bitpos >= mode_bitsize / 2
+                       && bitpos+bitsize <= mode_bitsize)
                 result = store_field (XEXP (to_rtx, 1), bitsize,
                                       bitpos - mode_bitsize / 2,
                                       bitregion_start, bitregion_end,
@@ -4834,8 +4835,12 @@ expand_assignment (tree to, tree from, bool nontemporal)
                 }
               else
                 {
+                  HOST_WIDE_INT extra = 0;
+                  if (bitpos+bitsize > mode_bitsize)
+                    extra = bitpos+bitsize - mode_bitsize;
                   rtx temp = assign_stack_temp (GET_MODE (to_rtx),
-                                                GET_MODE_SIZE (GET_MODE (to_rtx)));
+                                                GET_MODE_SIZE (GET_MODE (to_rtx))
+                                                + extra);
                   write_complex_part (temp, XEXP (to_rtx, 0), false);
                   write_complex_part (temp, XEXP (to_rtx, 1), true);
                   result = store_field (temp, bitsize, bitpos,
diff --git a/gcc/rtl.def b/gcc/rtl.def
index b4ce1b9..5ed015c 100644
--- a/gcc/rtl.def
+++ b/gcc/rtl.def
@@ -342,6 +342,8 @@ DEF_RTL_EXPR(TRAP_IF, "trap_if", "ee", RTX_EXTRA)
 /* numeric integer constant */
 DEF_RTL_EXPR(CONST_INT, "const_int", "w", RTX_CONST_OBJ)
 
+DEF_RTL_EXPR(CONST_WIDE_INT, "const_wide_int", "", RTX_CONST_OBJ)
+
 /* fixed-point constant */
 DEF_RTL_EXPR(CONST_FIXED, "const_fixed", "www", RTX_CONST_OBJ)
 