
GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

Message ID 50926EB3.8030604@naturalbridge.com
State New

Commit Message

Kenneth Zadeck Nov. 1, 2012, 12:44 p.m. UTC
richi,

I would like you to respond to at least point 1 of this email.   In it 
there is code from the rtl level that was written twice, once for the 
case when the size of the mode is less than the size of a HWI and once 
for the case where the size of the mode is less than 2 HWIs.

My patch changes this to one instance of the code that works no matter 
how large the data passed to it is.
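
To make the shape of the problem concrete, here is a deliberately
simplified, self-contained sketch (plain C++, not the actual
simplify-rtx.c code; the names are purely illustrative) of the pattern
being discussed: the same operation written once per supported width,
versus once in a width-independent form.

typedef unsigned long long HWI;   /* stand-in for HOST_WIDE_INT */

/* Today's pattern: one copy per width that happens to be handled;
   anything wider is simply not simplified at all.  */
static int
popcount_1hwi (HWI x)
{
  int n = 0;
  for (; x; x &= x - 1)
    n++;
  return n;
}

static int
popcount_2hwi (HWI lo, HWI hi)   /* a second, near-identical copy */
{
  return popcount_1hwi (lo) + popcount_1hwi (hi);
}

/* Width-independent form: a single copy that works for any number of
   HWIs, which is the effect my patch is after.  */
static int
popcount_wide (const HWI *val, unsigned int len)
{
  int n = 0;
  for (unsigned int i = 0; i < len; i++)
    n += popcount_1hwi (val[i]);
  return n;
}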

You have made a specific requirement for wide_int to be a template that 
can be instantiated in several sizes, one for 1 HWI and one for 2 HWIs.   I 
would like to know how this particular fragment is to be rewritten in 
this model.   It seems that I would have to retain the structure where 
there is one version of the code for each size at which the template is 
instantiated.

I would like to point out that there are about 125 places where we have 
two copies of the code for some operation.   Many of these places are 
smaller than this, but some are larger.   There are also at least 
several hundred places where the code was only written for the 1-HWI 
case.   These are harder to find with simple greps.

I am very concerned about this particular aspect of your comments 
because it seems to doom us to write the same code over and over again.

kenny




On 10/31/2012 02:19 PM, Kenneth Zadeck wrote:
> Jakub,
>
> it is hard from all of the threads to actually distill what the real 
> issues are here.  So let me start from a clean slate and state them 
> simply.
>
> Richi has three primary objections:
>
> 1) that we can do all of this with a templated version of double-int.
> 2) that we should not be passing in a precision and bitsize into the 
> interface.
> 3) that the interface is too large.
>
> I have attached a fragment of my patch #5 to illustrate the main 
> thrust of my patches and to illustrate the usefulness to gcc right now.
>
> In the current trunk, we have code that does simplification when the 
> mode fits in an HWI and we have code that does the simplification if 
> the mode fits in two HWIs.   If the mode does not fit in two HWIs, the 
> code does not do the simplification.
>
> Thus here and in a large number of other places we have two copies of 
> the code.    Richi wants there to be multiple template instantiations 
> of double-int.    This means that we are now going to have to have 3 
> copies of this code to support OImode on a 64-bit host and 4 copies 
> on a 32-bit host.
>
> Further note that there are not as many cases handled for the 2-HWI 
> code as there are for the 1-HWI case, and in general this is true 
> throughout the compiler.  (CLRSB is missing from the 2-HWI case in the 
> patch.)  We really did not write twice the code when we started 
> supporting 2 HWIs; we added about 1.5 times the code (simplify-rtx is 
> better than most of the rest of the compiler).  I am using the rtl 
> level as an example here because I have posted all of those patches, 
> but the tree level is no better.
>
> I do not want to write this code a third time and certainly not a 
> fourth time.   Just fixing all of this is quite useful now: it fills 
> in a lot of gaps in our transformations and it removes many edge-case 
> crashes because TImode really is lightly tested.  However, this patch 
> becomes crucial as the world gets larger.
>
> Richi's second point is that we should be doing everything at 
> "infinite precision" and not passing in an explicit bitsize and 
> precision.   That works OK (sans the issues I raised with it in 
> tree-vrp earlier) when the largest precision on the machine fits in a 
> couple of HWIs.    However, for targets that have large integers or for 
> cross compilers, this becomes expensive.    The idea behind my set of 
> patches is that for the transformations that can work this way, we do 
> the math in the precision of the type or mode.   In general this means 
> that almost all of the math will be done quickly, even on targets that 
> support really big integers.  For passes like tree-vrp, the math will 
> be done at some multiple of the largest type seen in the actual 
> program.    The amount of the multiple is a function of the 
> optimization, not the target or the host.  Currently (on my home 
> computer) the wide-int interface allows the optimization to go up to 4x 
> the largest mode on the target.
>
> I can get rid of this bound at the expense of doing an alloca rather 
> than stack allocating a fixed sized structure.    However, given the 
> extremely heavy use of this interface, that does not seem like the 
> best of tradeoffs.
>
> The truth is that the vast majority of the compiler actually wants to 
> see the math done the way that it is going to be done on the machine.  
> Tree-vrp and the gimple constant propagation do not, but I have made 
> accommodations to handle both needs.    I believe that the reason that 
> double-int was never used at the rtl level is that it does not 
> actually do the math in a way that is useful to the target.
>
> Richi's third objection is that the interface is too large.   I 
> disagree.   It was designed based on the actual usage of the 
> interface.   When I found places where I was writing the same code 
> over and over again, I put it in a function as part of the 
> interface.   I later went back and optimized many of these because 
> this is a very heavily used interface.  Richi has many other 
> objections, but I have agreed to fix almost all of them, so I am not 
> going to address them here.
>
> It really will be a huge burden to have to carry these patches until 
> the next revision.  We are currently in stage 1, and I believe that the 
> minor issues that Richi raises can be easily addressed.
>
> kenny

Comments

Richard Sandiford Nov. 1, 2012, 1:10 p.m. UTC | #1
Kenneth Zadeck <zadeck@naturalbridge.com> writes:
> I would like you to respond to at least point 1 of this email.   In it 
> there is code from the rtl level that was written twice, once for the 
> case when the size of the mode is less than the size of a HWI and once 
> for the case where the size of the mode is less that 2 HWIs.
>
> my patch changes this to one instance of the code that works no matter 
> how large the data passed to it is.
>
> you have made a specific requirement for wide int to be a template that 
> can be instantiated in several sizes, one for 1 HWI, one for 2 HWI.   I 
> would like to know how this particular fragment is to be rewritten in 
> this model?   It seems that I would have to retain the structure where 
> there is one version of the code for each size that the template is 
> instantiated.

I think richi's argument was that wide_int should be split into two.
There should be a "bare-metal" class that just has a length and HWIs,
and the main wide_int class should be an extension on top of that
that does things to a bit precision instead.  Presumably with some
template magic so that the length (number of HWIs) is a constant for:

  typedef foo<2> double_int;

and a variable for wide_int (because in wide_int the length would be
the number of significant HWIs rather than the size of the underlying
array).  wide_int would also record the precision and apply it after
the full HWI operation.
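
Something along these lines, purely as an illustration of the shape
rather than code from any of the patches (the names are made up):

typedef unsigned long long HWI;   /* stand-in for HOST_WIDE_INT */

/* "Bare-metal" layer: just an array of HWIs plus a length.  For the
   fixed-size instantiation the array size is a compile-time constant.  */
template <int N>
struct hwi_storage
{
  HWI val[N];
  unsigned int len;       /* number of significant HWIs, <= N */
};

typedef hwi_storage<2> double_int_like;   /* the hypothetical foo<2> */

/* Precision-aware layer on top: records the precision in bits and
   truncates or sign-extends after the full-HWI operation.  */
template <int N>
struct wide_int_like : hwi_storage<N>
{
  unsigned int precision;
};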

So the wide_int class would still provide "as wide as we need" arithmetic,
as in your rtl patch.  I don't think he was objecting to that.

As is probably obvious, I don't agree FWIW.  It seems like an unnecessary
complication without any clear use.  Especially since the number of
significant HWIs in a wide_int isn't always going to be the same for
both operands to a binary operation, and it's not clear to me whether
that should be handled in the base class or wide_int.

Richard
Kenneth Zadeck Nov. 1, 2012, 1:18 p.m. UTC | #2
On 11/01/2012 09:10 AM, Richard Sandiford wrote:
> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>> I would like you to respond to at least point 1 of this email.   In it
>> there is code from the rtl level that was written twice, once for the
>> case when the size of the mode is less than the size of a HWI and once
>> for the case where the size of the mode is less that 2 HWIs.
>>
>> my patch changes this to one instance of the code that works no matter
>> how large the data passed to it is.
>>
>> you have made a specific requirement for wide int to be a template that
>> can be instantiated in several sizes, one for 1 HWI, one for 2 HWI.   I
>> would like to know how this particular fragment is to be rewritten in
>> this model?   It seems that I would have to retain the structure where
>> there is one version of the code for each size that the template is
>> instantiated.
> I think richi's argument was that wide_int should be split into two.
> There should be a "bare-metal" class that just has a length and HWIs,
> and the main wide_int class should be an extension on top of that
> that does things to a bit precision instead.  Presumably with some
> template magic so that the length (number of HWIs) is a constant for:
>
>    typedef foo<2> double_int;
>
> and a variable for wide_int (because in wide_int the length would be
> the number of significant HWIs rather than the size of the underlying
> array).  wide_int would also record the precision and apply it after
> the full HWI operation.
>
> So the wide_int class would still provide "as wide as we need" arithmetic,
> as in your rtl patch.  I don't think he was objecting to that.
>
> As is probably obvious, I don't agree FWIW.  It seems like an unnecessary
> complication without any clear use.  Especially since the number of
> significant HWIs in a wide_int isn't always going to be the same for
> both operands to a binary operation, and it's not clear to me whether
> that should be handled in the base class or wide_int.
>
> Richard
There is a certain amount of surprise about all of this on my part.    I 
thought that I was doing such a great thing by looking at the specific 
port that you are building for to determine how to size these data 
structures.    You would think from the response that I am getting that 
I had murdered someone.

Do you think that when he gets around to reading the patch for 
simplify-rtx.c he is going to object to this fragment?
@@ -5179,13 +4815,11 @@ static rtx
  simplify_immed_subreg (enum machine_mode outermode, rtx op,
                 enum machine_mode innermode, unsigned int byte)
  {
-  /* We support up to 512-bit values (for V8DFmode).  */
    enum {
-    max_bitsize = 512,
      value_bit = 8,
      value_mask = (1 << value_bit) - 1
    };
-  unsigned char value[max_bitsize / value_bit];
+  unsigned char value [MAX_BITSIZE_MODE_ANY_MODE/value_bit];
    int value_start;
    int i;
    int elem;
Kenneth Zadeck Nov. 1, 2012, 1:24 p.m. UTC | #3
Anyway, Richard, it does not answer the question of what you are going 
to do with a typedef foo<2>.

The point of all of this work by me was to leave no traces of the host 
in the way the compiler works.
Instantiating a specific size of double-int is not going to get you 
there.

kenny

On 11/01/2012 09:10 AM, Richard Sandiford wrote:
> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>> I would like you to respond to at least point 1 of this email.   In it
>> there is code from the rtl level that was written twice, once for the
>> case when the size of the mode is less than the size of a HWI and once
>> for the case where the size of the mode is less that 2 HWIs.
>>
>> my patch changes this to one instance of the code that works no matter
>> how large the data passed to it is.
>>
>> you have made a specific requirement for wide int to be a template that
>> can be instantiated in several sizes, one for 1 HWI, one for 2 HWI.   I
>> would like to know how this particular fragment is to be rewritten in
>> this model?   It seems that I would have to retain the structure where
>> there is one version of the code for each size that the template is
>> instantiated.
> I think richi's argument was that wide_int should be split into two.
> There should be a "bare-metal" class that just has a length and HWIs,
> and the main wide_int class should be an extension on top of that
> that does things to a bit precision instead.  Presumably with some
> template magic so that the length (number of HWIs) is a constant for:
>
>    typedef foo<2> double_int;
>
> and a variable for wide_int (because in wide_int the length would be
> the number of significant HWIs rather than the size of the underlying
> array).  wide_int would also record the precision and apply it after
> the full HWI operation.
>
> So the wide_int class would still provide "as wide as we need" arithmetic,
> as in your rtl patch.  I don't think he was objecting to that.
>
> As is probably obvious, I don't agree FWIW.  It seems like an unnecessary
> complication without any clear use.  Especially since the number of
> significant HWIs in a wide_int isn't always going to be the same for
> both operands to a binary operation, and it's not clear to me whether
> that should be handled in the base class or wide_int.
>
> Richard
Richard Sandiford Nov. 1, 2012, 3:16 p.m. UTC | #4
Richard Sandiford <rdsandiford@googlemail.com> writes:
> As is probably obvious, I don't agree FWIW.  It seems like an unnecessary
> complication without any clear use.  Especially since the number of
> significant HWIs in a wide_int isn't always going to be the same for
> both operands to a binary operation, and it's not clear to me whether
> that should be handled in the base class or wide_int.

...and the number of HWIs in the result might be different again.
Whether that's true depends on the value as well as the (HWI) size
of the operands.

Richard
Richard Biener Nov. 4, 2012, 4:54 p.m. UTC | #5
On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>> I would like you to respond to at least point 1 of this email.   In it
>> there is code from the rtl level that was written twice, once for the
>> case when the size of the mode is less than the size of a HWI and once
>> for the case where the size of the mode is less that 2 HWIs.
>>
>> my patch changes this to one instance of the code that works no matter
>> how large the data passed to it is.
>>
>> you have made a specific requirement for wide int to be a template that
>> can be instantiated in several sizes, one for 1 HWI, one for 2 HWI.   I
>> would like to know how this particular fragment is to be rewritten in
>> this model?   It seems that I would have to retain the structure where
>> there is one version of the code for each size that the template is
>> instantiated.
>
> I think richi's argument was that wide_int should be split into two.
> There should be a "bare-metal" class that just has a length and HWIs,
> and the main wide_int class should be an extension on top of that
> that does things to a bit precision instead.  Presumably with some
> template magic so that the length (number of HWIs) is a constant for:
>
>   typedef foo<2> double_int;
>
> and a variable for wide_int (because in wide_int the length would be
> the number of significant HWIs rather than the size of the underlying
> array).  wide_int would also record the precision and apply it after
> the full HWI operation.
>
> So the wide_int class would still provide "as wide as we need" arithmetic,
> as in your rtl patch.  I don't think he was objecting to that.

That summarizes one part of my complaints / suggestions correctly.  In other
mails I suggested not making it a template but instead giving it a 'bitsize'
(or maxlen) field that is constant over the object's lifetime.  Both
suggestions likely require more thought than I put into them.  The main
reason is that with C++ you can abstract away where the wide-int information
pieces are stored and thus use the arithmetic / operation workers without
copying the (source) "wide-int" objects.  Thus you should be able to write
adaptors for double-int, tree or RTX storage.

> As is probably obvious, I don't agree FWIW.  It seems like an unnecessary
> complication without any clear use.  Especially since the number of

Maybe the double_int typedef is without any clear use.  Properly
abstracting from the storage / information providers will save
compile-time, memory and code though.  I don't see that any thought
was spent on how to avoid excessive copying or dealing with
long(er)-lived objects and their storage needs.

> significant HWIs in a wide_int isn't always going to be the same for
> both operands to a binary operation, and it's not clear to me whether
> that should be handled in the base class or wide_int.

It certainly depends.

Richard.

> Richard
Kenneth Zadeck Nov. 5, 2012, 1:59 p.m. UTC | #6
On 11/04/2012 11:54 AM, Richard Biener wrote:
> On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford
> <rdsandiford@googlemail.com> wrote:
>> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>>> I would like you to respond to at least point 1 of this email.   In it
>>> there is code from the rtl level that was written twice, once for the
>>> case when the size of the mode is less than the size of a HWI and once
>>> for the case where the size of the mode is less that 2 HWIs.
>>>
>>> my patch changes this to one instance of the code that works no matter
>>> how large the data passed to it is.
>>>
>>> you have made a specific requirement for wide int to be a template that
>>> can be instantiated in several sizes, one for 1 HWI, one for 2 HWI.   I
>>> would like to know how this particular fragment is to be rewritten in
>>> this model?   It seems that I would have to retain the structure where
>>> there is one version of the code for each size that the template is
>>> instantiated.
>> I think richi's argument was that wide_int should be split into two.
>> There should be a "bare-metal" class that just has a length and HWIs,
>> and the main wide_int class should be an extension on top of that
>> that does things to a bit precision instead.  Presumably with some
>> template magic so that the length (number of HWIs) is a constant for:
>>
>>    typedef foo<2> double_int;
>>
>> and a variable for wide_int (because in wide_int the length would be
>> the number of significant HWIs rather than the size of the underlying
>> array).  wide_int would also record the precision and apply it after
>> the full HWI operation.
>>
>> So the wide_int class would still provide "as wide as we need" arithmetic,
>> as in your rtl patch.  I don't think he was objecting to that.
> That summarizes one part of my complaints / suggestions correctly.  In other
> mails I suggested to not make it a template but a constant over object lifetime
> 'bitsize' (or maxlen) field.  Both suggestions likely require more thought than
> I put into them.  The main reason is that with C++ you can abstract from where
> wide-int information pieces are stored and thus use the arithmetic / operation
> workers without copying the (source) "wide-int" objects.  Thus you should
> be able to write adaptors for double-int storage, tree or RTX storage.
We had considered something along these lines and rejected it.   I am 
not really opposed to doing something like this, but it is not obviously 
a winning idea and may well not be a good one.   Here was our 
thought process:

If you abstract away the storage inside a wide_int, then you should be 
able to copy a pointer to the block of data from either the rtl-level 
integer constant or the tree-level one into the wide_int.   It is 
certainly true that making a wide_int from one of these is an extremely 
common operation, and doing this would avoid those copies.

However, this causes two problems:
1)  Mike's first cut at CONST_WIDE_INT did two ggc allocations to 
make the object: it created the base object and then it allocated the 
array.  Richard S noticed that we could just allocate one CONST_WIDE_INT 
that had the array in it.   Doing it this way saves one ggc allocation 
and one indirection when accessing the data within the CONST_WIDE_INT.   
Our plan is to use the same trick at the tree level.   So to avoid the 
copying, you seem to have to have a more expensive representation for 
CONST_WIDE_INT and INT_CST.
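
The difference between the two layouts is roughly this (a sketch only,
not the actual rtl definitions; HWI stands in for HOST_WIDE_INT):

typedef unsigned long long HWI;

/* Two-allocation layout: the object plus a separately allocated array,
   so every access to the data pays an extra indirection.  */
struct const_wide_int_two_allocs
{
  unsigned int len;
  HWI *elts;           /* second ggc allocation */
};

/* Single-allocation layout: the array lives at the tail of the object,
   so one allocation of the header plus len elements suffices and the
   data is reached without the extra indirection.  */
struct const_wide_int_one_alloc
{
  unsigned int len;
  HWI elts[1];         /* trailing array, over-allocated as needed */
};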

2) You are now stuck either ggcing the storage inside a wide_int when 
they are created as part of an expression, or playing some game to 
represent the two different storage plans inside of wide_int.   
Clearly this is where you think that we should be going by suggesting 
that we abstract away the internal storage.   However, this comes at a 
price: what is currently an array access in my patches would (I 
believe) become a function call.  From a performance point of view, I 
believe that this is a non-starter.  If you can figure out how to design 
this so that it is not a function call, I would consider this a viable 
option.

On the other side of this, you are clearly correct that we are copying 
the data when we are making wide_ints from INT_CSTs or CONST_WIDE_INTs. 
    But this is why we represent the data inside the wide_ints, the 
INT_CSTs and the CONST_WIDE_INTs in a compressed form.   Even with very 
big types, which are generally rare, the constants themselves are very 
small.   So the copy operation is a loop that almost always copies one 
element, even with tree-vrp, which doubles the sizes of every type.
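
In code the copy-in amounts to nothing more than this (a sketch; HWI
again stands in for HOST_WIDE_INT):

typedef unsigned long long HWI;

/* Copy only the len significant HWIs of the compressed constant; in
   practice len is almost always 1.  */
static void
copy_compressed (HWI *dest, const HWI *src, unsigned int len)
{
  for (unsigned int i = 0; i < len; i++)
    dest[i] = src[i];
}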

There is a third option, which is that the storage inside the wide_int 
is just ggced storage.  We rejected this because of the functional 
nature of wide-ints: there are zillions created, they can be stack 
allocated, and they last for very short periods of time.

>> As is probably obvious, I don't agree FWIW.  It seems like an unnecessary
>> complication without any clear use.  Especially since the number of
> Maybe the double_int typedef is without any clear use.  Properly
> abstracting from the storage / information providers will save
> compile-time, memory and code though.  I don't see that any thought
> was spent on how to avoid excessive copying or dealing with
> long(er)-lived objects and their storage needs.
I actually disagree.    Wide-ints can use a bloated amount of storage 
because they are designed to be very short-lived, very low-cost 
objects that are stack allocated.   For long-term storage, there is 
INT_CST at the tree level and CONST_WIDE_INT at the rtl level.  Those 
use a very compact storage model.   The copying entailed is only a small 
part of the overall cost.

Everything that you are suggesting along these lines adds to the 
weight of a wide-int object.  You have to understand that there will be 
many more wide-ints created in a normal compilation than were ever 
created with double-int.    This is because the rtl level had no object 
like this at all, and at the tree level many of the places that should 
have used double-int short-circuited the code and only did the 
transformations if the types fit in a HWI.

This is why we are extremely defensive about this issue.   We really did 
think a lot about it.

Kenny

>> significant HWIs in a wide_int isn't always going to be the same for
>> both operands to a binary operation, and it's not clear to me whether
>> that should be handled in the base class or wide_int.
> It certainly depends.
>
> Richard.
>
>> Richard
Kenneth Zadeck Nov. 5, 2012, 5 p.m. UTC | #7
Jakub and Richi,

At this point I have decided that I am not going to get the rest of 
the wide-int patches into a stable enough form for this round.  The 
combination of still living without power at my house and some issues 
that I hit with the front ends has made it impossible to get this 
finished by today's deadline.

I do want patches 1-7 to go in (after proper review), but I am going to 
withdraw patch 8 for this round.

Patches 1-5 deal with the rtl level.   These have been extensively 
tested and, with the exception of patch 4, "examined" by Richard 
Sandiford.    They clean up a lot of things at the rtl level that affect 
every port, as well as fixing some outstanding regressions.

Patches 6 and 7 are general cleanups at the tree level and can be 
justified on their own without any regard to wide-int.    They have 
also been extensively tested.

I am withdrawing patch 8 because it converted tree-vrp to use wide-ints, 
but the benefit of this patch really cannot be seen without the rest of 
the tree-level wide-int patches.

In the next couple of days I will resubmit patches 1-7 with the patch 
rot removed and the public comments folded into them.

Kenny
Richard Biener Nov. 26, 2012, 3:03 p.m. UTC | #8
On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
>
> On 11/04/2012 11:54 AM, Richard Biener wrote:
>>
>> On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford
>> <rdsandiford@googlemail.com> wrote:
>>>
>>> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>>>>
>>>> I would like you to respond to at least point 1 of this email.   In it
>>>> there is code from the rtl level that was written twice, once for the
>>>> case when the size of the mode is less than the size of a HWI and once
>>>> for the case where the size of the mode is less that 2 HWIs.
>>>>
>>>> my patch changes this to one instance of the code that works no matter
>>>> how large the data passed to it is.
>>>>
>>>> you have made a specific requirement for wide int to be a template that
>>>> can be instantiated in several sizes, one for 1 HWI, one for 2 HWI.   I
>>>> would like to know how this particular fragment is to be rewritten in
>>>> this model?   It seems that I would have to retain the structure where
>>>> there is one version of the code for each size that the template is
>>>> instantiated.
>>>
>>> I think richi's argument was that wide_int should be split into two.
>>> There should be a "bare-metal" class that just has a length and HWIs,
>>> and the main wide_int class should be an extension on top of that
>>> that does things to a bit precision instead.  Presumably with some
>>> template magic so that the length (number of HWIs) is a constant for:
>>>
>>>    typedef foo<2> double_int;
>>>
>>> and a variable for wide_int (because in wide_int the length would be
>>> the number of significant HWIs rather than the size of the underlying
>>> array).  wide_int would also record the precision and apply it after
>>> the full HWI operation.
>>>
>>> So the wide_int class would still provide "as wide as we need"
>>> arithmetic,
>>> as in your rtl patch.  I don't think he was objecting to that.
>>
>> That summarizes one part of my complaints / suggestions correctly.  In
>> other
>> mails I suggested to not make it a template but a constant over object
>> lifetime
>> 'bitsize' (or maxlen) field.  Both suggestions likely require more thought
>> than
>> I put into them.  The main reason is that with C++ you can abstract from
>> where
>> wide-int information pieces are stored and thus use the arithmetic /
>> operation
>> workers without copying the (source) "wide-int" objects.  Thus you should
>> be able to write adaptors for double-int storage, tree or RTX storage.
>
> We had considered something along these lines and rejected it.   I am not
> really opposed to doing something like this, but it is not an obvious
> winning idea and is likely not to be a good idea.   Here was our thought
> process:
>
> if you abstract away the storage inside a wide int, then you should be able
> to copy a pointer to the block of data from either the rtl level integer
> constant or the tree level one into the wide int.   It is certainly true
> that making a wide_int from one of these is an extremely common operation
> and doing this would avoid those copies.
>
> However, this causes two problems:
> 1)  Mike's first cut at the CONST_WIDE_INT did two ggc allocations to make
> the object.   it created the base object and then it allocated the array.
> Richard S noticed that we could just allocate one CONST_WIDE_INT that had
> the array in it.   Doing it this way saves one ggc allocation and one
> indirection when accessing the data within the CONST_WIDE_INT.   Our plan is
> to use the same trick at the tree level.   So to avoid the copying, you seem
> to have to have a more expensive rep for CONST_WIDE_INT and INT_CST.

I did not propose having a pointer to the data in the RTX or tree int.  Only
the short-lived wide-ints (which are on the stack) would have a pointer to
the data - which can then obviously point into the RTX and tree data.

> 2) You are now stuck either ggcing the storage inside a wide_int when they
> are created as part of an expression or you have to play some game to
> represent the two different storage plans inside of wide_int.

Hm?  wide-ints are short-lived and thus never live across a garbage collection
point.  We create non-GCed objects pointing to GCed objects all the time
and everywhere this way.

>   Clearly this
> is where you think that we should be going by suggesting that we abstract
> away the internal storage.   However, this comes at a price:   what is
> currently an array access in my patches would (i believe) become a function
> call.

No, the workers (that perform the array accesses) will simply get
a pointer to the first data element.  Then whether it's embedded or
external is of no interest to them.

>  From a performance point of view, i believe that this is a non
> starter. If you can figure out how to design this so that it is not a
> function call, i would consider this a viable option.
>
> On the other side of this you are clearly correct that we are copying the
> data when we are making wide ints from INT_CSTs or CONST_WIDE_INTs.    But
> this is why we represent data inside of the wide_ints, the INT_CSTs and the
> CONST_WIDE_INTs in a compressed form.   Even with very big types, which are
> generally rare, the constants them selves are very small.   So the copy
> operation is a loop that almost always copies one element, even with
> tree-vrp which doubles the sizes of every type.
>
> There is the third option which is that the storage inside the wide int is
> just ggced storage.  We rejected this because of the functional nature of
> wide-ints.    There are zillions created, they can be stack allocated, and
> they last for very short periods of time.

Of course - GCing wide-ints is a non-starter.

>
>>> As is probably obvious, I don't agree FWIW.  It seems like an unnecessary
>>> complication without any clear use.  Especially since the number of
>>
>> Maybe the double_int typedef is without any clear use.  Properly
>> abstracting from the storage / information providers will save
>> compile-time, memory and code though.  I don't see that any thought
>> was spent on how to avoid excessive copying or dealing with
>> long(er)-lived objects and their storage needs.
>
> I actually disagree.    Wide ints can use a bloated amount of storage
> because they are designed to be very short lived and very low cost objects
> that are stack allocated.   For long term storage, there is INT_CST at the
> tree level and CONST_WIDE_INT at the rtl level.  Those use a very compact
> storage model.   The copying entailed is only a small part of the overall
> performance.

Well, but both trees and RTXen are not viable for short-lived things because
they are GCed!  double-ints were suitable for this kind of stuff because
they also have a moderate size.  With wide-ints, size becomes a problem
(or GC, if you instead use trees or RTXen).

> Everything that you are suggesting along these lines is adding to the weight
> of a wide-int object.

On the contrary - it lessens their weight (with external already
existing storage)
or does not do anything to it (with the embedded storage).

>  You have to understand there will be many more
> wide-ints created in a normal compilation than were ever created with
> double-int.    This is because the rtl level had no object like this at all
> and at the tree level, many of the places that should have used double int,
> short cut the code and only did the transformations if the types fit in a
> HWI.

Your argument shows that the copy-in/out from tree/RTX to/from wide-int
will become a very frequent operation and thus it is worth optimizing it.

> This is why we are extremely defensive about this issue.   We really did
> think a lot about it.

I'm sure you did.

Richard.
Kenneth Zadeck Nov. 26, 2012, 4:03 p.m. UTC | #9
On 11/26/2012 10:03 AM, Richard Biener wrote:
> On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
>> On 11/04/2012 11:54 AM, Richard Biener wrote:
>>> On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford
>>> <rdsandiford@googlemail.com> wrote:
>>>> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>>>>> I would like you to respond to at least point 1 of this email.   In it
>>>>> there is code from the rtl level that was written twice, once for the
>>>>> case when the size of the mode is less than the size of a HWI and once
>>>>> for the case where the size of the mode is less that 2 HWIs.
>>>>>
>>>>> my patch changes this to one instance of the code that works no matter
>>>>> how large the data passed to it is.
>>>>>
>>>>> you have made a specific requirement for wide int to be a template that
>>>>> can be instantiated in several sizes, one for 1 HWI, one for 2 HWI.   I
>>>>> would like to know how this particular fragment is to be rewritten in
>>>>> this model?   It seems that I would have to retain the structure where
>>>>> there is one version of the code for each size that the template is
>>>>> instantiated.
>>>> I think richi's argument was that wide_int should be split into two.
>>>> There should be a "bare-metal" class that just has a length and HWIs,
>>>> and the main wide_int class should be an extension on top of that
>>>> that does things to a bit precision instead.  Presumably with some
>>>> template magic so that the length (number of HWIs) is a constant for:
>>>>
>>>>     typedef foo<2> double_int;
>>>>
>>>> and a variable for wide_int (because in wide_int the length would be
>>>> the number of significant HWIs rather than the size of the underlying
>>>> array).  wide_int would also record the precision and apply it after
>>>> the full HWI operation.
>>>>
>>>> So the wide_int class would still provide "as wide as we need"
>>>> arithmetic,
>>>> as in your rtl patch.  I don't think he was objecting to that.
>>> That summarizes one part of my complaints / suggestions correctly.  In
>>> other
>>> mails I suggested to not make it a template but a constant over object
>>> lifetime
>>> 'bitsize' (or maxlen) field.  Both suggestions likely require more thought
>>> than
>>> I put into them.  The main reason is that with C++ you can abstract from
>>> where
>>> wide-int information pieces are stored and thus use the arithmetic /
>>> operation
>>> workers without copying the (source) "wide-int" objects.  Thus you should
>>> be able to write adaptors for double-int storage, tree or RTX storage.
>> We had considered something along these lines and rejected it.   I am not
>> really opposed to doing something like this, but it is not an obvious
>> winning idea and is likely not to be a good idea.   Here was our thought
>> process:
>>
>> if you abstract away the storage inside a wide int, then you should be able
>> to copy a pointer to the block of data from either the rtl level integer
>> constant or the tree level one into the wide int.   It is certainly true
>> that making a wide_int from one of these is an extremely common operation
>> and doing this would avoid those copies.
>>
>> However, this causes two problems:
>> 1)  Mike's first cut at the CONST_WIDE_INT did two ggc allocations to make
>> the object.   it created the base object and then it allocated the array.
>> Richard S noticed that we could just allocate one CONST_WIDE_INT that had
>> the array in it.   Doing it this way saves one ggc allocation and one
>> indirection when accessing the data within the CONST_WIDE_INT.   Our plan is
>> to use the same trick at the tree level.   So to avoid the copying, you seem
>> to have to have a more expensive rep for CONST_WIDE_INT and INT_CST.
> I did not propose having a pointer to the data in the RTX or tree int.  Just
> the short-lived wide-ints (which are on the stack) would have a pointer to
> the data - which can then obviously point into the RTX and tree data.
There is then the issue of what happens if some wide-ints are not short-lived. 
It makes me nervous to create internal pointers to GCed memory.
>> 2) You are now stuck either ggcing the storage inside a wide_int when they
>> are created as part of an expression or you have to play some game to
>> represent the two different storage plans inside of wide_int.
> Hm?  wide-ints are short-lived and thus never live across a garbage collection
> point.  We create non-GCed objects pointing to GCed objects all the time
> and everywhere this way.
Again, this makes me nervous, but it could be done.  However, it does 
mean that the wide-ints that are not created from rtxes or trees 
will be more expensive, because they are not going to get their storage 
"for free"; they are going to alloca it.

However, it is still not clear, given that 99% of the wide-ints are 
going to fit in a single HWI, that this would be a noticeable win.
>
>>    Clearly this
>> is where you think that we should be going by suggesting that we abstract
>> away the internal storage.   However, this comes at a price:   what is
>> currently an array access in my patches would (i believe) become a function
>> call.
> No, the workers (that perform the array accesses) will simply get
> a pointer to the first data element.  Then whether it's embedded or
> external is of no interest to them.
So is your plan that the wide_int constructors from rtx or tree would 
just copy the pointer to the array on top of the array that is otherwise 
allocated on the stack?    I can easily do this.   But as I said, the 
gain seems quite small.

And of course, going the other way still does need the copy.
>>   From a performance point of view, i believe that this is a non
>> starter. If you can figure out how to design this so that it is not a
>> function call, i would consider this a viable option.
>>
>> On the other side of this you are clearly correct that we are copying the
>> data when we are making wide ints from INT_CSTs or CONST_WIDE_INTs.    But
>> this is why we represent data inside of the wide_ints, the INT_CSTs and the
>> CONST_WIDE_INTs in a compressed form.   Even with very big types, which are
>> generally rare, the constants them selves are very small.   So the copy
>> operation is a loop that almost always copies one element, even with
>> tree-vrp which doubles the sizes of every type.
>>
>> There is the third option which is that the storage inside the wide int is
>> just ggced storage.  We rejected this because of the functional nature of
>> wide-ints.    There are zillions created, they can be stack allocated, and
>> they last for very short periods of time.
> Of course - GCing wide-ints is a non-starter.
>
>>>> As is probably obvious, I don't agree FWIW.  It seems like an unnecessary
>>>> complication without any clear use.  Especially since the number of
>>> Maybe the double_int typedef is without any clear use.  Properly
>>> abstracting from the storage / information providers will save
>>> compile-time, memory and code though.  I don't see that any thought
>>> was spent on how to avoid excessive copying or dealing with
>>> long(er)-lived objects and their storage needs.
>> I actually disagree.    Wide ints can use a bloated amount of storage
>> because they are designed to be very short lived and very low cost objects
>> that are stack allocated.   For long term storage, there is INT_CST at the
>> tree level and CONST_WIDE_INT at the rtl level.  Those use a very compact
>> storage model.   The copying entailed is only a small part of the overall
>> performance.
> Well, but both trees and RTXen are not viable for short-lived things because
> the are GCed!  double-ints were suitable for this kind of stuff because
> the also have a moderate size.  With wide-ints size becomes a problem
> (or GC, if you instead use trees or RTXen).
>
>> Everything that you are suggesting along these lines is adding to the weight
>> of a wide-int object.
> On the contrary - it lessens their weight (with external already
> existing storage)
> or does not do anything to it (with the embedded storage).
>
>>   You have to understand there will be many more
>> wide-ints created in a normal compilation than were ever created with
>> double-int.    This is because the rtl level had no object like this at all
>> and at the tree level, many of the places that should have used double int,
>> short cut the code and only did the transformations if the types fit in a
>> HWI.
> Your argument shows that the copy-in/out from tree/RTX to/from wide-int
> will become a very frequent operation and thus it is worth optimizing it.
>
>> This is why we are extremely defensive about this issue.   We really did
>> think a lot about it.
> I'm sure you did.
>
> Richard.
Richard Biener Nov. 26, 2012, 4:30 p.m. UTC | #10
On Mon, Nov 26, 2012 at 5:03 PM, Kenneth Zadeck
<zadeck@naturalbridge.com> wrote:
> On 11/26/2012 10:03 AM, Richard Biener wrote:
>>
>> On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck <zadeck@naturalbridge.com>
>> wrote:
>>>
>>> On 11/04/2012 11:54 AM, Richard Biener wrote:
>>>>
>>>> On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford
>>>> <rdsandiford@googlemail.com> wrote:
>>>>>
>>>>> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>>>>>>
>>>>>> I would like you to respond to at least point 1 of this email.   In it
>>>>>> there is code from the rtl level that was written twice, once for the
>>>>>> case when the size of the mode is less than the size of a HWI and once
>>>>>> for the case where the size of the mode is less that 2 HWIs.
>>>>>>
>>>>>> my patch changes this to one instance of the code that works no matter
>>>>>> how large the data passed to it is.
>>>>>>
>>>>>> you have made a specific requirement for wide int to be a template
>>>>>> that
>>>>>> can be instantiated in several sizes, one for 1 HWI, one for 2 HWI.
>>>>>> I
>>>>>> would like to know how this particular fragment is to be rewritten in
>>>>>> this model?   It seems that I would have to retain the structure where
>>>>>> there is one version of the code for each size that the template is
>>>>>> instantiated.
>>>>>
>>>>> I think richi's argument was that wide_int should be split into two.
>>>>> There should be a "bare-metal" class that just has a length and HWIs,
>>>>> and the main wide_int class should be an extension on top of that
>>>>> that does things to a bit precision instead.  Presumably with some
>>>>> template magic so that the length (number of HWIs) is a constant for:
>>>>>
>>>>>     typedef foo<2> double_int;
>>>>>
>>>>> and a variable for wide_int (because in wide_int the length would be
>>>>> the number of significant HWIs rather than the size of the underlying
>>>>> array).  wide_int would also record the precision and apply it after
>>>>> the full HWI operation.
>>>>>
>>>>> So the wide_int class would still provide "as wide as we need"
>>>>> arithmetic,
>>>>> as in your rtl patch.  I don't think he was objecting to that.
>>>>
>>>> That summarizes one part of my complaints / suggestions correctly.  In
>>>> other
>>>> mails I suggested to not make it a template but a constant over object
>>>> lifetime
>>>> 'bitsize' (or maxlen) field.  Both suggestions likely require more
>>>> thought
>>>> than
>>>> I put into them.  The main reason is that with C++ you can abstract from
>>>> where
>>>> wide-int information pieces are stored and thus use the arithmetic /
>>>> operation
>>>> workers without copying the (source) "wide-int" objects.  Thus you
>>>> should
>>>> be able to write adaptors for double-int storage, tree or RTX storage.
>>>
>>> We had considered something along these lines and rejected it.   I am not
>>> really opposed to doing something like this, but it is not an obvious
>>> winning idea and is likely not to be a good idea.   Here was our thought
>>> process:
>>>
>>> if you abstract away the storage inside a wide int, then you should be
>>> able
>>> to copy a pointer to the block of data from either the rtl level integer
>>> constant or the tree level one into the wide int.   It is certainly true
>>> that making a wide_int from one of these is an extremely common operation
>>> and doing this would avoid those copies.
>>>
>>> However, this causes two problems:
>>> 1)  Mike's first cut at the CONST_WIDE_INT did two ggc allocations to
>>> make
>>> the object.   it created the base object and then it allocated the array.
>>> Richard S noticed that we could just allocate one CONST_WIDE_INT that had
>>> the array in it.   Doing it this way saves one ggc allocation and one
>>> indirection when accessing the data within the CONST_WIDE_INT.   Our plan
>>> is
>>> to use the same trick at the tree level.   So to avoid the copying, you
>>> seem
>>> to have to have a more expensive rep for CONST_WIDE_INT and INT_CST.
>>
>> I did not propose having a pointer to the data in the RTX or tree int.
>> Just
>> the short-lived wide-ints (which are on the stack) would have a pointer to
>> the data - which can then obviously point into the RTX and tree data.
>
> There is the issue then what if some wide-ints are not short lived. It makes
> me nervous to create internal pointers to gc ed memory.

I thought they were all short-lived.

>>> 2) You are now stuck either ggcing the storage inside a wide_int when
>>> they
>>> are created as part of an expression or you have to play some game to
>>> represent the two different storage plans inside of wide_int.
>>
>> Hm?  wide-ints are short-lived and thus never live across a garbage
>> collection
>> point.  We create non-GCed objects pointing to GCed objects all the time
>> and everywhere this way.
>
> Again, this makes me nervous but it could be done.  However, it does mean
> that now the wide ints that are not created from rtxes or trees will be more
> expensive because they are not going to get their storage "for free", they
> are going to alloca it.

No, those would simply use the embedded storage model.

> however, it still is not clear, given that 99% of the wide ints are going to
> fit in a single hwi, that this would be a noticeable win.

Currently even if they fit into a HWI you will still allocate 4 times the
largest integer mode size.  You say that doesn't matter because they
are short-lived, but I say it does matter because not all of them are
short-lived enough.  If 99% fit in a HWI, why allocate 4 times the
largest integer mode size in 99% of the cases?

>>
>>>    Clearly this
>>> is where you think that we should be going by suggesting that we abstract
>>> away the internal storage.   However, this comes at a price:   what is
>>> currently an array access in my patches would (i believe) become a
>>> function
>>> call.
>>
>> No, the workers (that perform the array accesses) will simply get
>> a pointer to the first data element.  Then whether it's embedded or
>> external is of no interest to them.
>
> so is your plan that the wide int constructors from rtx or tree would just
> copy the pointer to the array on top of the array that is otherwise
> allocated on the stack?    I can easily do this.   But as i said, the gain
> seems quite small.
>
> And of course, going the other way still does need the copy.

The proposal was to template wide_int on a storage model: the embedded
one would work as-is (embedding 4 times the largest integer mode), while the
external one would have a pointer to the data.  All functions that return a
wide_int produce a wide_int with the embedded model.  To avoid
the function-call penalty you described, the storage model provides
a way to get a pointer to the first element, and the templated operations
simply dispatch to a worker that takes this pointer to the first element
(as the storage model is designed as a template, its abstraction is going
to be optimized away by means of inlining).
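
In rough outline, and only as a sketch with made-up names rather than a
concrete design:

typedef unsigned long long HWI;   /* stand-in for HOST_WIDE_INT */
const unsigned int MAX_HWIS = 4;  /* "4 times the largest integer mode" */

/* Embedded storage: the HWIs live inside the object itself.  */
struct embedded_storage
{
  HWI val[MAX_HWIS];
  unsigned int len;
  const HWI *elements () const { return val; }
};

/* External storage: just a pointer into already existing data (for
   example the tail array of a tree or RTX constant), so constructing
   from such a constant copies nothing.  */
struct external_storage
{
  const HWI *ptr;
  unsigned int len;
  const HWI *elements () const { return ptr; }
};

/* The precision-aware class is templated on the storage model.  The
   workers only ever see a pointer to the first element plus a length,
   and because the storage model is a template parameter the abstraction
   inlines away rather than becoming a function call.  */
template <typename Storage>
struct wide_int_sketch
{
  Storage storage;
  unsigned int precision;

  int popcount () const
  {
    const HWI *p = storage.elements ();
    int n = 0;
    for (unsigned int i = 0; i < storage.len; i++)
      for (HWI x = p[i]; x; x &= x - 1)
        n++;
    return n;
  }
};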

Richard.

>>>   From a performance point of view, i believe that this is a non
>>> starter. If you can figure out how to design this so that it is not a
>>> function call, i would consider this a viable option.
>>>
>>> On the other side of this you are clearly correct that we are copying the
>>> data when we are making wide ints from INT_CSTs or CONST_WIDE_INTs.
>>> But
>>> this is why we represent data inside of the wide_ints, the INT_CSTs and
>>> the
>>> CONST_WIDE_INTs in a compressed form.   Even with very big types, which
>>> are
>>> generally rare, the constants them selves are very small.   So the copy
>>> operation is a loop that almost always copies one element, even with
>>> tree-vrp which doubles the sizes of every type.
>>>
>>> There is the third option which is that the storage inside the wide int
>>> is
>>> just ggced storage.  We rejected this because of the functional nature of
>>> wide-ints.    There are zillions created, they can be stack allocated,
>>> and
>>> they last for very short periods of time.
>>
>> Of course - GCing wide-ints is a non-starter.
>>
>>>>> As is probably obvious, I don't agree FWIW.  It seems like an
>>>>> unnecessary
>>>>> complication without any clear use.  Especially since the number of
>>>>
>>>> Maybe the double_int typedef is without any clear use.  Properly
>>>> abstracting from the storage / information providers will save
>>>> compile-time, memory and code though.  I don't see that any thought
>>>> was spent on how to avoid excessive copying or dealing with
>>>> long(er)-lived objects and their storage needs.
>>>
>>> I actually disagree.    Wide ints can use a bloated amount of storage
>>> because they are designed to be very short lived and very low cost
>>> objects
>>> that are stack allocated.   For long term storage, there is INT_CST at
>>> the
>>> tree level and CONST_WIDE_INT at the rtl level.  Those use a very compact
>>> storage model.   The copying entailed is only a small part of the overall
>>> performance.
>>
>> Well, but both trees and RTXen are not viable for short-lived things
>> because
>> the are GCed!  double-ints were suitable for this kind of stuff because
>> the also have a moderate size.  With wide-ints size becomes a problem
>> (or GC, if you instead use trees or RTXen).
>>
>>> Everything that you are suggesting along these lines is adding to the
>>> weight
>>> of a wide-int object.
>>
>> On the contrary - it lessens their weight (with external already
>> existing storage)
>> or does not do anything to it (with the embedded storage).
>>
>>>   You have to understand there will be many more
>>> wide-ints created in a normal compilation than were ever created with
>>> double-int.    This is because the rtl level had no object like this at
>>> all
>>> and at the tree level, many of the places that should have used double
>>> int,
>>> short cut the code and only did the transformations if the types fit in a
>>> HWI.
>>
>> Your argument shows that the copy-in/out from tree/RTX to/from wide-int
>> will become a very frequent operation and thus it is worth optimizing it.
>>
>>> This is why we are extremely defensive about this issue.   We really did
>>> think a lot about it.
>>
>> I'm sure you did.
>>
>> Richard.
>
>
Kenneth Zadeck Nov. 27, 2012, 12:06 a.m. UTC | #11
Richard,

I spent a good part of the afternoon talking to Mike about this.  He is 
on the C++ standards committee and is a much more seasoned C++ 
programmer than I am.

He convinced me that, with a large amount of engineering and C++ 
"foolishness", it would indeed be possible to get your proposal to 
possibly work as well as what we did.

But now the question is: why would anyone want to do this?

At the very least you are talking about instantiating two versions of 
wide-int, one for the stack-allocated uses and one for the places where 
we just move a pointer from the tree or the rtx.  Then you are talking 
about creating connectors so that the stack-allocated functions can take 
parameters of the pointer version and vice versa.

Then there is the issue that, rather than just saying that something is a 
wide_int, the programmer is going to have to track its origin.   
In particular, where in the code right now I say

wide_int foo = wide_int::from_rtx (r1);
wide_int bar = wide_int::from_rtx (r2) + foo;

now i would have to say

wide_int_ptr foo = wide_int_ptr::from_rtx (r1);
wide_int_stack bar = wide_int_ptr::from_rtx (r2) + foo;

Then when I want to call some function using a wide_int ref, that 
function must now either be overloaded to take both, or I have to choose 
one of the two instantiations (presumably based on which is going to be 
more common) and just have the compiler fix up everything (which it is 
likely to do).
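
That is, every consumer of the interface ends up looking something like
this (a sketch only, using the made-up names from above):

struct wide_int_stack;                 /* the two flavours from above */
struct wide_int_ptr;

void use (const wide_int_stack &x);    /* either one overload per flavour */
void use (const wide_int_ptr &x);

template <typename W>                  /* ... or a template everywhere */
void use_t (const W &x);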

And so what is the payoff?
1) No one except the C++ elite is going to understand the code.  The rest 
of the community will hate me and curse the ground that I walk on.
2) I will end up with a version of wide-int that can be used as a 
medium-life container (where I define medium life as not allowed to survive 
a GC, since they will contain pointers into rtxes and trees).
3) And no clients actually want to do this!    I could use as an 
example one of your favorite passes, tree-vrp.   The current double-int 
could have been a medium-lifetime container, since it has a smaller 
footprint, but in fact tree-vrp converts those double-ints back into 
trees for medium storage.   Why?  Because it needs the other fields of a 
tree-cst to store the entire state.  Wide-ints also "suffer" this 
problem: their only state is the data and the three length fields.   
They have no type and none of the other tree info, so the most obvious 
client for a medium-lifetime object is really not going to be a good 
match even if you "solve the storage problem".

The fact is that wide-ints are an excellent short-term storage class 
that can be very quickly converted into our two long-term storage 
classes.  Your proposal requires a lot of work, will not be easy to 
use, and as far as I can see has no payoff on the horizon.   It could be 
that there will be future clients for a medium-lifetime value, but 
asking for this with no clients in hand is really beyond the scope of a 
reasonable review.

I remind you that the purpose of these patches is to solve problems that 
exist in the current compiler that we have papered over for years.   If 
someone needs wide-ints in some way that is not foreseen then they can 
change it.

kenny

On 11/26/2012 11:30 AM, Richard Biener wrote:
> On Mon, Nov 26, 2012 at 5:03 PM, Kenneth Zadeck
> <zadeck@naturalbridge.com> wrote:
>> On 11/26/2012 10:03 AM, Richard Biener wrote:
>>> On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck <zadeck@naturalbridge.com>
>>> wrote:
>>>> On 11/04/2012 11:54 AM, Richard Biener wrote:
>>>>> On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford
>>>>> <rdsandiford@googlemail.com> wrote:
>>>>>> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>>>>>>> I would like you to respond to at least point 1 of this email.   In it
>>>>>>> there is code from the rtl level that was written twice, once for the
>>>>>>> case when the size of the mode is less than the size of a HWI and once
>>>>>>> for the case where the size of the mode is less that 2 HWIs.
>>>>>>>
>>>>>>> my patch changes this to one instance of the code that works no matter
>>>>>>> how large the data passed to it is.
>>>>>>>
>>>>>>> you have made a specific requirement for wide int to be a template
>>>>>>> that
>>>>>>> can be instantiated in several sizes, one for 1 HWI, one for 2 HWI.
>>>>>>> I
>>>>>>> would like to know how this particular fragment is to be rewritten in
>>>>>>> this model?   It seems that I would have to retain the structure where
>>>>>>> there is one version of the code for each size that the template is
>>>>>>> instantiated.
>>>>>> I think richi's argument was that wide_int should be split into two.
>>>>>> There should be a "bare-metal" class that just has a length and HWIs,
>>>>>> and the main wide_int class should be an extension on top of that
>>>>>> that does things to a bit precision instead.  Presumably with some
>>>>>> template magic so that the length (number of HWIs) is a constant for:
>>>>>>
>>>>>>      typedef foo<2> double_int;
>>>>>>
>>>>>> and a variable for wide_int (because in wide_int the length would be
>>>>>> the number of significant HWIs rather than the size of the underlying
>>>>>> array).  wide_int would also record the precision and apply it after
>>>>>> the full HWI operation.
>>>>>>
>>>>>> So the wide_int class would still provide "as wide as we need"
>>>>>> arithmetic,
>>>>>> as in your rtl patch.  I don't think he was objecting to that.
>>>>> That summarizes one part of my complaints / suggestions correctly.  In
>>>>> other
>>>>> mails I suggested to not make it a template but a constant over object
>>>>> lifetime
>>>>> 'bitsize' (or maxlen) field.  Both suggestions likely require more
>>>>> thought
>>>>> than
>>>>> I put into them.  The main reason is that with C++ you can abstract from
>>>>> where
>>>>> wide-int information pieces are stored and thus use the arithmetic /
>>>>> operation
>>>>> workers without copying the (source) "wide-int" objects.  Thus you
>>>>> should
>>>>> be able to write adaptors for double-int storage, tree or RTX storage.
>>>> We had considered something along these lines and rejected it.   I am not
>>>> really opposed to doing something like this, but it is not an obvious
>>>> winning idea and is likely not to be a good idea.   Here was our thought
>>>> process:
>>>>
>>>> if you abstract away the storage inside a wide int, then you should be
>>>> able
>>>> to copy a pointer to the block of data from either the rtl level integer
>>>> constant or the tree level one into the wide int.   It is certainly true
>>>> that making a wide_int from one of these is an extremely common operation
>>>> and doing this would avoid those copies.
>>>>
>>>> However, this causes two problems:
>>>> 1)  Mike's first cut at the CONST_WIDE_INT did two ggc allocations to
>>>> make
>>>> the object.   it created the base object and then it allocated the array.
>>>> Richard S noticed that we could just allocate one CONST_WIDE_INT that had
>>>> the array in it.   Doing it this way saves one ggc allocation and one
>>>> indirection when accessing the data within the CONST_WIDE_INT.   Our plan
>>>> is
>>>> to use the same trick at the tree level.   So to avoid the copying, you
>>>> seem
>>>> to have to have a more expensive rep for CONST_WIDE_INT and INT_CST.
>>> I did not propose having a pointer to the data in the RTX or tree int.
>>> Just
>>> the short-lived wide-ints (which are on the stack) would have a pointer to
>>> the data - which can then obviously point into the RTX and tree data.
>> There is the issue then what if some wide-ints are not short lived. It makes
>> me nervous to create internal pointers to gc ed memory.
> I thought they were all short-lived.
>
>>>> 2) You are now stuck either ggcing the storage inside a wide_int when
>>>> they
>>>> are created as part of an expression or you have to play some game to
>>>> represent the two different storage plans inside of wide_int.
>>> Hm?  wide-ints are short-lived and thus never live across a garbage
>>> collection
>>> point.  We create non-GCed objects pointing to GCed objects all the time
>>> and everywhere this way.
>> Again, this makes me nervous but it could be done.  However, it does mean
>> that now the wide ints that are not created from rtxes or trees will be more
>> expensive because they are not going to get their storage "for free", they
>> are going to alloca it.
> No, those would simply use the embedded storage model.
>
>> however, it still is not clear, given that 99% of the wide ints are going to
>> fit in a single hwi, that this would be a noticeable win.
> Currently even if they fit into a HWI you will still allocate 4 times the
> larges integer mode size.  You say that doesn't matter because they
> are short-lived, but I say it does matter because not all of them are
> short-lived enough.  If 99% fit in a HWI why allocate 4 times the
> largest integer mode size in 99% of the cases?
>
>>>>     Clearly this
>>>> is where you think that we should be going by suggesting that we abstract
>>>> away the internal storage.   However, this comes at a price:   what is
>>>> currently an array access in my patches would (i believe) become a
>>>> function
>>>> call.
>>> No, the workers (that perform the array accesses) will simply get
>>> a pointer to the first data element.  Then whether it's embedded or
>>> external is of no interest to them.
>> so is your plan that the wide int constructors from rtx or tree would just
>> copy the pointer to the array on top of the array that is otherwise
>> allocated on the stack?    I can easily do this.   But as i said, the gain
>> seems quite small.
>>
>> And of course, going the other way still does need the copy.
> The proposal was to template wide_int on a storage model, the embedded
> one would work as-is (embedding 4 times largest integer mode), the
> external one would have a pointer to data.  All functions that return a
> wide_int produce a wide_int with the embedded model.  To avoid
> the function call penalty you described the storage model provides
> a way to get a pointer to the first element and the templated operations
> simply dispatch to a worker that takes this pointer to the first element
> (as the storage model is designed as a template its abstraction is going
> to be optimized away by means of inlining).
>
> Richard.
>
>>>>    From a performance point of view, i believe that this is a non
>>>> starter. If you can figure out how to design this so that it is not a
>>>> function call, i would consider this a viable option.
>>>>
>>>> On the other side of this you are clearly correct that we are copying the
>>>> data when we are making wide ints from INT_CSTs or CONST_WIDE_INTs.
>>>> But
>>>> this is why we represent data inside of the wide_ints, the INT_CSTs and
>>>> the
>>>> CONST_WIDE_INTs in a compressed form.   Even with very big types, which
>>>> are
>>>> generally rare, the constants them selves are very small.   So the copy
>>>> operation is a loop that almost always copies one element, even with
>>>> tree-vrp which doubles the sizes of every type.
>>>>
>>>> There is the third option which is that the storage inside the wide int
>>>> is
>>>> just ggced storage.  We rejected this because of the functional nature of
>>>> wide-ints.    There are zillions created, they can be stack allocated,
>>>> and
>>>> they last for very short periods of time.
>>> Of course - GCing wide-ints is a non-starter.
>>>
>>>>>> As is probably obvious, I don't agree FWIW.  It seems like an
>>>>>> unnecessary
>>>>>> complication without any clear use.  Especially since the number of
>>>>> Maybe the double_int typedef is without any clear use.  Properly
>>>>> abstracting from the storage / information providers will save
>>>>> compile-time, memory and code though.  I don't see that any thought
>>>>> was spent on how to avoid excessive copying or dealing with
>>>>> long(er)-lived objects and their storage needs.
>>>> I actually disagree.    Wide ints can use a bloated amount of storage
>>>> because they are designed to be very short lived and very low cost
>>>> objects
>>>> that are stack allocated.   For long term storage, there is INT_CST at
>>>> the
>>>> tree level and CONST_WIDE_INT at the rtl level.  Those use a very compact
>>>> storage model.   The copying entailed is only a small part of the overall
>>>> performance.
>>> Well, but both trees and RTXen are not viable for short-lived things
>>> because
>>> the are GCed!  double-ints were suitable for this kind of stuff because
>>> the also have a moderate size.  With wide-ints size becomes a problem
>>> (or GC, if you instead use trees or RTXen).
>>>
>>>> Everything that you are suggesting along these lines is adding to the
>>>> weight
>>>> of a wide-int object.
>>> On the contrary - it lessens their weight (with external already
>>> existing storage)
>>> or does not do anything to it (with the embedded storage).
>>>
>>>>    You have to understand there will be many more
>>>> wide-ints created in a normal compilation than were ever created with
>>>> double-int.    This is because the rtl level had no object like this at
>>>> all
>>>> and at the tree level, many of the places that should have used double
>>>> int,
>>>> short cut the code and only did the transformations if the types fit in a
>>>> HWI.
>>> Your argument shows that the copy-in/out from tree/RTX to/from wide-int
>>> will become a very frequent operation and thus it is worth optimizing it.
>>>
>>>> This is why we are extremely defensive about this issue.   We really did
>>>> think a lot about it.
>>> I'm sure you did.
>>>
>>> Richard.
>>
Richard Biener Nov. 27, 2012, 10:03 a.m. UTC | #12
On Tue, Nov 27, 2012 at 1:06 AM, Kenneth Zadeck
<zadeck@naturalbridge.com> wrote:
> Richard,
>
> I spent a good part of the afternoon talking to Mike about this.  He is on
> the c++ standards committee and is a much more seasoned c++ programmer than
> I am.
>
> He convinced me that with a large amount of engineering and c++
> "foolishness" that it was indeed possible to get your proposal to POSSIBLY
> work as well as what we did.
>
> But now the question is why would any want to do this?
>
> At the very least you are talking about instantiating two instances of
> wide-ints, one for the stack allocated uses and one for the places where we
> just move a pointer from the tree or the rtx. Then you are talking about
> creating connectors so that the stack allocated functions can take
> parameters of pointer version and visa versa.
>
> Then there is the issue that rather than just saying that something is a
> wide int, that the programmer is going to have to track it's origin.   In
> particular,  where in the code right now i say.
>
> wide_int foo = wide_int::from_rtx (r1);
> wide_int bar = wide_int::from_rtx (r2) + foo;
>
> now i would have to say
>
> wide_int_ptr foo = wide_int_ptr::from_rtx (r1);
> wide_int_stack bar = wide_int_ptr::from_rtx (r2) + foo;

No, you'd say

wide_int foo = wide_int::from_rtx (r1);

and the static, non-templated from_rtx method would automagically
return (always!) a "wide_int_ptr" kind.  The initialization then would
use the assignment operator that mediates between wide_int and
"wide_int_ptr", doing the copying.

The user should get a 'stack' kind by default when specifying wide_int,
as implemented with

struct wide_int_storage_stack;
struct wide_int_storage_ptr;

template <class storage = wide_int_storage_stack>
class wide_int : public storage
{
...
   static wide_int <wide_int_storage_ptr> from_rtx (rtx);
};

the whole point of the exercise is to make from_rtx and from_tree avoid
the copying (and excessive stack space allocation) for the rvalue case
like in

 wide_int res = wide_int::from_rtx (x) + 1;

if you save the result into a wide_int temporary first then you are lost
of course (modulo some magic GCC optimization being able to elide
the copy somehow).

And of course, for code like VRP that keeps a lattice of wide_ints,
to be able to reduce its footprint by using ptr storage and explicit
allocations (that's a secondary concern, of course).  And for VRP to be
able to specify that it needs more than the otherwise needed
MAX_INT_MODE_SIZE; ptr storage would not have this arbitrary limitation,
only embedded storage (should) have it.

> then when i want to call some function using a wide_int ref that function
> now must be either overloaded to take both or i have to choose one of the
> two instantiations (presumably based on which is going to be more common)
> and just have the compiler fix up everything (which it is likely to do).

Nope, they'd be

class wide_int ...
{
   template <class storage1, class storage2>
   wide_int operator+ (wide_int <storage1> a, wide_int <storage2> b)
   {
      return wide_int::plus_worker (a.precision, a. ...., a.get_storage_ptr (),
                                    b.precision, ..., b.get_storage_ptr ());
   }
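
To make the shape of this concrete, here is a minimal self-contained
sketch in plain C++ of the storage-model idea described above.  All of
the names (embedded_storage, ptr_storage, value, from_external,
plus_worker) are made up for the illustration and are not a proposed
interface; the point is only the structure: a value templated on its
storage, a from_rtx-style routine that returns a ptr-storage view of
existing data, operations that dispatch to a worker seeing only a raw
pointer, and a mediating conversion that does the copy when an
embedded-storage value is wanted.

#include <algorithm>

typedef long hwi;                  // stand-in for HOST_WIDE_INT
const unsigned max_len = 4;        // stand-in for the embedded buffer size

// Embedded storage: owns a small buffer that lives with the value itself.
struct embedded_storage
{
  hwi buf[max_len];
  hwi *write_ptr () { return buf; }
  const hwi *read_ptr () const { return buf; }
};

// Ptr storage: only points at data that already lives elsewhere
// (for wide-int that would be the array inside a tree or rtx constant).
struct ptr_storage
{
  const hwi *p;
  const hwi *read_ptr () const { return p; }
};

template <class storage = embedded_storage>
struct value : storage
{
  unsigned len;                    // number of live elements

  value () : len (0) {}

  // The mediating conversion: initializing an embedded-storage value
  // from another storage kind is where the actual copy happens.
  template <class other>
  value (const value<other> &o) : len (o.len)
  {
    std::copy (o.read_ptr (), o.read_ptr () + len, this->write_ptr ());
  }
};

// Non-templated worker that only sees raw pointers; the storage
// abstraction is gone (inlined away) by the time we get here.
static value<> plus_worker (const hwi *a, unsigned alen, hwi b)
{
  value<> r;
  r.len = alen;
  std::copy (a, a + alen, r.write_ptr ());
  r.write_ptr ()[0] += b;          // toy arithmetic, no carry handling
  return r;
}

// Templated operation dispatching to the worker via read_ptr ().
template <class storage>
value<> operator+ (const value<storage> &a, hwi b)
{
  return plus_worker (a.read_ptr (), a.len, b);
}

// A "from_rtx"-style routine: returns a ptr-storage view, no copying.
static value<ptr_storage> from_external (const hwi *data, unsigned len)
{
  value<ptr_storage> v;
  v.p = data;
  v.len = len;
  return v;
}

int main ()
{
  hwi stored[1] = { 41 };                        // pretend this is tree/rtx data
  value<> res = from_external (stored, 1) + 1;   // rvalue case: no extra copy
  value<> copy = from_external (stored, 1);      // lvalue case: mediating copy
  return (res.read_ptr ()[0] == 42 && copy.read_ptr ()[0] == 41) ? 0 : 1;
}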


> And so what is the payoff:
> 1) No one except the c++ elite is going to understand the code. The rest of
> the community will hate me and curse the ground that i walk on.

Maybe for the implementation - but look at hash-table and vec ... not for
usage certainly.

> 2) I will end up with a version of wide-int that can be used as a medium
> life container (where i define medium life as not allowed to survive a gc
> since they will contain pointers into rtxes and trees.)
> 3) An no clients that actually wanted to do this!!    I could use as an
> example one of your favorite passes, tree-vrp.   The current double-int
> could have been a medium lifetime container since it has a smaller
> footprint, but in fact tree-vrp converts those double-ints back into trees
> for medium storage.   Why, because it needs the other fields of a tree-cst
> to store the entire state.  Wide-ints also "suffer" this problem.  their
> only state are the data, and the three length fields.   They have no type
> and none of the other tree info so the most obvious client for a medium
> lifetime object is really not going to be a good match even if you "solve
> the storage problem".
>
> The fact is that wide-ints are an excellent short term storage class that
> can be very quickly converted into our two long term storage classes.  Your
> proposal is requires a lot of work, will not be easy to use and as far as i
> can see has no payoff on the horizon.   It could be that there could be
> future clients for a medium lifetime value, but asking for this with no
> clients in hand is really beyond the scope of a reasonable review.
>
> I remind you that the purpose of these patches is to solve problems that
> exist in the current compiler that we have papered over for years.   If
> someone needs wide-ints in some way that is not foreseen then they can
> change it.

The patches introduce a lot more temporary wide-ints (your words) and
at the same time make construction of them from tree / rtx very expensive,
both in stack space and in compile time.  Look, for example, at how we
compute TREE_INT_CST + 1: int_cst_binop internally uses double_ints
for the computation and then instantiates a new tree to hold the result.
Now we'd use wide_ints for this, requiring totally unnecessary copying.
Why not try to avoid that in the first place?  And try to avoid making
wide_ints 4 times as large as really necessary just for the sake of VRP!
(VRP should have a way to say "_I_ want larger wide_ints", without putting
this burden on all other users).
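
(As a rough illustration of the size point only -- plain C++, and the
4-times-a-128-bit-mode figure is just an assumption taken from the
discussion above, not the actual configuration:)

#include <cstdio>

typedef long hwi;                          // stand-in for a 64-bit HOST_WIDE_INT

struct embedded_storage { hwi buf[8]; };   // assume 4 x a 128-bit integer mode
struct ptr_storage      { const hwi *p; }; // just a pointer into existing data

template <class storage>
struct value : storage
{
  unsigned len, precision;                 // bookkeeping fields
};

int main ()
{
  // Every temporary of the embedded kind pays for the whole buffer even
  // when only one element is live; the ptr kind pays for a pointer.
  std::printf ("embedded: %u bytes, ptr: %u bytes\n",
               (unsigned) sizeof (value<embedded_storage>),
               (unsigned) sizeof (value<ptr_storage>));
  return 0;
}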

Richard.

> kenny
>
>
> On 11/26/2012 11:30 AM, Richard Biener wrote:
>>
>> On Mon, Nov 26, 2012 at 5:03 PM, Kenneth Zadeck
>> <zadeck@naturalbridge.com> wrote:
>>>
>>> On 11/26/2012 10:03 AM, Richard Biener wrote:
>>>>
>>>> On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck
>>>> <zadeck@naturalbridge.com>
>>>> wrote:
>>>>>
>>>>> On 11/04/2012 11:54 AM, Richard Biener wrote:
>>>>>>
>>>>>> On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford
>>>>>> <rdsandiford@googlemail.com> wrote:
>>>>>>>
>>>>>>> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>>>>>>>>
>>>>>>>> I would like you to respond to at least point 1 of this email.   In
>>>>>>>> it
>>>>>>>> there is code from the rtl level that was written twice, once for
>>>>>>>> the
>>>>>>>> case when the size of the mode is less than the size of a HWI and
>>>>>>>> once
>>>>>>>> for the case where the size of the mode is less that 2 HWIs.
>>>>>>>>
>>>>>>>> my patch changes this to one instance of the code that works no
>>>>>>>> matter
>>>>>>>> how large the data passed to it is.
>>>>>>>>
>>>>>>>> you have made a specific requirement for wide int to be a template
>>>>>>>> that
>>>>>>>> can be instantiated in several sizes, one for 1 HWI, one for 2 HWI.
>>>>>>>> I
>>>>>>>> would like to know how this particular fragment is to be rewritten
>>>>>>>> in
>>>>>>>> this model?   It seems that I would have to retain the structure
>>>>>>>> where
>>>>>>>> there is one version of the code for each size that the template is
>>>>>>>> instantiated.
>>>>>>>
>>>>>>> I think richi's argument was that wide_int should be split into two.
>>>>>>> There should be a "bare-metal" class that just has a length and HWIs,
>>>>>>> and the main wide_int class should be an extension on top of that
>>>>>>> that does things to a bit precision instead.  Presumably with some
>>>>>>> template magic so that the length (number of HWIs) is a constant for:
>>>>>>>
>>>>>>>      typedef foo<2> double_int;
>>>>>>>
>>>>>>> and a variable for wide_int (because in wide_int the length would be
>>>>>>> the number of significant HWIs rather than the size of the underlying
>>>>>>> array).  wide_int would also record the precision and apply it after
>>>>>>> the full HWI operation.
>>>>>>>
>>>>>>> So the wide_int class would still provide "as wide as we need"
>>>>>>> arithmetic,
>>>>>>> as in your rtl patch.  I don't think he was objecting to that.
>>>>>>
>>>>>> That summarizes one part of my complaints / suggestions correctly.  In
>>>>>> other
>>>>>> mails I suggested to not make it a template but a constant over object
>>>>>> lifetime
>>>>>> 'bitsize' (or maxlen) field.  Both suggestions likely require more
>>>>>> thought
>>>>>> than
>>>>>> I put into them.  The main reason is that with C++ you can abstract
>>>>>> from
>>>>>> where
>>>>>> wide-int information pieces are stored and thus use the arithmetic /
>>>>>> operation
>>>>>> workers without copying the (source) "wide-int" objects.  Thus you
>>>>>> should
>>>>>> be able to write adaptors for double-int storage, tree or RTX storage.
>>>>>
>>>>> We had considered something along these lines and rejected it.   I am
>>>>> not
>>>>> really opposed to doing something like this, but it is not an obvious
>>>>> winning idea and is likely not to be a good idea.   Here was our
>>>>> thought
>>>>> process:
>>>>>
>>>>> if you abstract away the storage inside a wide int, then you should be
>>>>> able
>>>>> to copy a pointer to the block of data from either the rtl level
>>>>> integer
>>>>> constant or the tree level one into the wide int.   It is certainly
>>>>> true
>>>>> that making a wide_int from one of these is an extremely common
>>>>> operation
>>>>> and doing this would avoid those copies.
>>>>>
>>>>> However, this causes two problems:
>>>>> 1)  Mike's first cut at the CONST_WIDE_INT did two ggc allocations to
>>>>> make
>>>>> the object.   it created the base object and then it allocated the
>>>>> array.
>>>>> Richard S noticed that we could just allocate one CONST_WIDE_INT that
>>>>> had
>>>>> the array in it.   Doing it this way saves one ggc allocation and one
>>>>> indirection when accessing the data within the CONST_WIDE_INT.   Our
>>>>> plan
>>>>> is
>>>>> to use the same trick at the tree level.   So to avoid the copying, you
>>>>> seem
>>>>> to have to have a more expensive rep for CONST_WIDE_INT and INT_CST.
>>>>
>>>> I did not propose having a pointer to the data in the RTX or tree int.
>>>> Just
>>>> the short-lived wide-ints (which are on the stack) would have a pointer
>>>> to
>>>> the data - which can then obviously point into the RTX and tree data.
>>>
>>> There is the issue then what if some wide-ints are not short lived. It
>>> makes
>>> me nervous to create internal pointers to gc ed memory.
>>
>> I thought they were all short-lived.
>>
>>>>> 2) You are now stuck either ggcing the storage inside a wide_int when
>>>>> they
>>>>> are created as part of an expression or you have to play some game to
>>>>> represent the two different storage plans inside of wide_int.
>>>>
>>>> Hm?  wide-ints are short-lived and thus never live across a garbage
>>>> collection
>>>> point.  We create non-GCed objects pointing to GCed objects all the time
>>>> and everywhere this way.
>>>
>>> Again, this makes me nervous but it could be done.  However, it does mean
>>> that now the wide ints that are not created from rtxes or trees will be
>>> more
>>> expensive because they are not going to get their storage "for free",
>>> they
>>> are going to alloca it.
>>
>> No, those would simply use the embedded storage model.
>>
>>> however, it still is not clear, given that 99% of the wide ints are going
>>> to
>>> fit in a single hwi, that this would be a noticeable win.
>>
>> Currently even if they fit into a HWI you will still allocate 4 times the
>> larges integer mode size.  You say that doesn't matter because they
>> are short-lived, but I say it does matter because not all of them are
>> short-lived enough.  If 99% fit in a HWI why allocate 4 times the
>> largest integer mode size in 99% of the cases?
>>
>>>>>     Clearly this
>>>>> is where you think that we should be going by suggesting that we
>>>>> abstract
>>>>> away the internal storage.   However, this comes at a price:   what is
>>>>> currently an array access in my patches would (i believe) become a
>>>>> function
>>>>> call.
>>>>
>>>> No, the workers (that perform the array accesses) will simply get
>>>> a pointer to the first data element.  Then whether it's embedded or
>>>> external is of no interest to them.
>>>
>>> so is your plan that the wide int constructors from rtx or tree would
>>> just
>>> copy the pointer to the array on top of the array that is otherwise
>>> allocated on the stack?    I can easily do this.   But as i said, the
>>> gain
>>> seems quite small.
>>>
>>> And of course, going the other way still does need the copy.
>>
>> The proposal was to template wide_int on a storage model, the embedded
>> one would work as-is (embedding 4 times largest integer mode), the
>> external one would have a pointer to data.  All functions that return a
>> wide_int produce a wide_int with the embedded model.  To avoid
>> the function call penalty you described the storage model provides
>> a way to get a pointer to the first element and the templated operations
>> simply dispatch to a worker that takes this pointer to the first element
>> (as the storage model is designed as a template its abstraction is going
>> to be optimized away by means of inlining).
>>
>> Richard.
>>
>>>>>    From a performance point of view, i believe that this is a non
>>>>> starter. If you can figure out how to design this so that it is not a
>>>>> function call, i would consider this a viable option.
>>>>>
>>>>> On the other side of this you are clearly correct that we are copying
>>>>> the
>>>>> data when we are making wide ints from INT_CSTs or CONST_WIDE_INTs.
>>>>> But
>>>>> this is why we represent data inside of the wide_ints, the INT_CSTs and
>>>>> the
>>>>> CONST_WIDE_INTs in a compressed form.   Even with very big types, which
>>>>> are
>>>>> generally rare, the constants them selves are very small.   So the copy
>>>>> operation is a loop that almost always copies one element, even with
>>>>> tree-vrp which doubles the sizes of every type.
>>>>>
>>>>> There is the third option which is that the storage inside the wide int
>>>>> is
>>>>> just ggced storage.  We rejected this because of the functional nature
>>>>> of
>>>>> wide-ints.    There are zillions created, they can be stack allocated,
>>>>> and
>>>>> they last for very short periods of time.
>>>>
>>>> Of course - GCing wide-ints is a non-starter.
>>>>
>>>>>>> As is probably obvious, I don't agree FWIW.  It seems like an
>>>>>>> unnecessary
>>>>>>> complication without any clear use.  Especially since the number of
>>>>>>
>>>>>> Maybe the double_int typedef is without any clear use.  Properly
>>>>>> abstracting from the storage / information providers will save
>>>>>> compile-time, memory and code though.  I don't see that any thought
>>>>>> was spent on how to avoid excessive copying or dealing with
>>>>>> long(er)-lived objects and their storage needs.
>>>>>
>>>>> I actually disagree.    Wide ints can use a bloated amount of storage
>>>>> because they are designed to be very short lived and very low cost
>>>>> objects
>>>>> that are stack allocated.   For long term storage, there is INT_CST at
>>>>> the
>>>>> tree level and CONST_WIDE_INT at the rtl level.  Those use a very
>>>>> compact
>>>>> storage model.   The copying entailed is only a small part of the
>>>>> overall
>>>>> performance.
>>>>
>>>> Well, but both trees and RTXen are not viable for short-lived things
>>>> because
>>>> the are GCed!  double-ints were suitable for this kind of stuff because
>>>> the also have a moderate size.  With wide-ints size becomes a problem
>>>> (or GC, if you instead use trees or RTXen).
>>>>
>>>>> Everything that you are suggesting along these lines is adding to the
>>>>> weight
>>>>> of a wide-int object.
>>>>
>>>> On the contrary - it lessens their weight (with external already
>>>> existing storage)
>>>> or does not do anything to it (with the embedded storage).
>>>>
>>>>>    You have to understand there will be many more
>>>>> wide-ints created in a normal compilation than were ever created with
>>>>> double-int.    This is because the rtl level had no object like this at
>>>>> all
>>>>> and at the tree level, many of the places that should have used double
>>>>> int,
>>>>> short cut the code and only did the transformations if the types fit in
>>>>> a
>>>>> HWI.
>>>>
>>>> Your argument shows that the copy-in/out from tree/RTX to/from wide-int
>>>> will become a very frequent operation and thus it is worth optimizing
>>>> it.
>>>>
>>>>> This is why we are extremely defensive about this issue.   We really
>>>>> did
>>>>> think a lot about it.
>>>>
>>>> I'm sure you did.
>>>>
>>>> Richard.
>>>
>>>
>
Kenneth Zadeck Nov. 27, 2012, 1:02 p.m. UTC | #13
i will discuss this with mike when he wakes up.    he lives on the west 
pole so that will not be until after you go to bed.

the one point that i will take exception to is the claim that the 
copying operation is, in practice, any more expensive than the pointer 
copy.   I never bother to initialize the storage in the array, i only 
copy the elements that are live.    That is almost always 1 hwi, 
because either most types are small or most constants of large types 
compress to 1 hwi.    So even if a compilation does a zillion 
::from_trees, you will most likely never see the difference in time.
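
(A minimal sketch of the copy being described, in plain C++ with invented
names: only the 'len' live elements move, so the common 1-hwi case is a
single element copy and the rest of the buffer is never touched:)

#include <cstring>

typedef long hwi;                 // stand-in for HOST_WIDE_INT
const unsigned max_len = 4;       // stand-in for the embedded buffer size

struct wide_val
{
  hwi val[max_len];               // never initialized beyond the live part
  unsigned len;                   // number of live elements, almost always 1
};

// from_tree-style copy: only the live (compressed) elements are moved.
static wide_val
from_elements (const hwi *src, unsigned len)
{
  wide_val w;
  w.len = len;
  std::memcpy (w.val, src, len * sizeof (hwi));  // usually a single element
  return w;
}

int main ()
{
  hwi stored[1] = { 7 };          // what a small constant compresses to
  wide_val w = from_elements (stored, 1);
  return w.val[0] == 7 ? 0 : 1;
}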

kenny


On 11/27/2012 05:03 AM, Richard Biener wrote:
> On Tue, Nov 27, 2012 at 1:06 AM, Kenneth Zadeck
> <zadeck@naturalbridge.com> wrote:
>> Richard,
>>
>> I spent a good part of the afternoon talking to Mike about this.  He is on
>> the c++ standards committee and is a much more seasoned c++ programmer than
>> I am.
>>
>> He convinced me that with a large amount of engineering and c++
>> "foolishness" that it was indeed possible to get your proposal to POSSIBLY
>> work as well as what we did.
>>
>> But now the question is why would any want to do this?
>>
>> At the very least you are talking about instantiating two instances of
>> wide-ints, one for the stack allocated uses and one for the places where we
>> just move a pointer from the tree or the rtx. Then you are talking about
>> creating connectors so that the stack allocated functions can take
>> parameters of pointer version and visa versa.
>>
>> Then there is the issue that rather than just saying that something is a
>> wide int, that the programmer is going to have to track it's origin.   In
>> particular,  where in the code right now i say.
>>
>> wide_int foo = wide_int::from_rtx (r1);
>> wide_int bar = wide_int::from_rtx (r2) + foo;
>>
>> now i would have to say
>>
>> wide_int_ptr foo = wide_int_ptr::from_rtx (r1);
>> wide_int_stack bar = wide_int_ptr::from_rtx (r2) + foo;
> No, you'd say
>
> wide_int foo = wide_int::from_rtx (r1);
>
> and the static, non-templated from_rtx method would automagically
> return (always!) a "wide_int_ptr" kind.  The initialization then would
> use the assignment operator that mediates between wide_int and
> "wide_int_ptr", doing the copying.
>
> The user should get a 'stack' kind by default when specifying wide_int,
> like implemented with
>
> struct wide_int_storage_stack;
> struct wide_int_storage_ptr;
>
> template <class storage = wide_int_storage_stack>
> class wide_int : public storage
> {
> ...
>     static wide_int <wide_int_storage_ptr> from_rtx (rtx);
> }
>
> the whole point of the exercise is to make from_rtx and from_tree avoid
> the copying (and excessive stack space allocation) for the rvalue case
> like in
>
>   wide_int res = wide_int::from_rtx (x) + 1;
>
> if you save the result into a wide_int temporary first then you are lost
> of course (modulo some magic GCC optimization being able to elide
> the copy somehow).
>
> And of course for code like VRP that keeps a lattice of wide_ints to
> be able to reduce its footprint by using ptr storage and explicit allocations
> (that's a secondary concern, of course).  And for VRP to specify that
> it needs more than the otherwise needed MAX_INT_MODE_SIZE.
> ptr storage would not have this arbitrary limitation, only embedded
> storage (should) have.
>
>> then when i want to call some function using a wide_int ref that function
>> now must be either overloaded to take both or i have to choose one of the
>> two instantiations (presumably based on which is going to be more common)
>> and just have the compiler fix up everything (which it is likely to do).
> Nope, they'd be
>
> class wide_int ...
> {
>     template <class storage1, class storage2>
>     wide_int operator+(wide_int <storage1> a, wide_int<storage2> b)
>     {
>        return wide_int::plus_worker (a.precision, a. ...., a.get_storage_ptr (),
>                                                  b.precision, ...,
> b.get_storage_ptr ());
>     }
>
>
>> And so what is the payoff:
>> 1) No one except the c++ elite is going to understand the code. The rest of
>> the community will hate me and curse the ground that i walk on.
> Maybe for the implementation - but look at hash-table and vec ... not for
> usage certainly.
>
>> 2) I will end up with a version of wide-int that can be used as a medium
>> life container (where i define medium life as not allowed to survive a gc
>> since they will contain pointers into rtxes and trees.)
>> 3) An no clients that actually wanted to do this!!    I could use as an
>> example one of your favorite passes, tree-vrp.   The current double-int
>> could have been a medium lifetime container since it has a smaller
>> footprint, but in fact tree-vrp converts those double-ints back into trees
>> for medium storage.   Why, because it needs the other fields of a tree-cst
>> to store the entire state.  Wide-ints also "suffer" this problem.  their
>> only state are the data, and the three length fields.   They have no type
>> and none of the other tree info so the most obvious client for a medium
>> lifetime object is really not going to be a good match even if you "solve
>> the storage problem".
>>
>> The fact is that wide-ints are an excellent short term storage class that
>> can be very quickly converted into our two long term storage classes.  Your
>> proposal is requires a lot of work, will not be easy to use and as far as i
>> can see has no payoff on the horizon.   It could be that there could be
>> future clients for a medium lifetime value, but asking for this with no
>> clients in hand is really beyond the scope of a reasonable review.
>>
>> I remind you that the purpose of these patches is to solve problems that
>> exist in the current compiler that we have papered over for years.   If
>> someone needs wide-ints in some way that is not foreseen then they can
>> change it.
> The patches introduce a lot more temporary wide-ints (your words) and
> at the same time makes construction of them from tree / rtx very expensive
> both stack space and compile-time wise.  Look at how we for example
> compute TREE_INT_CST + 1 - int_cst_binop internally uses double_ints
> for the computation and then instantiates a new tree for holding the result.
> Now we'd use wide_ints for this requring totally unnecessary copying.
> Why not in the first place try to avoid that.  And try to avoid making
> wide_ints 4 times as large as really necessary just for the sake of VRP!
> (VRP should have a way to say "_I_ want larger wide_ints", without putting
> this burden on all other users).
>
> Richard.
>
>> kenny
>>
>>
>> On 11/26/2012 11:30 AM, Richard Biener wrote:
>>> On Mon, Nov 26, 2012 at 5:03 PM, Kenneth Zadeck
>>> <zadeck@naturalbridge.com> wrote:
>>>> On 11/26/2012 10:03 AM, Richard Biener wrote:
>>>>> On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck
>>>>> <zadeck@naturalbridge.com>
>>>>> wrote:
>>>>>> On 11/04/2012 11:54 AM, Richard Biener wrote:
>>>>>>> On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford
>>>>>>> <rdsandiford@googlemail.com> wrote:
>>>>>>>> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>>>>>>>>> I would like you to respond to at least point 1 of this email.   In
>>>>>>>>> it
>>>>>>>>> there is code from the rtl level that was written twice, once for
>>>>>>>>> the
>>>>>>>>> case when the size of the mode is less than the size of a HWI and
>>>>>>>>> once
>>>>>>>>> for the case where the size of the mode is less that 2 HWIs.
>>>>>>>>>
>>>>>>>>> my patch changes this to one instance of the code that works no
>>>>>>>>> matter
>>>>>>>>> how large the data passed to it is.
>>>>>>>>>
>>>>>>>>> you have made a specific requirement for wide int to be a template
>>>>>>>>> that
>>>>>>>>> can be instantiated in several sizes, one for 1 HWI, one for 2 HWI.
>>>>>>>>> I
>>>>>>>>> would like to know how this particular fragment is to be rewritten
>>>>>>>>> in
>>>>>>>>> this model?   It seems that I would have to retain the structure
>>>>>>>>> where
>>>>>>>>> there is one version of the code for each size that the template is
>>>>>>>>> instantiated.
>>>>>>>> I think richi's argument was that wide_int should be split into two.
>>>>>>>> There should be a "bare-metal" class that just has a length and HWIs,
>>>>>>>> and the main wide_int class should be an extension on top of that
>>>>>>>> that does things to a bit precision instead.  Presumably with some
>>>>>>>> template magic so that the length (number of HWIs) is a constant for:
>>>>>>>>
>>>>>>>>       typedef foo<2> double_int;
>>>>>>>>
>>>>>>>> and a variable for wide_int (because in wide_int the length would be
>>>>>>>> the number of significant HWIs rather than the size of the underlying
>>>>>>>> array).  wide_int would also record the precision and apply it after
>>>>>>>> the full HWI operation.
>>>>>>>>
>>>>>>>> So the wide_int class would still provide "as wide as we need"
>>>>>>>> arithmetic,
>>>>>>>> as in your rtl patch.  I don't think he was objecting to that.
>>>>>>> That summarizes one part of my complaints / suggestions correctly.  In
>>>>>>> other
>>>>>>> mails I suggested to not make it a template but a constant over object
>>>>>>> lifetime
>>>>>>> 'bitsize' (or maxlen) field.  Both suggestions likely require more
>>>>>>> thought
>>>>>>> than
>>>>>>> I put into them.  The main reason is that with C++ you can abstract
>>>>>>> from
>>>>>>> where
>>>>>>> wide-int information pieces are stored and thus use the arithmetic /
>>>>>>> operation
>>>>>>> workers without copying the (source) "wide-int" objects.  Thus you
>>>>>>> should
>>>>>>> be able to write adaptors for double-int storage, tree or RTX storage.
>>>>>> We had considered something along these lines and rejected it.   I am
>>>>>> not
>>>>>> really opposed to doing something like this, but it is not an obvious
>>>>>> winning idea and is likely not to be a good idea.   Here was our
>>>>>> thought
>>>>>> process:
>>>>>>
>>>>>> if you abstract away the storage inside a wide int, then you should be
>>>>>> able
>>>>>> to copy a pointer to the block of data from either the rtl level
>>>>>> integer
>>>>>> constant or the tree level one into the wide int.   It is certainly
>>>>>> true
>>>>>> that making a wide_int from one of these is an extremely common
>>>>>> operation
>>>>>> and doing this would avoid those copies.
>>>>>>
>>>>>> However, this causes two problems:
>>>>>> 1)  Mike's first cut at the CONST_WIDE_INT did two ggc allocations to
>>>>>> make
>>>>>> the object.   it created the base object and then it allocated the
>>>>>> array.
>>>>>> Richard S noticed that we could just allocate one CONST_WIDE_INT that
>>>>>> had
>>>>>> the array in it.   Doing it this way saves one ggc allocation and one
>>>>>> indirection when accessing the data within the CONST_WIDE_INT.   Our
>>>>>> plan
>>>>>> is
>>>>>> to use the same trick at the tree level.   So to avoid the copying, you
>>>>>> seem
>>>>>> to have to have a more expensive rep for CONST_WIDE_INT and INT_CST.
>>>>> I did not propose having a pointer to the data in the RTX or tree int.
>>>>> Just
>>>>> the short-lived wide-ints (which are on the stack) would have a pointer
>>>>> to
>>>>> the data - which can then obviously point into the RTX and tree data.
>>>> There is the issue then what if some wide-ints are not short lived. It
>>>> makes
>>>> me nervous to create internal pointers to gc ed memory.
>>> I thought they were all short-lived.
>>>
>>>>>> 2) You are now stuck either ggcing the storage inside a wide_int when
>>>>>> they
>>>>>> are created as part of an expression or you have to play some game to
>>>>>> represent the two different storage plans inside of wide_int.
>>>>> Hm?  wide-ints are short-lived and thus never live across a garbage
>>>>> collection
>>>>> point.  We create non-GCed objects pointing to GCed objects all the time
>>>>> and everywhere this way.
>>>> Again, this makes me nervous but it could be done.  However, it does mean
>>>> that now the wide ints that are not created from rtxes or trees will be
>>>> more
>>>> expensive because they are not going to get their storage "for free",
>>>> they
>>>> are going to alloca it.
>>> No, those would simply use the embedded storage model.
>>>
>>>> however, it still is not clear, given that 99% of the wide ints are going
>>>> to
>>>> fit in a single hwi, that this would be a noticeable win.
>>> Currently even if they fit into a HWI you will still allocate 4 times the
>>> larges integer mode size.  You say that doesn't matter because they
>>> are short-lived, but I say it does matter because not all of them are
>>> short-lived enough.  If 99% fit in a HWI why allocate 4 times the
>>> largest integer mode size in 99% of the cases?
>>>
>>>>>>      Clearly this
>>>>>> is where you think that we should be going by suggesting that we
>>>>>> abstract
>>>>>> away the internal storage.   However, this comes at a price:   what is
>>>>>> currently an array access in my patches would (i believe) become a
>>>>>> function
>>>>>> call.
>>>>> No, the workers (that perform the array accesses) will simply get
>>>>> a pointer to the first data element.  Then whether it's embedded or
>>>>> external is of no interest to them.
>>>> so is your plan that the wide int constructors from rtx or tree would
>>>> just
>>>> copy the pointer to the array on top of the array that is otherwise
>>>> allocated on the stack?    I can easily do this.   But as i said, the
>>>> gain
>>>> seems quite small.
>>>>
>>>> And of course, going the other way still does need the copy.
>>> The proposal was to template wide_int on a storage model, the embedded
>>> one would work as-is (embedding 4 times largest integer mode), the
>>> external one would have a pointer to data.  All functions that return a
>>> wide_int produce a wide_int with the embedded model.  To avoid
>>> the function call penalty you described the storage model provides
>>> a way to get a pointer to the first element and the templated operations
>>> simply dispatch to a worker that takes this pointer to the first element
>>> (as the storage model is designed as a template its abstraction is going
>>> to be optimized away by means of inlining).
>>>
>>> Richard.
>>>
>>>>>>     From a performance point of view, i believe that this is a non
>>>>>> starter. If you can figure out how to design this so that it is not a
>>>>>> function call, i would consider this a viable option.
>>>>>>
>>>>>> On the other side of this you are clearly correct that we are copying
>>>>>> the
>>>>>> data when we are making wide ints from INT_CSTs or CONST_WIDE_INTs.
>>>>>> But
>>>>>> this is why we represent data inside of the wide_ints, the INT_CSTs and
>>>>>> the
>>>>>> CONST_WIDE_INTs in a compressed form.   Even with very big types, which
>>>>>> are
>>>>>> generally rare, the constants them selves are very small.   So the copy
>>>>>> operation is a loop that almost always copies one element, even with
>>>>>> tree-vrp which doubles the sizes of every type.
>>>>>>
>>>>>> There is the third option which is that the storage inside the wide int
>>>>>> is
>>>>>> just ggced storage.  We rejected this because of the functional nature
>>>>>> of
>>>>>> wide-ints.    There are zillions created, they can be stack allocated,
>>>>>> and
>>>>>> they last for very short periods of time.
>>>>> Of course - GCing wide-ints is a non-starter.
>>>>>
>>>>>>>> As is probably obvious, I don't agree FWIW.  It seems like an
>>>>>>>> unnecessary
>>>>>>>> complication without any clear use.  Especially since the number of
>>>>>>> Maybe the double_int typedef is without any clear use.  Properly
>>>>>>> abstracting from the storage / information providers will save
>>>>>>> compile-time, memory and code though.  I don't see that any thought
>>>>>>> was spent on how to avoid excessive copying or dealing with
>>>>>>> long(er)-lived objects and their storage needs.
>>>>>> I actually disagree.    Wide ints can use a bloated amount of storage
>>>>>> because they are designed to be very short lived and very low cost
>>>>>> objects
>>>>>> that are stack allocated.   For long term storage, there is INT_CST at
>>>>>> the
>>>>>> tree level and CONST_WIDE_INT at the rtl level.  Those use a very
>>>>>> compact
>>>>>> storage model.   The copying entailed is only a small part of the
>>>>>> overall
>>>>>> performance.
>>>>> Well, but both trees and RTXen are not viable for short-lived things
>>>>> because
>>>>> the are GCed!  double-ints were suitable for this kind of stuff because
>>>>> the also have a moderate size.  With wide-ints size becomes a problem
>>>>> (or GC, if you instead use trees or RTXen).
>>>>>
>>>>>> Everything that you are suggesting along these lines is adding to the
>>>>>> weight
>>>>>> of a wide-int object.
>>>>> On the contrary - it lessens their weight (with external already
>>>>> existing storage)
>>>>> or does not do anything to it (with the embedded storage).
>>>>>
>>>>>>     You have to understand there will be many more
>>>>>> wide-ints created in a normal compilation than were ever created with
>>>>>> double-int.    This is because the rtl level had no object like this at
>>>>>> all
>>>>>> and at the tree level, many of the places that should have used double
>>>>>> int,
>>>>>> short cut the code and only did the transformations if the types fit in
>>>>>> a
>>>>>> HWI.
>>>>> Your argument shows that the copy-in/out from tree/RTX to/from wide-int
>>>>> will become a very frequent operation and thus it is worth optimizing
>>>>> it.
>>>>>
>>>>>> This is why we are extremely defensive about this issue.   We really
>>>>>> did
>>>>>> think a lot about it.
>>>>> I'm sure you did.
>>>>>
>>>>> Richard.
>>>>
diff mbox

Patch

@@ -1373,302 +1411,87 @@  simplify_const_unary_operation (enum rtx_code code, enum machine_mode mode,
       return CONST_DOUBLE_FROM_REAL_VALUE (d, mode);
     }
 
-  if (CONST_INT_P (op)
-      && width <= HOST_BITS_PER_WIDE_INT && width > 0)
+  if (CONST_SCALAR_INT_P (op) && width > 0)
     {
-      HOST_WIDE_INT arg0 = INTVAL (op);
-      HOST_WIDE_INT val;
+      wide_int result;
+      enum machine_mode imode = op_mode == VOIDmode ? mode : op_mode;
+      wide_int op0 = wide_int::from_rtx (op, imode);
+
+#if TARGET_SUPPORTS_WIDE_INT == 0
+      /* This assert keeps the simplification from producing a result
+	 that cannot be represented in a CONST_DOUBLE, but a lot of
+	 upstream callers expect that this function never fails to
+	 simplify something, so if you added this to the test above
+	 the code would die later anyway.  If this assert triggers,
+	 you just need to make the port support wide int.  */
+      gcc_assert (width <= HOST_BITS_PER_DOUBLE_INT); 
+#endif
 
       switch (code)
 	{
 	case NOT:
-	  val = ~ arg0;
+	  result = ~op0;
 	  break;
 
 	case NEG:
-	  val = - arg0;
+	  result = op0.neg ();
 	  break;
 
 	case ABS:
-	  val = (arg0 >= 0 ? arg0 : - arg0);
+	  result = op0.abs ();
 	  break;
 
 	case FFS:
-	  arg0 &= GET_MODE_MASK (mode);
-	  val = ffs_hwi (arg0);
+	  result = op0.ffs ();
 	  break;
 
 	case CLZ:
-	  arg0 &= GET_MODE_MASK (mode);
-	  if (arg0 == 0 && CLZ_DEFINED_VALUE_AT_ZERO (mode, val))
-	    ;
-	  else
-	    val = GET_MODE_PRECISION (mode) - floor_log2 (arg0) - 1;
+	  result = op0.clz (GET_MODE_BITSIZE (mode), 
+			    GET_MODE_PRECISION (mode));
 	  break;
 
 	case CLRSB:
-	  arg0 &= GET_MODE_MASK (mode);
-	  if (arg0 == 0)
-	    val = GET_MODE_PRECISION (mode) - 1;
-	  else if (arg0 >= 0)
-	    val = GET_MODE_PRECISION (mode) - floor_log2 (arg0) - 2;
-	  else if (arg0 < 0)
-	    val = GET_MODE_PRECISION (mode) - floor_log2 (~arg0) - 2;
+	  result = op0.clrsb (GET_MODE_BITSIZE (mode), 
+			      GET_MODE_PRECISION (mode));
 	  break;
-
+	  
 	case CTZ:
-	  arg0 &= GET_MODE_MASK (mode);
-	  if (arg0 == 0)
-	    {
-	      /* Even if the value at zero is undefined, we have to come
-		 up with some replacement.  Seems good enough.  */
-	      if (! CTZ_DEFINED_VALUE_AT_ZERO (mode, val))
-		val = GET_MODE_PRECISION (mode);
-	    }
-	  else
-	    val = ctz_hwi (arg0);
+	  result = op0.ctz (GET_MODE_BITSIZE (mode), 
+			    GET_MODE_PRECISION (mode));
 	  break;
 
 	case POPCOUNT:
-	  arg0 &= GET_MODE_MASK (mode);
-	  val = 0;
-	  while (arg0)
-	    val++, arg0 &= arg0 - 1;
+	  result = op0.popcount (GET_MODE_BITSIZE (mode), 
+				 GET_MODE_PRECISION (mode));
 	  break;
 
 	case PARITY:
-	  arg0 &= GET_MODE_MASK (mode);
-	  val = 0;
-	  while (arg0)
-	    val++, arg0 &= arg0 - 1;
-	  val &= 1;
+	  result = op0.parity (GET_MODE_BITSIZE (mode), 
+			       GET_MODE_PRECISION (mode));
 	  break;
 
 	case BSWAP:
-	  {
-	    unsigned int s;
-
-	    val = 0;
-	    for (s = 0; s < width; s += 8)
-	      {
-		unsigned int d = width - s - 8;
-		unsigned HOST_WIDE_INT byte;
-		byte = (arg0 >> s) & 0xff;
-		val |= byte << d;
-	      }
-	  }
+	  result = op0.bswap ();
 	  break;
 
 	case TRUNCATE:
-	  val = arg0;
+	  result = op0.sext (mode);
 	  break;
 
 	case ZERO_EXTEND:
-	  /* When zero-extending a CONST_INT, we need to know its
-             original mode.  */
-	  gcc_assert (op_mode != VOIDmode);
-	  if (op_width == HOST_BITS_PER_WIDE_INT)
-	    {
-	      /* If we were really extending the mode,
-		 we would have to distinguish between zero-extension
-		 and sign-extension.  */
-	      gcc_assert (width == op_width);
-	      val = arg0;
-	    }
-	  else if (GET_MODE_BITSIZE (op_mode) < HOST_BITS_PER_WIDE_INT)
-	    val = arg0 & GET_MODE_MASK (op_mode);
-	  else
-	    return 0;
+	  result = op0.zext (mode);
 	  break;
 
 	case SIGN_EXTEND:
-	  if (op_mode == VOIDmode)
-	    op_mode = mode;
-	  op_width = GET_MODE_PRECISION (op_mode);
-	  if (op_width == HOST_BITS_PER_WIDE_INT)
-	    {
-	      /* If we were really extending the mode,
-		 we would have to distinguish between zero-extension
-		 and sign-extension.  */
-	      gcc_assert (width == op_width);
-	      val = arg0;
-	    }
-	  else if (op_width < HOST_BITS_PER_WIDE_INT)
-	    {
-	      val = arg0 & GET_MODE_MASK (op_mode);
-	      if (val_signbit_known_set_p (op_mode, val))
-		val |= ~GET_MODE_MASK (op_mode);
-	    }
-	  else
-	    return 0;
+	  result = op0.sext (mode);
 	  break;
 
 	case SQRT:
-	case FLOAT_EXTEND:
-	case FLOAT_TRUNCATE:
-	case SS_TRUNCATE:
-	case US_TRUNCATE:
-	case SS_NEG:
-	case US_NEG:
-	case SS_ABS:
-	  return 0;
-
-	default:
-	  gcc_unreachable ();
-	}
-
-      return gen_int_mode (val, mode);
-    }
-
-  /* We can do some operations on integer CONST_DOUBLEs.  Also allow
-     for a DImode operation on a CONST_INT.  */
-  else if (width <= HOST_BITS_PER_DOUBLE_INT
-	   && (CONST_DOUBLE_AS_INT_P (op) || CONST_INT_P (op)))
-    {
-      double_int first, value;
-
-      if (CONST_DOUBLE_AS_INT_P (op))
-	first = double_int::from_pair (CONST_DOUBLE_HIGH (op),
-				       CONST_DOUBLE_LOW (op));
-      else
-	first = double_int::from_shwi (INTVAL (op));
-
-      switch (code)
-	{
-	case NOT:
-	  value = ~first;
-	  break;
-
-	case NEG:
-	  value = -first;
-	  break;
-
-	case ABS:
-	  if (first.is_negative ())
-	    value = -first;
-	  else
-	    value = first;
-	  break;
-
-	case FFS:
-	  value.high = 0;
-	  if (first.low != 0)
-	    value.low = ffs_hwi (first.low);
-	  else if (first.high != 0)
-	    value.low = HOST_BITS_PER_WIDE_INT + ffs_hwi (first.high);
-	  else
-	    value.low = 0;
-	  break;
-
-	case CLZ:
-	  value.high = 0;
-	  if (first.high != 0)
-	    value.low = GET_MODE_PRECISION (mode) - floor_log2 (first.high) - 1
-	              - HOST_BITS_PER_WIDE_INT;
-	  else if (first.low != 0)
-	    value.low = GET_MODE_PRECISION (mode) - floor_log2 (first.low) - 1;
-	  else if (! CLZ_DEFINED_VALUE_AT_ZERO (mode, value.low))
-	    value.low = GET_MODE_PRECISION (mode);
-	  break;
-
-	case CTZ:
-	  value.high = 0;
-	  if (first.low != 0)
-	    value.low = ctz_hwi (first.low);
-	  else if (first.high != 0)
-	    value.low = HOST_BITS_PER_WIDE_INT + ctz_hwi (first.high);
-	  else if (! CTZ_DEFINED_VALUE_AT_ZERO (mode, value.low))
-	    value.low = GET_MODE_PRECISION (mode);
-	  break;
-
-	case POPCOUNT:
-	  value = double_int_zero;
-	  while (first.low)
-	    {
-	      value.low++;
-	      first.low &= first.low - 1;
-	    }
-	  while (first.high)
-	    {
-	      value.low++;
-	      first.high &= first.high - 1;
-	    }
-	  break;
-
-	case PARITY:
-	  value = double_int_zero;
-	  while (first.low)
-	    {
-	      value.low++;
-	      first.low &= first.low - 1;
-	    }
-	  while (first.high)
-	    {
-	      value.low++;
-	      first.high &= first.high - 1;
-	    }
-	  value.low &= 1;
-	  break;
-
-	case BSWAP:
-	  {
-	    unsigned int s;
-
-	    value = double_int_zero;
-	    for (s = 0; s < width; s += 8)
-	      {
-		unsigned int d = width - s - 8;
-		unsigned HOST_WIDE_INT byte;
-
-		if (s < HOST_BITS_PER_WIDE_INT)
-		  byte = (first.low >> s) & 0xff;
-		else
-		  byte = (first.high >> (s - HOST_BITS_PER_WIDE_INT)) & 0xff;
-
-		if (d < HOST_BITS_PER_WIDE_INT)
-		  value.low |= byte << d;
-		else
-		  value.high |= byte << (d - HOST_BITS_PER_WIDE_INT);
-	      }
-	  }
-	  break;
-
-	case TRUNCATE:
-	  /* This is just a change-of-mode, so do nothing.  */
-	  value = first;
-	  break;
-
-	case ZERO_EXTEND:
-	  gcc_assert (op_mode != VOIDmode);
-
-	  if (op_width > HOST_BITS_PER_WIDE_INT)
-	    return 0;
-
-	  value = double_int::from_uhwi (first.low & GET_MODE_MASK (op_mode));
-	  break;
-
-	case SIGN_EXTEND:
-	  if (op_mode == VOIDmode
-	      || op_width > HOST_BITS_PER_WIDE_INT)
-	    return 0;
-	  else
-	    {
-	      value.low = first.low & GET_MODE_MASK (op_mode);
-	      if (val_signbit_known_set_p (op_mode, value.low))
-		value.low |= ~GET_MODE_MASK (op_mode);
-
-	      value.high = HWI_SIGN_EXTEND (value.low);
-	    }
-	  break;
-
-	case SQRT:
-	  return 0;
-
 	default:
 	  return 0;
 	}
 
-      return immed_double_int_const (value, mode);
+      return immed_wide_int_const (result, mode);
     }