
patch to fix constant math - 4th patch - the wide-int class - patch ping for the next stage 1

Message ID 512D686B.90000@naturalbridge.com
State New

Commit Message

Kenneth Zadeck Feb. 27, 2013, 1:59 a.m. UTC
This patch contains a large number of the changes requested by Richi.
It does not contain any of the changes that he requested to abstract
the storage layer.  That suggestion appears to be quite unworkable.

I believe that the wide-int class addresses the needs of GCC for
performing math on integers of any size, regardless of the platform
that hosts the compiler.  The interface is admittedly large, but it is
large for a reason: these are the operations that are commonly
performed by the client optimizations in the compiler.

I would like to get this patch preapproved for the next stage 1.

kenny
2013-02-26  Kenneth Zadeck <zadeck@naturalbridge.com>

	* Makefile.in (wide-int.c, wide-int.h): New files.
	* wide-int.c: New file containing implementation of wide_int class.
	* wide-int.h: New file containing public spec for wide_int class.

Comments

Richard Biener March 27, 2013, 2:54 p.m. UTC | #1
On Wed, Feb 27, 2013 at 2:59 AM, Kenneth Zadeck
<zadeck@naturalbridge.com> wrote:
> This patch contains a large number of the changes requested by Richi.   It
> does not contain any of the changes that he requested to abstract the
> storage layer.   That suggestion appears to be quite unworkable.
>
> I believe that the wide-int class addresses the needs of GCC for performing
> math on integers of any size, regardless of the platform that hosts the
> compiler.  The interface is admittedly large, but it is large for a reason:
> these are the operations that are commonly performed by the client
> optimizations in the compiler.
>
> I would like to get this patch preapproved for the next stage 1.

Please clean out dead code like

+// using wide_int::;

and

+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wh ("wide_int::from_shwi %s " HOST_WIDE_INT_PRINT_HEX ")\n",
+             result, op0);
+#endif

and

+#ifdef NEW_REP_FOR_INT_CST
+  /* This is the code once the tree level is converted.  */
+  wide_int result;
+  int i;
+
+  tree type = TREE_TYPE (tcst);
+
+  result.bitsize = GET_MODE_BITSIZE (TYPE_MODE (type));
+  result.precision = TYPE_PRECISION (type);
+  result.len = TREE_INT_CST_LEN (tcst);
+  for (i = 0; i < result.len; i++)
+    result.val[i] = TREE_INT_CST_ELT (tcst, i);
+
+  return result;
+#else

this also shows the main reason I was asking for storage abstraction.
The initialization from tree is way too expensive.

+/* Convert an integer cst into a wide int expanded to BITSIZE and
+   PRECISION.  This call is used by tree passes like vrp that expect
+   that the math is done in an infinite precision style.  BITSIZE and
+   PRECISION are generally determined to be twice the largest type
+   seen in the function.  */
+
+wide_int
+wide_int::from_tree_as_infinite_precision (const_tree tcst,
+                                          unsigned int bitsize,
+                                          unsigned int precision)
+{

I know you have converted everything, but to make this patch reviewable
I'd like you to strip the initial wide_int down to a bare minimum.

Only then will people have a reasonable chance to play with interface
changes (such as providing a storage abstraction).

+/* Check the upper HOST_WIDE_INTs of src to see if the length can be
+   shortened.  An upper HOST_WIDE_INT is unnecessary if it is all ones
+   or zeros and the top bit of the next lower word matches.
+
+   This function may change the representation of THIS, but does not
+   change the value that THIS represents.  It does not sign extend in
+   the case that the size of the mode is less than
+   HOST_BITS_PER_WIDE_INT.  */
+
+void
+wide_int::canonize ()

this shouldn't be necessary - it's an optimization - and due to value
semantics (yes - I know you have a weird mix of value semantics
and modify-in-place in wide_int) the new length should be computed
transparently when creating a new value.
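
For readers following along, here is a minimal sketch of the rule the
quoted comment describes (the member names val/len follow the patch;
the handling of precisions smaller than HOST_BITS_PER_WIDE_INT is
elided):

void
wide_int::canonize ()
{
  /* A top word is redundant when it is all zeros or all ones and
     merely repeats the sign bit of the word below it.  */
  while (len > 1)
    {
      HOST_WIDE_INT top = val[len - 1];
      /* Arithmetic right shift replicates the sign bit of the next
         lower word, yielding 0 or -1.  */
      HOST_WIDE_INT sign = val[len - 2] >> (HOST_BITS_PER_WIDE_INT - 1);
      if (top != sign)
        break;  /* The top word carries real information; keep it.  */
      len--;
    }
}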

Well.  Leaving wide-int.c for now.

+class wide_int {
+  /* Internal representation.  */
+
+  /* VAL is set to a size that is capable of computing a full
+     multiplication on the largest mode that is represented on the
+     target.  The full multiplication is used by tree-vrp.  tree-vrp
+     currently does a 2x largest mode by 2x largest mode yielding a 4x
+     largest mode result.  If operations are added that require larger
+     buffers, then VAL needs to be changed.  */
+  HOST_WIDE_INT val[4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT];

as you cover partial int modes in MAX_BITSIZE_MODE_ANY_INT, the above
may come up short.  Please round it up properly, as sketched below.
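
Something like this rounded-up declaration (a sketch of the fix being
asked for, not the patch's actual text) stays safe even when the bit
count is not a multiple of HOST_BITS_PER_WIDE_INT:

  HOST_WIDE_INT val[(4 * MAX_BITSIZE_MODE_ANY_INT
                     + HOST_BITS_PER_WIDE_INT - 1)
                    / HOST_BITS_PER_WIDE_INT];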

+  unsigned short len;
+  unsigned int bitsize;
+  unsigned int precision;

I see we didn't get away from this mix of bitsize and precision.  I'm probably
going to try to revisit the past discussions - but can you point me to a single
place in the RTL conversion where they make a difference?  Bits beyond
precision are either undefined or properly zero-/sign-extended.  Implicit
extension beyond the len val members should then provide "valid" bits
up to bitsize (if anyone cares).  That's how double-ints work on tree
INTEGER_CSTs, which only care about precision, even with partial integer
mode types (ok, I never came across one of these beasts - testcase / target?).

[abstraction possibility - have both wide_ints with actual mode and
wide_ints with arbitrary bitsize/precision]

+  enum ShiftOp {
+    NONE,
+    /* There are two uses for the wide-int shifting functions.  The
+       first use is as an emulation of the target hardware.  The
+       second use is as service routines for other optimizations.  The
+       first case needs to be identified by passing TRUNC as the value
+       of ShiftOp so that shift amount is properly handled according to the
+       SHIFT_COUNT_TRUNCATED flag.  For the second case, the shift
+       amount is always truncated by the bytesize of the mode of
+       THIS.  */
+    TRUNC
+  };

I think I have expressed my opinion on this.  (and SHIFT_COUNT_TRUNCATED
should vanish - certainly wide-int shouldn't care, and double-int doesn't)

+  enum SignOp {
+    /* Many of the math functions produce different results depending
+       on whether they are SIGNED or UNSIGNED.  In general, there are two
+       different functions, whose names are prefixed with an 'S'
+       or a 'U'.  However, for some math functions there is also a
+       routine that does not have the prefix and takes a SignOp
+       parameter of SIGNED or UNSIGNED.  */
+    SIGNED,
+    UNSIGNED
+  };

See above.  GCC standard practice is an 'unsigned uns' parameter.

+  inline static wide_int from_hwi (HOST_WIDE_INT op0, const_tree type);
+  inline static wide_int from_hwi (HOST_WIDE_INT op0, const_tree type,
+                                  bool *overflow);
+  inline static wide_int from_shwi (HOST_WIDE_INT op0, enum machine_mode mode);
+  inline static wide_int from_shwi (HOST_WIDE_INT op0, enum machine_mode mode,
+                                   bool *overflow);
+  inline static wide_int from_uhwi (unsigned HOST_WIDE_INT op0,
+                                   enum machine_mode mode);
+  inline static wide_int from_uhwi (unsigned HOST_WIDE_INT op0,
+                                   enum machine_mode mode,
+                                   bool *overflow);

way too many overloads.  Besides the "raw" ones I only expect wide-ints
to be constructed from a tree INTEGER_CST or an rtx DOUBLE_INT.

+  inline static wide_int minus_one (unsigned int bitsize, unsigned int prec);
+  inline static wide_int minus_one (const_tree type);
+  inline static wide_int minus_one (enum machine_mode mode);
+  inline static wide_int minus_one (const wide_int &op1);

wide-int's representation should be properly sign-extended, thus all of the
possible -1 variants have the same representation; the four overloads exist
only to supply a precision and bitsize.  Just have one, taking a precision
(or bitsize), as sketched below.
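
Something like this single hypothetical overload would suffice (member
names follow the patch, with bitsize dropped per the comments above):

wide_int
wide_int::minus_one (unsigned int prec)
{
  wide_int r;
  r.precision = prec;
  r.len = 1;
  r.val[0] = (HOST_WIDE_INT) -1;  /* Sign-extended -1 at any precision.  */
  return r;
}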

I see that for example in wide_int::add bitsize and precision are arbitrarily
copied from the first operand.  With the present way of dealing with them
it would sound more logical to assert that they are the same for both
operands (do both need to match?).  I'd rather see wide-int being
"arbitrary" precision/bitsize up to its supported storage size (as
double-int is).
I suppose we've been there and discussed this to death already though ;)
As you have some fused operation plus sign-/zero-extend ops already
the alternative is to always provide a precision for the result and treat the
operands as "arbitrary" precision (that way wide_int::minus_one can
simply return a sign-extended precision 1 -1).

Btw, wide_int::add does not look at bitsize at all, so it is clearly redundant
information.  Grepping for uses of bitsize likewise turns up only code that
maintains and copies this information around.  Please remove bitsize.

Ok, enough for today.

Thanks,
Richard.

> kenny
>
Richard Biener April 2, 2013, 3:04 p.m. UTC | #2
On Wed, Feb 27, 2013 at 2:59 AM, Kenneth Zadeck
<zadeck@naturalbridge.com> wrote:
> This patch contains a large number of the changes requested by Richi.   It
> does not contain any of the changes that he requested to abstract the
> storage layer.   That suggestion appears to be quite unworkable.

I of course took this claim as a challenge ... with the following result.  It is
of course quite workable ;)

The attached patch implements the core wide-int class and three storage
models (fixed size for things like plain HWI and double-int, variable size
similar to how your wide-int works and an adaptor for the double-int as
contained in trees).  With that you can now do

HOST_WIDE_INT
wi_test (tree x)
{
  // template argument deduction doesn't do the magic we want it to do
  // to make this kind of implicit conversions work
  // overload resolution considers this kind of conversions so we
  // need some magic that combines both ... but seeding the overload
  // set with some instantiations doesn't seem to be possible :/
  // wide_int<> w = x + 1;
  wide_int<> w;
  w += x;
  w += 1;
  // template argument deduction doesn't deduce the return value type,
  // not considering the template default argument either ...
  // w = wi (x) + 1;
  // we could support this by providing rvalue-to-lvalue promotion
  // via a traits class?
  // otoh it would lead to sub-optimal code anyway so we should
  // make the result available as reference parameter and only support
  // wide_int <> res; add (res, x, 1); ?
  w = wi (x).operator+<wide_int<> >(1);
  wide_int<>::add(w, x, 1);
  return w.to_hwi ();
}

we are somewhat limited with C++ unless we want to get really fancy.
Eventually providing operator+ just doesn't make much sense for
generic wide-int combinations (though then the issue is its operands
are no longer commutative which I think is the case with your wide-int
or double-int as well - they don't support "1 + wide_int" for obvious reasons).
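
As an aside, "1 + wide_int" could be supported with a free (non-member)
operator, since overload resolution never converts the left-hand side
of a member operator call; a hypothetical sketch against the template
wide_int above:

template <typename S>
wide_int<S>
operator+ (HOST_WIDE_INT a, const wide_int<S> &b)
{
  /* Addition itself is commutative, so dispatch to the member op.  */
  return b + a;
}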

So there are implementation design choices left undecided.

Oh, and the operation implementations are crap (they compute nonsense).

But you should get the idea.

Richard.
Kenneth Zadeck April 2, 2013, 5:35 p.m. UTC | #3
Yes, I agree that you win the challenge that it can be done.  What you
have always failed to address is why anyone would want to do this, or
how this would be at all desirable.  But I completely agree that from
a purely abstract point of view you can add a storage model.

Now here is why we REALLY do not want to go down this road:

1)  The following comment from your earlier mail is completely wrong

> +#ifdef NEW_REP_FOR_INT_CST
> +  /* This is the code once the tree level is converted.  */
> +  wide_int result;
> +  int i;
> +
> +  tree type = TREE_TYPE (tcst);
> +
> +  result.bitsize = GET_MODE_BITSIZE (TYPE_MODE (type));
> +  result.precision = TYPE_PRECISION (type);
> +  result.len = TREE_INT_CST_LEN (tcst);
> +  for (i = 0; i < result.len; i++)
> +    result.val[i] = TREE_INT_CST_ELT (tcst, i);
> +
> +  return result;
> +#else

> this also shows the main reason I was asking for storage abstraction.
> The initialization from tree is way too expensive.

In almost all cases, constants will fit in a single HWI.  Thus, the only 
thing that you are copying is the length and a single HWI. So you are 
dragging in a lot of machinery just to save these two copies?   
Certainly there has to be more to it than that.

2)  You present this as if the implementor actually should care about 
the implementation and you give 3 alternatives:  the double_int, the 
current one, and HWI.     We have tried to make it so that the client 
should not care.   Certainly in my experience here, I have not found a 
place to care.

In my opinion double_int needs to go away.  That is the main thrust of
my patches.  There is no place in a compiler for an ABI that depends on
constants fitting into two words whose size is defined by the host.
This is not a beauty-contest argument: we have public ports that are
beginning to use modes larger than two x86-64 HWIs, and I have a
private port that has such modes.  It is my experience that any pass
that uses this interface has one of three behaviors: it silently gets
the wrong answer, it ICEs, or it fails to do the transformation.  If we
leave double_int as an available option, then any use of it potentially
will have one of these three behaviors.  So one of my strong
objections to this direction is that I do not want to fight this kind
of bug for the rest of my life.  Having a single storage model that
just always works is in my opinion a highly desirable option.  What you
have never answered in a concrete manner is, if we decide to provide
this generality, what it would be used for.  There is no place in a
portable compiler where the right answer for every target is two HOST
wide integers.

However, I will admit that the HWI option has some merits.  We try to
address this in our implementation by restricting what is done inline
in wide-int.h to the cases that fit in an HWI, and only dropping into
the heavy code in wide-int.c if the mode is larger (which it rarely
will be).  However, a case could be made that for certain kinds of
things, like string lengths and such, we could use another interface
or, as you argue, a different storage model with the same interface.
I just do not see that the cost of the conversion code is really going
to show up on anyone's radar.
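
A hedged sketch of the inline/out-of-line split described in the
previous paragraph (the split is the patch's, but the names and
details here are illustrative):

inline wide_int
wide_int::add (const wide_int &a, const wide_int &b)
{
  /* Fast path, inline in wide-int.h: both operands fit in one HWI.  */
  if (a.len == 1 && b.len == 1 && a.precision <= HOST_BITS_PER_WIDE_INT)
    {
      wide_int r;
      r.precision = a.precision;
      r.len = 1;
      /* Compute in unsigned arithmetic to avoid signed overflow.  */
      r.val[0] = (HOST_WIDE_INT) ((unsigned HOST_WIDE_INT) a.val[0]
                                  + (unsigned HOST_WIDE_INT) b.val[0]);
      return r;
    }
  /* Rare multiword path, out of line in wide-int.c (hypothetical).  */
  return add_large (a, b);
}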

3) Your trick will work at the tree level, but not at the RTL level.
The wide-int code cannot share storage with the CONST_INTs.  We tried
this, and there are a million bugs that would have to be fixed to make
it work.  It could have worked if CONST_INTs had carried a mode around,
but since they do not, you end up with the same CONST_INT sharing the
rep for several different types, and that just did not work unless you
are willing to do substantial cleanups.

On 04/02/2013 11:04 AM, Richard Biener wrote:
> On Wed, Feb 27, 2013 at 2:59 AM, Kenneth Zadeck
> <zadeck@naturalbridge.com> wrote:
>> This patch contains a large number of the changes requested by Richi.   It
>> does not contain any of the changes that he requested to abstract the
>> storage layer.   That suggestion appears to be quite unworkable.
> I of course took this claim as a challenge ... with the following result.  It is
> of course quite workable ;)
>
> The attached patch implements the core wide-int class and three storage
> models (fixed size for things like plain HWI and double-int, variable size
> similar to how your wide-int works and an adaptor for the double-int as
> contained in trees).  With that you can now do
>
> HOST_WIDE_INT
> wi_test (tree x)
> {
>    // template argument deduction doesn't do the magic we want it to do
>    // to make this kind of implicit conversions work
>    // overload resolution considers this kind of conversions so we
>    // need some magic that combines both ... but seeding the overload
>    // set with some instantiations doesn't seem to be possible :/
>    // wide_int<> w = x + 1;
>    wide_int<> w;
>    w += x;
>    w += 1;
>    // template argument deduction doesn't deduce the return value type,
>    // not considering the template default argument either ...
>    // w = wi (x) + 1;
>    // we could support this by providing rvalue-to-lvalue promotion
>    // via a traits class?
>    // otoh it would lead to sub-optimal code anyway so we should
>    // make the result available as reference parameter and only support
>    // wide_int <> res; add (res, x, 1); ?
>    w = wi (x).operator+<wide_int<> >(1);
>    wide_int<>::add(w, x, 1);
>    return w.to_hwi ();
> }
>
> we are somewhat limited with C++ unless we want to get really fancy.
> Eventually providing operator+ just doesn't make much sense for
> generic wide-int combinations (though then the issue is its operands
> are no longer commutative which I think is the case with your wide-int
> or double-int as well - they don't support "1 + wide_int" for obvious reasons).
>
> So there are implementation design choices left undecided.
>
> Oh, and the operation implementations are crap (they compute nonsense).
>
> But you should get the idea.
>
> Richard.
Richard Biener April 3, 2013, 9:17 a.m. UTC | #4
On Tue, Apr 2, 2013 at 7:35 PM, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
> Yes, I agree that you win the challenge that it can be done.    What you
> have always failed to address is why anyone would want to do this.  Or how
> this would at all be desirable.    But I completely agree that from a purely
> abstract point of view you can add a storage model.
>
> Now here is why we REALLY do not want to go down this road:
>
> 1)  The following comment from your earlier mail is completely wrong
>
>
>> +#ifdef NEW_REP_FOR_INT_CST
>> +  /* This is the code once the tree level is converted.  */
>> +  wide_int result;
>> +  int i;
>> +
>> +  tree type = TREE_TYPE (tcst);
>> +
>> +  result.bitsize = GET_MODE_BITSIZE (TYPE_MODE (type));
>> +  result.precision = TYPE_PRECISION (type);
>> +  result.len = TREE_INT_CST_LEN (tcst);
>> +  for (i = 0; i < result.len; i++)
>> +    result.val[i] = TREE_INT_CST_ELT (tcst, i);
>> +
>> +  return result;
>> +#else
>
>
>> this also shows the main reason I was asking for storage abstraction.
>> The initialization from tree is way too expensive.
>
>
> In almost all cases, constants will fit in a single HWI.  Thus, the only
> thing that you are copying is the length and a single HWI. So you are
> dragging in a lot of machinery just to save these two copies?   Certainly
> there has to be more to it than that.

In the end you will have a variable-size storage in TREE_INT_CST thus
you will have at least to emit _code_ copying over meta-data and data
from the tree representation to the wide-int (similar for RTX CONST_DOUBLE/INT).
I'm objecting to the amount of code you emit and agree that the runtime
cost is copying the meta-data (hopefully optimizable via CSE / SRA)
and in most cases one (or two) iterations of the loop copying the data
(not optimizable).

> 2)  You present this as if the implementor actually should care about the
> implementation and you give 3 alternatives:  the double_int, the current
> one, and HWI.     We have tried to make it so that the client should not
> care.   Certainly in my experience here, I have not found a place to care.

Well, similar as for the copying overhead for tree your approach requires
overloading operations for HOST_WIDE_INT operands to be able to
say wi + 1 (which is certainly desirable), or the overhead of using
wide_int_one ().
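
For concreteness, the overload being discussed would look something
like this (hypothetical declarations, not from either patch):

wide_int operator+ (const wide_int &a, const wide_int &b);
wide_int operator+ (const wide_int &a, HOST_WIDE_INT b);  /* enables "wi + 1" */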

> In my opinion double_int needs to go away.  That is the main thrust of my
> patches.   There is no place in a compiler for an ABI that depends on
> constants fitting into two words whose size is defined by the host.

That's true.  I'm not arguing to preserve "double-int" - I'm arguing to
preserve a way to ask for an integer type on the host with (at least)
N bits.  Almost all double-int users really ask for an integer type on the
host that has at least as many bits as the pointer representation (or
word_mode) on
the target (we do have HOST_WIDEST_INT == 32bits for 64bit pointer
targets).  No double-int user specifically wants 2 * HOST_WIDE_INT
precision - that is just what happens to be there.  Thus I am providing
a way to say get me a host integer with at least N bits (VRP asks for
this, for example).
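
As a hypothetical illustration of that request (names invented here,
not from either patch), such a host integer could look like:

/* Storage for at least N bits, rounded up to whole HOST_WIDE_INTs;
   the size is fixed at compile time per use site.  */
template <unsigned int N>
struct fixed_int_storage
{
  static const unsigned int blocks
    = (N + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT;
  HOST_WIDE_INT val[blocks];
  unsigned short len;  /* Number of significant words.  */
};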

What I was asking for is that whatever can provide the above should share
the functional interface with wide-int (or the other way around).  And I
was claiming that wide-int is too fat, because current users of double-int
eventually store double-ints permanently.

> This is not a beauty contest argument, we have public ports are beginning to
> use modes that are larger than two x86-64 HWIs and i have a private port
> that has such modes and it is my experience that any pass that uses this
> interface has one of three behaviors: it silently gets the wrong answer, it
> ICEs, or it fails to do the transformation.  If we leave double_int as an
> available option, then any use of it potentially will have one of these
> three behaviors.  And so one of my strong objections to this direction is
> that i do not want to fight this kind of bug for the rest of my life.
> Having a single storage model that just always works is in my opinion a
> highly desirable option.  What you have never answered in a concrete manner
> is, if we decide to provide this generality, what it would be used for.
> There is no place in a portable compiler where the right answer for every
> target is two HOST wide integers.
>
> However, i will admit that the HWI option has some merits.   We try to
> address this in our implementation by dividing what is done inline in
> wide-int.h to the cases that fit in an HWI and then only drop into the heavy
> code in wide-int.c if mode is larger (which it rarely will be).   However, a
> case could be made that for certain kinds of things like string lengths and
> such, we could use another interface or as you argue, a different storage
> model with the same interface.   I just do not see that the cost of the
> conversion code is really going to show up on anyone's radar.

What's the issue with abstracting away the model so a fixed-size 'len'
is possible?  (leaving aside the argument that this would easily allow an
adaptor to tree)

> 3) your trick will work at the tree level, but not at the rtl level.   The
> wide-int code cannot share storage with the CONST_INTs.    We tried this,
> and there are a million bugs that would have to be fixed to make it work.
> It could have worked if CONST_INTs had carried a mode around, but since they
> do not, you end up with the same CONST_INT sharing the rep for several
> different types and that just did not work unless you are willing to do
> substantial cleanups.

I don't believe you.  I am only asking for the adaptors to tree and RTL to
work in an RVALUE-ish way (no modification, as obviously RTL and tree
constants may be shared).  I think your claim is because you have those
precision and bitsize members in your wide-int, which I believe is a
design red herring.  I suppose we should concentrate on addressing that
one first.  Thus, let me repeat a few questions on your design to eventually
let you understand my concerns:

Given two wide-ints, a and b, what precision will the result of
     a + b
have?  Consider a having precision 32 and b having precision 64
on a 32-bit HWI host.

You define wide-int to have no sign information:

+   The representation does not contain any information about
+   signedness of the represented value, so it can be used to represent
+   both signed and unsigned numbers.  For operations where the results
+   depend on signedness (division, comparisons), the signedness must
+   be specified separately.  For operations where the signedness
+   matters, one of the operands to the operation specifies either
+   wide_int::SIGNED or wide_int::UNSIGNED.

but the result of operations on mixed precision operands _does_ depend
on the sign, nevertheless most operations do not get a signedness argument.
Nor would that help, because it depends on the signedness of the individual
operands!
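
A worked example of that point, assuming a 64-bit host integer for
brevity (the values are hypothetical): the bit pattern 0xffffffff at
precision 32 is -1 when signed but 4294967295 when unsigned, so
extending it to precision 64 before an addition gives two different
sums:

typedef long long hwi;  /* Stand-in for a 64-bit HOST_WIDE_INT.  */

hwi
add_as_signed (unsigned int a_bits, hwi b)
{
  /* Sign-extend the 32-bit pattern: -1 + 1 == 0.  */
  return (hwi) (int) a_bits + b;
}

hwi
add_as_unsigned (unsigned int a_bits, hwi b)
{
  /* Zero-extend it: 0xffffffff + 1 == 0x100000000.  */
  return (hwi) a_bits + b;
}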

double-int gets around this by having a common "precision" to which
all smaller precision constants have to be sign-/zero-extended.  So
does CONST_INT and CONST_DOUBLE.

Note that even with same precision you have introduced the same problem
with the variable len.

My proposal is simple - all wide-ints are signed!  wide-int is basically
an arbitrary precision signed integer format.  The sign is encoded in
the most significant bit of the last active HWI of the representation
(val[len - 1] & (1 << HOST_BITS_PER_WIDE_INT - 1)).  All values
with less precision than len * HOST_BITS_PER_WIDE_INT are
properly sign-/zero-extended to precision len * HOST_BITS_PER_WIDE_INT.

This lets you define mixed-len operations by implicitly sign-/zero-extending
the operands to whatever len is required for the operation.

Furthermore it allows you to get rid of the precision member (and
bitsize anyway).
Conversion from tree / RTL requires information on the signedness of the
input (trivial for tree, for RTL all constants are signed - well,
sign-extended).
Whenever you want to transfer the wide-int to tree / RTL you have to
sign-/zero-extend according to the desired precision.  If you need sth other
than arbitrary precision arithmetic you have to explicitly truncate / extend
at suitable places - with overflow checking being trivial here.  For
optimization purposes selected operations may benefit from a combined implementation
receiving a target precision and signedness.  Whatever extra meta-data
RTL requires does not belong in wide-int but in the RTX.  Trivially
a mode comes to my mind (on tree we have a type), and trivially
each RTX has a mode.  And each mode has a precision and bitsize.
It lacks a sign, so all RTL integer constants are sign-extended for
encoding efficiency purposes.  Mixed-mode operations will not
occur (mixed-len operations will); mixed-mode ops are exclusively
sign-/zero-extensions and truncations.

Representation of (unsigned HOST_WIDE_INT)-1 would necessarily
be { 0, (unsigned HOST_WIDE_INT)-1 }, representation of -1 in any
precision would be { -1 }.
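
To make the proposed encoding concrete, here is a small sketch
(assuming 64-bit HWIs and a least-significant-word-first val array;
the helper is hypothetical):

/* -1 at any precision       -> len 1, val = { -1 }
   2^64 - 1, i.e. (uhwi) -1  -> len 2, with an extra zero word so the
                                always-signed value reads as positive.  */
static void
encode_from_uhwi (unsigned HOST_WIDE_INT x,
                  HOST_WIDE_INT *val, unsigned short *len)
{
  val[0] = (HOST_WIDE_INT) x;
  if (val[0] < 0)
    {
      /* Top bit set: append a zero word to keep the value positive.  */
      val[1] = 0;
      *len = 2;
    }
  else
    *len = 1;
}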

That was my proposal.  Now, can you please properly specify yours?

Thanks,
Richard.

>
> On 04/02/2013 11:04 AM, Richard Biener wrote:
>>
>> On Wed, Feb 27, 2013 at 2:59 AM, Kenneth Zadeck
>> <zadeck@naturalbridge.com> wrote:
>>>
>>> This patch contains a large number of the changes requested by Richi.
>>> It
>>> does not contain any of the changes that he requested to abstract the
>>> storage layer.   That suggestion appears to be quite unworkable.
>>
>> I of course took this claim as a challenge ... with the following result.
>> It is
>> of course quite workable ;)
>>
>> The attached patch implements the core wide-int class and three storage
>> models (fixed size for things like plain HWI and double-int, variable size
>> similar to how your wide-int works and an adaptor for the double-int as
>> contained in trees).  With that you can now do
>>
>> HOST_WIDE_INT
>> wi_test (tree x)
>> {
>>    // template argument deduction doesn't do the magic we want it to do
>>    // to make this kind of implicit conversions work
>>    // overload resolution considers this kind of conversions so we
>>    // need some magic that combines both ... but seeding the overload
>>    // set with some instantiations doesn't seem to be possible :/
>>    // wide_int<> w = x + 1;
>>    wide_int<> w;
>>    w += x;
>>    w += 1;
>>    // template argument deduction doesn't deduce the return value type,
>>    // not considering the template default argument either ...
>>    // w = wi (x) + 1;
>>    // we could support this by providing rvalue-to-lvalue promotion
>>    // via a traits class?
>>    // otoh it would lead to sub-optimal code anyway so we should
>>    // make the result available as reference parameter and only support
>>    // wide_int <> res; add (res, x, 1); ?
>>    w = wi (x).operator+<wide_int<> >(1);
>>    wide_int<>::add(w, x, 1);
>>    return w.to_hwi ();
>> }
>>
>> we are somewhat limited with C++ unless we want to get really fancy.
>> Eventually providing operator+ just doesn't make much sense for
>> generic wide-int combinations (though then the issue is its operands
>> are no longer commutative which I think is the case with your wide-int
>> or double-int as well - they don't support "1 + wide_int" for obvious
>> reasons).
>>
>> So there are implementation design choices left undecided.
>>
>> Oh, and the operation implementations are crap (they compute nonsense).
>>
>> But you should get the idea.
>>
>> Richard.
>
>
Kenneth Zadeck April 3, 2013, 12:05 p.m. UTC | #5
On 04/03/2013 05:17 AM, Richard Biener wrote:

> In the end you will have a variable-size storage in TREE_INT_CST thus
> you will have at least to emit _code_ copying over meta-data and data
> from the tree representation to the wide-int (similar for RTX CONST_DOUBLE/INT).
> I'm objecting to the amount of code you emit and agree that the runtime
> cost is copying the meta-data (hopefully optimizable via CSE / SRA)
> and in most cases one (or two) iterations of the loop copying the data
> (not optimizable).
I did get rid of the bitsize in the wide-int patch, so at this point the
meta data is the precision and the len - not really a lot here.  As usual
we pay a high price in GCC for not pushing the tree rep down into the RTL
level; had we done that, it would have been acceptable to have the tree
type bleed into the wide-int code.


>> 2)  You present this as if the implementor actually should care about the
>> implementation and you give 3 alternatives:  the double_int, the current
>> one, and HWI.     We have tried to make it so that the client should not
>> care.   Certainly in my experience here, I have not found a place to care.
> Well, similar as for the copying overhead for tree your approach requires
> overloading operations for HOST_WIDE_INT operands to be able to
> say wi + 1 (which is certainly desirable), or the overhead of using
> wide_int_one ().
>
>> In my opinion double_int needs to go away.  That is the main thrust of my
>> patches.   There is no place in a compiler for an ABI that depends on
>> constants fitting into two words whose size is defined by the host.
> That's true.  I'm not arguing to preserve "double-int" - I'm arguing to
> preserve a way to ask for an integer type on the host with (at least)
> N bits.  Almost all double-int users really ask for an integer type on the
> host that has at least as many bits as the pointer representation (or
> word_mode) on
> the target (we do have HOST_WIDEST_INT == 32bits for 64bit pointer
> targets).  No double-int user specifically wants 2 * HOST_WIDE_INT
> precision - that is just what happens to be there.  Thus I am providing
> a way to say get me a host integer with at least N bits (VRP asks for
> this, for example).
>
> What I was asking for is that whatever can provide the above should share
> the functional interface with wide-int (or the other way around).  And I
> was claiming that wide-int is too fat, because current users of double-int
> eventually store double-ints permanently.
The problem is that, in truth, double_int is too fat: 99.something% of
all constants fit in one HWI, and that is likely to be true forever (I
understand that tree-vrp may need some thought here).  The RTL level has,
for as long as I have known it, had two reps for integer constants, so
it was relatively easy to slide CONST_WIDE_INT in.  It seems like the
right trickery here, rather than adding a storage model for wide-ints,
might be a way to use the C++ to invisibly support several (and by
"several" I really mean two) classes of TREE_CSTs.

>
>> This is not a beauty contest argument, we have public ports are beginning to
>> use modes that are larger than two x86-64 HWIs and i have a private port
>> that has such modes and it is my experience that any pass that uses this
>> interface has one of three behaviors: it silently gets the wrong answer, it
>> ICEs, or it fails to do the transformation.  If we leave double_int as an
>> available option, then any use of it potentially will have one of these
>> three behaviors.  And so one of my strong objections to this direction is
>> that i do not want to fight this kind of bug for the rest of my life.
>> Having a single storage model that just always works is in my opinion a
>> highly desirable option.  What you have never answered in a concrete manner
>> is, if we decide to provide this generality, what it would be used for.
>> There is no place in a portable compiler where the right answer for every
>> target is two HOST wide integers.
>>
>> However, i will admit that the HWI option has some merits.   We try to
>> address this in our implementation by dividing what is done inline in
>> wide-int.h to the cases that fit in an HWI and then only drop into the heavy
>> code in wide-int.c if mode is larger (which it rarely will be).   However, a
>> case could be made that for certain kinds of things like string lengths and
>> such, we could use another interface or as you argue, a different storage
>> model with the same interface.   I just do not see that the cost of the
>> conversion code is really going to show up on anyone's radar.
> What's the issue with abstracting away the model so a fixed-size 'len'
> is possible?  (leaving aside the argument that this would easily allow an
> adaptor to tree)
I have a particularly pessimistic perspective because I have already
written most of this patch.  It is not that I do not want to change
that code; it is that I have seen a certain set of mistakes that were
made and I do not want to fix them more than once.  At the RTL level
you can see the transition from only supporting 32-bit ints, to
supporting 64-bit ints, to finally supporting two HWIs, and that
transition code is not pretty.  My RTL patch fixes the places where an
optimization was only made if the data type was 32 bits or smaller, as
well as the places where the optimization was made only if the data
type is smaller than 64 bits (in addition to fixing all of the places
where the code ICEs or simply gets the wrong answer if it is larger
than TImode).  The tree level is only modestly better, I believe only
because it is newer.  I have not seen any 32-bit-only code, but it is
littered with transformations that only work for 64 bits.  What is
that 64-bit-only code going to look like in 5 years?

I want to make it easier to write the general code than to write code
that only solves the problem for the size of port that the implementor
is currently working on.  So I perceive the storage model as a way of
having to keep fighting this battle forever, because it will allow the
implementor to decide that the optimization only needs to be done for a
particular size of integer.

However, I get the fact that from your perspective, what you really want
is a solution to the data structure problem in tree-vrp.  My patch for
tree-vrp scans the entire function to find the largest type used in the
function and then does all of the math at 2x that size.  But I have to
admit that I thought it was weird that you use tree cst as your long-term
storage.  If this were my pass, I would have allocated a 2D array of some
type - as large as the function in one dimension and twice as large as
the largest int used in the other - rather than overloading tree-cst,
and then had a set of friend functions in double_int to get in and out.
Of course you do not need friends in double_int because the rep is
exposed, but in wide-int that is now hidden, since it is now purely
functional.

I just have to believe that there is a better way to do tree-vrp than
messing up wide-int for the rest of the compiler.

>> 3) your trick will work at the tree level, but not at the rtl level.   The
>> wide-int code cannot share storage with the CONST_INTs.    We tried this,
>> and there are a million bugs that would have to be fixed to make it work.
>> It could have worked if CONST_INTs had carried a mode around, but since they
>> do not, you end up with the same CONST_INT sharing the rep for several
>> different types and that just did not work unless you are willing to do
>> substantial cleanups.
> I don't believe you.  I am only asking for the adaptors to tree and RTL to
> work in an RVALUE-ish way (no modification, as obviously RTL and tree
> constants may be shared).  I think your claim is because you have those
> precision and bitsize members in your wide-int, which I believe is a
> design red herring.  I suppose we should concentrate on addressing that
> one first.  Thus, let me repeat a few questions on your design to eventually
> let you understand my concerns:
>
> Given two wide-ints, a and b, what precision will the result of
>       a + b
> have?  Consider a having precision 32 and b having precision 64
> on a 32-bit HWI host.
>
> You define wide-int to have no sign information:
>
> +   The representation does not contain any information about
> +   signedness of the represented value, so it can be used to represent
> +   both signed and unsigned numbers.  For operations where the results
> +   depend on signedness (division, comparisons), the signedness must
> +   be specified separately.  For operations where the signedness
> +   matters, one of the operands to the operation specifies either
> +   wide_int::SIGNED or wide_int::UNSIGNED.
>
> but the result of operations on mixed precision operands _does_ depend
> on the sign, nevertheless most operations do not get a signedness argument.
> Nor would that help, because it depends on the signedness of the individual
> operands!
>
> double-int gets around this by having a common "precision" to which
> all smaller precision constants have to be sign-/zero-extended.  So
> does CONST_INT and CONST_DOUBLE.
>
> Note that even with same precision you have introduced the same problem
> with the variable len.
>
> My proposal is simple - all wide-ints are signed!  wide-int is basically
> an arbitrary precision signed integer format.  The sign is encoded in
> the most significant bit of the last active HWI of the representation
> (val[len - 1] & (1 << HOST_BITS_PER_WIDE_INT - 1)).  All values
> with less precision than len * HOST_BITS_PER_WIDE_INT are
> properly sign-/zero-extended to precision len * HOST_BITS_PER_WIDE_INT.
>
> This lets you define mixed-len operations by implicitly sign-/zero-extending
> the operands to whatever len is required for the operation.
>
> Furthermore it allows you to get rid of the precision member (and
> bitsize anyway).
> Conversion from tree / RTL requires information on the signedness of the
> input (trivial for tree, for RTL all constants are signed - well,
> sign-extended).
> Whenever you want to transfer the wide-int to tree / RTL you have to
> sign-/zero-extend according to the desired precision.  If you need sth other
> than arbitrary precision arithmetic you have to explicitly truncate / extend
> at suitable places - with overflow checking being trivial here.  For
> optimization purposes selected operations may benefit from a combined implementation
> receiving a target precision and signedness.  Whatever extra meta-data
> RTL requires does not belong in wide-int but in the RTX.  Trivially
> a mode comes to my mind (on tree we have a type), and trivially
> each RTX has a mode.  And each mode has a precision and bitsize.
> It lacks a sign, so all RTL integer constants are sign-extended for
> encoding efficiency purposes.  Mixed-mode operations will not
> occur (mixed-len operations will); mixed-mode ops are exclusively
> sign-/zero-extensions and truncations.
>
> Representation of (unsigned HOST_WIDE_INT)-1 would necessarily
> be { 0, (unsigned HOST_WIDE_INT)-1 }, representation of -1 in any
> precision would be { -1 }.
>
> That was my proposal.  Now, can you please properly specify yours?
>
> Thanks,
> Richard.
>
>> On 04/02/2013 11:04 AM, Richard Biener wrote:
>>> On Wed, Feb 27, 2013 at 2:59 AM, Kenneth Zadeck
>>> <zadeck@naturalbridge.com> wrote:
>>>> This patch contains a large number of the changes requested by Richi.
>>>> It
>>>> does not contain any of the changes that he requested to abstract the
>>>> storage layer.   That suggestion appears to be quite unworkable.
>>> I of course took this claim as a challenge ... with the following result.
>>> It is
>>> of course quite workable ;)
>>>
>>> The attached patch implements the core wide-int class and three storage
>>> models (fixed size for things like plain HWI and double-int, variable size
>>> similar to how your wide-int works and an adaptor for the double-int as
>>> contained in trees).  With that you can now do
>>>
>>> HOST_WIDE_INT
>>> wi_test (tree x)
>>> {
>>>     // template argument deduction doesn't do the magic we want it to do
>>>     // to make this kind of implicit conversions work
>>>     // overload resolution considers this kind of conversions so we
>>>     // need some magic that combines both ... but seeding the overload
>>>     // set with some instantiations doesn't seem to be possible :/
>>>     // wide_int<> w = x + 1;
>>>     wide_int<> w;
>>>     w += x;
>>>     w += 1;
>>>     // template argument deduction doesn't deduce the return value type,
>>>     // not considering the template default argument either ...
>>>     // w = wi (x) + 1;
>>>     // we could support this by providing rvalue-to-lvalue promotion
>>>     // via a traits class?
>>>     // otoh it would lead to sub-optimal code anyway so we should
>>>     // make the result available as reference parameter and only support
>>>     // wide_int <> res; add (res, x, 1); ?
>>>     w = wi (x).operator+<wide_int<> >(1);
>>>     wide_int<>::add(w, x, 1);
>>>     return w.to_hwi ();
>>> }
>>>
>>> we are somewhat limited with C++ unless we want to get really fancy.
>>> Eventually providing operator+ just doesn't make much sense for
>>> generic wide-int combinations (though then the issue is its operands
>>> are no longer commutative which I think is the case with your wide-int
>>> or double-int as well - they don't support "1 + wide_int" for obvious
>>> reasons).
>>>
>>> So there are implementation design choices left undecided.
>>>
>>> Oh, and the operation implementations are crap (they compute nonsense).
>>>
>>> But you should get the idea.
>>>
>>> Richard.
>>
Richard Biener April 3, 2013, 1:53 p.m. UTC | #6
On Wed, Apr 3, 2013 at 2:05 PM, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
>
> On 04/03/2013 05:17 AM, Richard Biener wrote:
>
>> In the end you will have a variable-size storage in TREE_INT_CST thus
>> you will have at least to emit _code_ copying over meta-data and data
>> from the tree representation to the wide-int (similar for RTX
>> CONST_DOUBLE/INT).
>> I'm objecting to the amount of code you emit and agree that the runtime
>> cost is copying the meta-data (hopefully optimizable via CSE / SRA)
>> and in most cases one (or two) iterations of the loop copying the data
>> (not optimizable).
>
> i did get rid of the bitsize in the wide-int patch so at this point the meta
> data is the precision and the len.
> not really a lot here.   As usual we pay a high price in gcc for not pushing
> the tree rep down into the rtl level, then it would have been acceptable to
> have the tree type bleed into the wide-int code.
>
>
>
>>> 2)  You present this as if the implementor actually should care about the
>>> implementation and you give 3 alternatives:  the double_int, the current
>>> one, and HWI.     We have tried to make it so that the client should not
>>> care.   Certainly in my experience here, I have not found a place to
>>> care.
>>
>> Well, similar as for the copying overhead for tree your approach requires
>> overloading operations for HOST_WIDE_INT operands to be able to
>> say wi + 1 (which is certainly desirable), or the overhead of using
>> wide_int_one ().
>>
>>> In my opinion double_int needs to go away.  That is the main thrust of my
>>> patches.   There is no place in a compiler for an ABI that depends on
>>> constants fitting into two words whose size is defined by the host.
>>
>> That's true.  I'm not arguing to preserve "double-int" - I'm arguing to
>> preserve a way to ask for an integer type on the host with (at least)
>> N bits.  Almost all double-int users really ask for an integer type on the
>> host that has at least as many bits as the pointer representation (or
>> word_mode) on
>> the target (we do have HOST_WIDEST_INT == 32bits for 64bit pointer
>> targets).  No double-int user specifically wants 2 * HOST_WIDE_INT
>> precision - that is just what happens to be there.  Thus I am providing
>> a way to say get me a host integer with at least N bits (VRP asks for
>> this, for example).
>>
>> What I was asking for is that whatever can provide the above should share
>> the functional interface with wide-int (or the other way around).  And I
>> was claiming that wide-int is too fat, because current users of double-int
>> eventually store double-ints permanently.
>
> The problem is that, in truth, double int is too fat. 99.something% of all
> constants fit in 1 hwi and that is likely to be true forever (i understand
> that tree vpn may need some thought here).  The rtl level, which has, for as
> long as i have known it, had 2 reps for integer constants. So it was
> relatively easy to slide the CONST_WIDE_INT in.  It seems like the right
> trickery here rather than adding a storage model for wide-ints might be a
> way to use the c++ to invisibly support several (and by "several" i really
> mean 2) classes of TREE_CSTs.

The truth is that _now_ TREE_INT_CSTs use double-ints and we have
CONST_INT and CONST_DOUBLE.  What I (and you) propose would
get us to use variable-size storage for both, allowing us to use just a
single HOST_WIDE_INT in the majority of cases.  In my view the constant
length of the variable-size storage for TREE_INT_CSTs is determined
by its type (thus, it doesn't have "optimized" variable-size storage
but "unoptimized" fixed-size storage based on the maximum storage
requirement for the type).  Similarly for RTX CONST_INT, which would
have fixed-size storage based on the mode size of the constant.
Using optimized space (thus exploiting the encoding properties) requires
fitting the 'short len' somewhere, which possibly will not pay off in the
end (for tree we do have that storage available, so we could go with
optimized storage for it; not sure about RTL, I don't see available
space there).
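
For illustration, the fixed-size variant could derive its word count
from the type alone (a hedged sketch using existing tree accessors;
the helper name is invented):

/* Number of HOST_WIDE_INTs needed for any constant of TYPE; with
   fixed-size storage no per-constant 'len' needs to be stored.  */
static inline unsigned int
int_cst_words (const_tree type)
{
  return (TYPE_PRECISION (type) + HOST_BITS_PER_WIDE_INT - 1)
         / HOST_BITS_PER_WIDE_INT;
}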

>>> This is not a beauty contest argument, we have public ports are beginning
>>> to
>>> use modes that are larger than two x86-64 HWIs and i have a private port
>>> that has such modes and it is my experience that any pass that uses this
>>> interface has one of three behaviors: it silently gets the wrong answer,
>>> it
>>> ICEs, or it fails to do the transformation.  If we leave double_int as an
>>> available option, then any use of it potentially will have one of these
>>> three behaviors.  And so one of my strong objections to this direction is
>>> that i do not want to fight this kind of bug for the rest of my life.
>>> Having a single storage model that just always works is in my opinion a
>>> highly desirable option.  What you have never answered in a concrete
>>> manner
>>> is, if we decide to provide this generality, what it would be used for.
>>> There is no place in a portable compiler where the right answer for every
>>> target is two HOST wide integers.
>>>
>>> However, i will admit that the HWI option has some merits.   We try to
>>> address this in our implementation by dividing what is done inline in
>>> wide-int.h to the cases that fit in an HWI and then only drop into the
>>> heavy
>>> code in wide-int.c if mode is larger (which it rarely will be).
>>> However, a
>>> case could be made that for certain kinds of things like string lengths
>>> and
>>> such, we could use another interface or as you argue, a different storage
>>> model with the same interface.   I just do not see that the cost of the
>>> conversion code is really going to show up on anyone's radar.
>>
>> What's the issue with abstracting away the model so a fixed-size 'len'
>> is possible?  (leaving aside the argument that this would easily allow an
>> adaptor to tree)
>
> I have a particularly pessimistic perspective because i have already written
> most of this patch.   It is not that i do not want to change that code, it
> is that i have seen a certain set of mistakes that were made and i do not
> want to fix them more than once.   At the rtl level you can see the
> transition from only supporting 32 bit ints to supporting 64 bit its to
> finally supporting two HWIs and that transition code is not pretty.  My rtl
> patch fixes the places where an optimization was only made if the data type
> was 32 bits or smaller as well as the places where the optimization was made
> only if the data type is smaller than 64 bits (in addition to fixing all of
> the places where the code ICEs or simply gets the wrong answer if it is
> larger than TImode.)  The tree level is only modestly better, I believe only
> because it is newer. I have not seen any 32 bit only code, but it is
> littered with transformations that only work for 64 bits.   What is that 64
> bit only code going to look like in 5 years?

The user interface of wide-int does not depend on whether a storage model
is abstracted or not.  If you take advantage of the storage model by
making its interface leaner then it will.  But I guess converting everything
before settling on the wide-int interface may not have been the wisest
choice in the end (providing a wide-int that can literally replace double-int
would have got you testing coverage without any change besides
double-int.[ch] and wide-int.[ch]).

> I want to make it easier to write the general code than to write the code
> that only solves the problem for the size of port that the implementor is
> currently working on.   So I perceive the storage model as a way to keep
> having to fight this battle forever because it will allow the implementor to
> make a decision that the optimization only needs to be done for a particular
> sized integer.
>
> However, i get the fact that from your perspective, what you really want is
> a solution to the data structure problem in tree-vrp.

No, that's just a convenient example.  What I really want is a wide-int
that is less visibly a replacement for CONST_DOUBLE.

>  My patch for tree vrp
> scans the entire function to find the largest type used in the function and
> then does all of the math at 2x that size.  But i have to admit that i
> thought it was weird that you use tree cst as your long term storage.   If
> this were my pass, i would have allocated a 2d array of some type that was
> as large as the function in 1d and twice as large as the largest int used in
> the other dimension and not overloaded tree-cst and then had a set of friend
> functions in double int to get in and out. Of course you do not need friends
> in double int because the rep is exposed, but in wide-int that is now hidden
> since it now is purely functional.
>
> I just have to believe that there is a better way to do tree-vrp than
> messing up wide-int for the rest of the compiler.

It's not "messing up", it's making wide-int a generally useful thing and
not tying it so closely to RTL.

>>> 3) your trick will work at the tree level, but not at the rtl level.
>>> The
>>> wide-int code cannot share storage with the CONST_INTs.    We tried this,
>>> and there are a million bugs that would have to be fixed to make it work.
>>> It could have worked if CONST_INTs had carried a mode around, but since
>>> they
>>> do not, you end up with the same CONST_INT sharing the rep for several
>>> different types and that just did not work unless you are willing to do
>>> substantial cleanups.
>>
>> I don't believe you.  I am only asking for the adaptors to tree and RTL to
>> work in an RVALUE-ish way (no modification, as obviously RTL and tree
>> constants may be shared).  I think your claim is because you have those
>> precision and bitsize members in your wide-int, which I believe is a
>> design red herring.  I suppose we should concentrate on addressing that
>> one first.  Thus, let me repeat a few questions on your design to
>> eventually
>> let you understand my concerns:
>>
>> Given two wide-ints, a and b, what precision will the result of
>>       a + b
>> have?  Consider a having precision 32 and b having precision 64
>> on a 32-bit HWI host.
>>
>> You define wide-int to have no sign information:
>>
>> +   The representation does not contain any information about
>> +   signedness of the represented value, so it can be used to represent
>> +   both signed and unsigned numbers.  For operations where the results
>> +   depend on signedness (division, comparisons), the signedness must
>> +   be specified separately.  For operations where the signedness
>> +   matters, one of the operands to the operation specifies either
>> +   wide_int::SIGNED or wide_int::UNSIGNED.
>>
>> but the result of operations on mixed precision operands _does_ depend
>> on the sign, nevertheless most operations do not get a signedness
>> argument.
>> Nor would that help, because it depends on the signedness of the
>> individual
>> operands!
>>
>> double-int gets around this by having a common "precision" to which
>> all smaller precision constants have to be sign-/zero-extended.  So
>> does CONST_INT and CONST_DOUBLE.
>>
>> Note that even with same precision you have introduced the same problem
>> with the variable len.
>>
>> My proposal is simple - all wide-ints are signed!  wide-int is basically
>> an arbitrary precision signed integer format.  The sign is encoded in
>> the most significant bit of the last active HWI of the representation
>> (val[len - 1] & (1 << HOST_BITS_PER_WIDE_INT - 1)).  All values
>> with less precision than len * HOST_BITS_PER_WIDE_INT are
>> properly sign-/zero-extended to precision len * HOST_BITS_PER_WIDE_INT.
>>
>> This lets you define mixed-len operations by implicitly sign-/zero-extending
>> the operands to whatever len is required for the operation.
>>
>> Furthermore it allows you to get rid of the precision member (and
>> bitsize anyway).
>> Conversion from tree / RTL requires information on the signedness of the
>> input (trivial for tree, for RTL all constants are signed - well,
>> sign-extended).
>> Whenever you want to transfer the wide-int to tree / RTL you have to
>> sign-/zero-extend according to the desired precision.  If you need sth other
>> than arbitrary precision arithmetic you have to explicitly truncate / extend
>> at suitable places - with overflow checking being trivial here.  For
>> optimization purposes selected operations may benefit from a combined implementation
>> receiving a target precision and signedness.  Whatever extra meta-data
>> RTL requires does not belong in wide-int but in the RTX.  Trivially
>> a mode comes to my mind (on tree we have a type), and trivially
>> each RTX has a mode.  And each mode has a precision and bitsize.
>> It lacks a sign, so all RTL integer constants are sign-extended for
>> encoding efficiency purposes.  Mixed-mode operations will not
>> occur (mixed-len operations will); mixed-mode ops are exclusively
>> sign-/zero-extensions and truncations.
>>
>> Representation of (unsigned HOST_WIDE_INT)-1 would necessarily
>> be { 0, (unsigned HOST_WIDE_INT)-1 }, representation of -1 in any
>> precision would be { -1 }.
>>
>> That was my proposal.  Now, can you please properly specify yours?

And you chose not to answer the fundamental question of how your
wide-int is _supposed_ to work?  Ok, maybe I shouldn't have distracted
you with the bits before this.

Richard.

>> Thanks,
>> Richard.
>>
>>> On 04/02/2013 11:04 AM, Richard Biener wrote:
>>>>
>>>> On Wed, Feb 27, 2013 at 2:59 AM, Kenneth Zadeck
>>>> <zadeck@naturalbridge.com> wrote:
>>>>>
>>>>> This patch contains a large number of the changes requested by Richi.
>>>>> It
>>>>> does not contain any of the changes that he requested to abstract the
>>>>> storage layer.   That suggestion appears to be quite unworkable.
>>>>
>>>> I of course took this claim as a challenge ... with the following
>>>> result.
>>>> It is
>>>> of course quite workable ;)
>>>>
>>>> The attached patch implements the core wide-int class and three storage
>>>> models (fixed size for things like plain HWI and double-int, variable
>>>> size
>>>> similar to how your wide-int works and an adaptor for the double-int as
>>>> contained in trees).  With that you can now do
>>>>
>>>> HOST_WIDE_INT
>>>> wi_test (tree x)
>>>> {
>>>>     // template argument deduction doesn't do the magic we want it to do
>>>>     // to make this kind of implicit conversions work
>>>>     // overload resolution considers this kind of conversions so we
>>>>     // need some magic that combines both ... but seeding the overload
>>>>     // set with some instantiations doesn't seem to be possible :/
>>>>     // wide_int<> w = x + 1;
>>>>     wide_int<> w;
>>>>     w += x;
>>>>     w += 1;
>>>>     // template argument deduction doesn't deduce the return value type,
>>>>     // not considering the template default argument either ...
>>>>     // w = wi (x) + 1;
>>>>     // we could support this by providing rvalue-to-lvalue promotion
>>>>     // via a traits class?
>>>>     // otoh it would lead to sub-optimal code anyway so we should
>>>>     // make the result available as reference parameter and only support
>>>>     // wide_int <> res; add (res, x, 1); ?
>>>>     w = wi (x).operator+<wide_int<> >(1);
>>>>     wide_int<>::add(w, x, 1);
>>>>     return w.to_hwi ();
>>>> }
>>>>
>>>> we are somewhat limited with C++ unless we want to get really fancy.
>>>> Eventually providing operator+ just doesn't make much sense for
>>>> generic wide-int combinations (though then the issue is its operands
>>>> are no longer commutative which I think is the case with your wide-int
>>>> or double-int as well - they don't support "1 + wide_int" for obvious
>>>> reasons).
>>>>
>>>> So there are implementation design choices left undecided.
>>>>
>>>> Oh, and the operation implementations are crap (they compute nonsense).
>>>>
>>>> But you should get the idea.
>>>>
>>>> Richard.
>>>
>>>
>
Kenneth Zadeck April 3, 2013, 4:16 p.m. UTC | #7
On 04/03/2013 09:53 AM, Richard Biener wrote:
> On Wed, Apr 3, 2013 at 2:05 PM, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
>> On 04/03/2013 05:17 AM, Richard Biener wrote:
>>
>>> In the end you will have a variable-size storage in TREE_INT_CST thus
>>> you will have at least to emit _code_ copying over meta-data and data
>>> from the tree representation to the wide-int (similar for RTX
>>> CONST_DOUBLE/INT).
>>> I'm objecting to the amount of code you emit and agree that the runtime
>>> cost is copying the meta-data (hopefully optimizable via CSE / SRA)
>>> and in most cases one (or two) iterations of the loop copying the data
>>> (not optimizable).
>> i did get rid of the bitsize in the wide-int patch so at this point the meta
>> data is the precision and the len.
>> not really a lot here.   As usual we pay a high price in gcc for not pushing
>> the tree rep down into the rtl level; had we done so, it would have been
>> acceptable to have the tree type bleed into the wide-int code.
>>
>>
>>
>>>> 2)  You present this as if the implementor actually should care about the
>>>> implementation and you give 3 alternatives:  the double_int, the current
>>>> one, and HWI.     We have tried to make it so that the client should not
>>>> care.   Certainly in my experience here, I have not found a place to
>>>> care.
>>> Well, similar as for the copying overhead for tree your approach requires
>>> overloading operations for HOST_WIDE_INT operands to be able to
>>> say wi + 1 (which is certainly desirable), or the overhead of using
>>> wide_int_one ().
>>>
>>>> In my opinion double_int needs to go away.  That is the main thrust of my
>>>> patches.   There is no place in a compiler for an abi that depends on
>>>> constants fitting into two words whose size is defined by the host.
>>> That's true.  I'm not arguing to preserve "double-int" - I'm arguing to
>>> preserve a way to ask for an integer type on the host with (at least)
>>> N bits.  Almost all double-int users really ask for an integer type on the
>>> host that has at least as many bits as the pointer representation (or
>>> word_mode) on
>>> the target (we do have HOST_WIDEST_INT == 32bits for 64bit pointer
>>> targets).  No double-int user specifically wants 2 * HOST_WIDE_INT
>>> precision - that is just what happens to be there.  Thus I am providing
>>> a way to say get me a host integer with at least N bits (VRP asks for
>>> this, for example).
>>>
>>> What I was asking for is that whatever can provide the above should share
>>> the functional interface with wide-int (or the other way around).  And I
>>> was claiming that wide-int is too fat, because current users of double-int
>>> eventually store double-ints permanently.
>> The problem is that, in truth, double int is too fat. 99.something% of all
>> constants fit in 1 hwi and that is likely to be true forever (i understand
>> that tree vrp may need some thought here).  The rtl level has, for as
>> long as i have known it, had 2 reps for integer constants, so it was
>> relatively easy to slide the CONST_WIDE_INT in.  It seems like the right
>> trickery here, rather than adding a storage model for wide-ints, might be
>> to use the c++ to invisibly support several (and by "several" i really
>> mean 2) classes of TREE_CSTs.
> The truth is that _now_ TREE_INT_CSTs use double-ints and we have
> CONST_INT and CONST_DOUBLE.  What I (and you) propose would
> get us to use variable-size storage for both, allowing to just use a single
> HOST_WIDE_INT in the majority of cases.  In my view the constant
> length of the variable-size storage for TREE_INT_CSTs is determined
> by its type (thus, it doesn't have "optimized" variable-size storage
> but "unoptimized" fixed-size storage based on the maximum storage
> requirement for the type).  Similar for RTX CONST_INT which would
> have fixed-size storage based on the mode-size of the constant.
> Using optimized space (thus using the encoding properties) requires you
> to fit the 'short len' somewhere which possibly will not pay off in the end
> (for tree we do have that storage available, so we could go with optimized
> storage for it, not sure with RTL, I don't see available space there).
There are two questions here:   one is that you object to our
representing small constants efficiently, and the second is that we
take advantage of the fact that fixed size stack allocation is
effectively free for short lived objects like wide-ints (as i use them).

At the rtl level your idea does not work.  rtl constants do not have a
mode or type.  So if you do not compress, how are you going to
determine how many words you need for the constant 1?  I would love to
have a rep that had the mode in it.  But it is a huge change that
requires a lot of hacking to every port.

I understand that this makes me vulnerable to the argument that we 
should not let the rtl level ever dictate anything about the tree level, 
but the truth is that a variable len rep is almost always used for big 
integers.   In our code, most constants of large types are small 
numbers.   (Remember i got into this because the tree constant prop
thinks that left shifting any number by anything greater than 128 is
always 0; i discovered that that was just the tip of the iceberg.) But
mostly i support the decision to canonicalize numbers to the smallest
number of HWIs because most of the algorithms to do the math can be
short circuited.  I admit that if i had to effectively unpack most
numbers to do the math, the canonicalization would be a waste.  However,
this is not really relevant to this conversation.  Yes, you could get
rid of the len, but this is such a small part of the picture.
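
To illustrate the short circuiting (a sketch, not the actual wide-int
code; it assumes both operands are canonical, i.e. stored in the
smallest number of HWIs, with the same precision):

typedef long long HOST_WIDE_INT;   /* stand-in for gcc's typedef */

/* Canonical forms are unique, so different lens mean different values
   and equal lens need at most len word compares; the implicit
   high-order words are never touched.  */
static bool
eq_p (const HOST_WIDE_INT *aval, unsigned int alen,
      const HOST_WIDE_INT *bval, unsigned int blen)
{
  if (alen != blen)
    return false;
  for (unsigned int i = 0; i < alen; i++)
    if (aval[i] != bval[i])
      return false;
  return true;
}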

Furthermore, I am constrained at the rtl level because it is just too 
dirty to share the storage.   We tried that and the amount of whack a 
mole we were playing was killing us.

I am comfortable making big changes at the portable level because i can 
easily test them, but changes to fundamental data structures inside 
every port is more than i am willing to do.   If you are going to do 
that, then you are going to have start giving rtl constants modes and 
that is a huge change (that while desirable i cannot do).

At the tree level you could share the storage, I admit that that is 
easy, i just do not think that it is desirable.  The stack allocation 
inside the wide-int is very cheap and the fact that the number that 
comes out is almost always one hwi makes this very efficient.   Even if 
I was starting from scratch this would be a strong contender to be the 
right representation.

Fundamentally, i do not believe that the amount of copying that i am 
proposing at the tree level will be significant.  Yes, if you force me 
to use an uncompressed format you can make what i propose expensive.

>>>> This is not a beauty contest argument; we have public ports that are beginning
>>>> to
>>>> use modes that are larger than two x86-64 HWIs and i have a private port
>>>> that has such modes and it is my experience that any pass that uses this
>>>> interface has one of three behaviors: it silently gets the wrong answer,
>>>> it
>>>> ices, or it fails to do the transformation.  If we leave double_int as an
>>>> available option, then any use of it potentially will have one of these
>>>> three behaviors.  And so one of my strong objections to this direction is
>>>> that i do not want to fight this kind of bug for the rest of my life.
>>>> Having a single storage model that just always works is in my opinion a
>>>> highly desirable option.  What you have never answered in a concrete
>>>> manner
>>>> is, if we decide to provide this generality, what it would be used for.
>>>> There is no place in a portable compiler where the right answer for every
>>>> target is two HOST wide integers.
>>>>
>>>> However, i will admit that the HWI option has some merits.   We try to
>>>> address this in our implementation by dividing what is done inline in
>>>> wide-int.h to the cases that fit in an HWI and then only drop into the
>>>> heavy
>>>> code in wide-int.c if mode is larger (which it rarely will be).
>>>> However, a
>>>> case could be made that for certain kinds of things like string lengths
>>>> and
>>>> such, we could use another interface or as you argue, a different storage
>>>> model with the same interface.   I just do not see that the cost of the
>>>> conversion code is really going to show up on anyone's radar.
>>> What's the issue with abstracting away the model so a fixed-size 'len'
>>> is possible?  (leaving aside the argument that this would easily allow an
>>> adaptor to tree)
>> I have a particularly pessimistic perspective because i have already written
>> most of this patch.   It is not that i do not want to change that code, it
>> is that i have seen a certain set of mistakes that were made and i do not
>> want to fix them more than once.   At the rtl level you can see the
>> transition from only supporting 32 bit ints to supporting 64 bit its to
>> finally supporting two HWIs and that transition code is not pretty.  My rtl
>> patch fixes the places where an optimization was only made if the data type
>> was 32 bits or smaller as well as the places where the optimization was made
>> only if the data type is smaller than 64 bits (in addition to fixing all of
>> the places where the code ices or simply gets the wrong answer if it is
>> larger than TImode.)  The tree level is only modestly better, I believe only
>> because it is newer. I have not seen any 32 bit only code, but it is
>> littered with transformations that only work for 64 bits.   What is that 64
>> bit only code going to look like in 5 years?
> The user interface of wide-int does not depend on whether a storage model
> is abstracted or not.  If you take advantage of the storage model by
> making its interface leaner then it will.  But I guess converting everything
> before settling on the wide-int interface may not have been the wisest
> choice in the end (providing a wide-int that can literally replace double-int
> would have got you testing coverage without any change besides
> double-int.[ch] and wide-int.[ch]).
I think that it was exactly the correct decision.   We have made 
significant changes to the structure as we have gone along.   I have 
basically done most of what you have suggested (the interface is 
completely functional, i got rid of the bitsize...)   What you are 
running into is that mike stump, richard sandiford and myself actually 
believe that the storage model is a fundamentally bad idea.

>> I want to make it easier to write the general code than to write the code
>> that only solves the problem for the size port that the implementor is
>> currently working on.   So I perceive the storage model as a way to keep
>> having to fight this battle forever because it will allow the implementor to
>> make a decision that the optimization only needs to be done for a particular
>> sized integer.
>>
>> However, i get the fact that from your perspective, what you really want is
>> a solution to the data structure problem in tree-vrp.
> No, that's just a convenient example.  What I really want is a wide-int
> that is less visibly a replacement for CONST_DOUBLE.
I think that that is unfair.   Variable length reps are the standard 
technique for doing wide math.  I am just proposing using data 
structures that are common best practices in the rest of the world and 
adapting them so that they match the gcc world better than just hacking 
in a gmp interface.
>
>>   My patch for tree vrp
>> scans the entire function to find the largest type used in the function and
>> then does all of the math at 2x that size.  But i have to admit that i
>> thought it was weird that you use tree cst as your long term storage.   If
>> this were my pass, i would have allocated a 2d array of some type that was
>> as large as the function in 1d and twice as large as the largest int used in
>> the other dimension and not overloaded tree-cst and then had a set of friend
>> functions in double int to get in and out. Of course you do not need friends
>> in double int because the rep is exposed, but in wide-int that is now hidden
>> since it now is purely functional.
>>
>> I just have to believe that there is a better way to do tree-vrp than
>> messing up wide-int for the rest of the compiler.
> It's not "messing up", it's making wide-int a generally useful thing and
> not tying it so closely to RTL.
Again, this is unfair.
>
>>>> 3) your trick will work at the tree level, but not at the rtl level.
>>>> The
>>>> wide-int code cannot share storage with the CONST_INTs.    We tried this,
>>>> and there are a million bugs that would have to be fixed to make it work.
>>>> It could have worked if CONST_INTs had carried a mode around, but since
>>>> they
>>>> do not, you end up with the same CONST_INT sharing the rep for several
>>>> different types and that just did not work unless you are willing to do
>>>> substantial cleanups.
>>> I don't believe you.  I am only asking for the adaptors to tree and RTL to
>>> work in an RVALUE-ish way (no modification, as obviously RTL and tree
>>> constants may be shared).  I think your claim is because you have that
>>> precision and bitsize members in your wide-int which I believe is a
>>> design red herring.  I suppose we should concentrate on addressing that
>>> one first.  Thus, let me repeat a few questions on your design to
>>> eventually
>>> let you understand my concerns:
>>>
>>> Given two wide-ints, a and b, what precision will the result of
>>>        a + b
>>> have?  Consider a having precision 32 and b having precision 64
>>> on a 32-bit HWI host.
>>>
>>> You define wide-int to have no sign information:
>>>
>>> +   The representation does not contain any information about
>>> +   signedness of the represented value, so it can be used to represent
>>> +   both signed and unsigned numbers.  For operations where the results
>>> +   depend on signedness (division, comparisons), the signedness must
>>> +   be specified separately.  For operations where the signedness
>>> +   matters, one of the operands to the operation specifies either
>>> +   wide_int::SIGNED or wide_int::UNSIGNED.
>>>
>>> but the result of operations on mixed precision operands _does_ depend
>>> on the sign, nevertheless most operations do not get a signedness
>>> argument.
>>> Nor would that help, because it depends on the signedness of the
>>> individual
>>> operands!
>>>
>>> double-int gets around this by having a common "precision" to which
>>> all smaller precision constants have to be sign-/zero-extended.  So
>>> does CONST_INT and CONST_DOUBLE.
>>>
>>> Note that even with same precision you have introduced the same problem
>>> with the variable len.
>>>
>>> My proposal is simple - all wide-ints are signed!  wide-int is basically
>>> an arbitrary precision signed integer format.  The sign is encoded in
>>> the most significant bit of the last active HWI of the representation
>>> (val[len - 1] & (1 << HOST_BITS_PER_WIDE_INT - 1)).  All values
>>> with less precision than len * HOST_BITS_PER_WIDE_INT are
>>> properly sign-/zero-extended to precision len * HOST_BITS_PER_WIDE_INT.
>>>
>>> This lets you define mixed len operations by implicitly
>>> sign-/zero-extending
>>> the operands to whatever len is required for the operation.
>>>
>>> Furthermore it allows you to get rid of the precision member (and
>>> bitsize anyway).
>>> Conversion from tree / RTL requires information on the signedness of the
>>> input (trivial for tree, for RTL all constants are signed - well,
>>> sign-extended).
>>> Whenever you want to transfer the wide-int to tree / RTL you have to
>>> sign-/zero-extend according to the desired precision.  If you need sth
>>> else
>>> than arbitrary precision arithmetic you have to explicitly truncate /
>>> extend
>>> at suitable places - with overflow checking being trivial here.  For
>>> optimization
>>> purposes selected operations may benefit from a combined implementation
>>> receiving a target precision and signedness.  Whatever extra meta-data
>>> RTL requires does not belong in wide-int but in the RTX.  Trivially
>>> a mode comes to my mind (on tree we have a type), and trivially
>>> each RTX has a mode.  And each mode has a precision and bitsize.
>>> It lacks a sign, so all RTL integer constants are sign-extended for
>>> encoding efficiency purposes.  mixed-mode operations will not
>>> occur (mixed len operations will), mixed-mode ops are exclusively
>>> sign-/zero-extensions and truncations.
>>>
>>> Representation of (unsigned HOST_WIDE_INT)-1 would necessarily
>>> be { 0, (unsigned HOST_WIDE_INT)-1 }, representation of -1 in any
>>> precision would be { -1 }.
>>>
>>> That was my proposal.  Now, can you please properly specify yours?
> And you chose to not answer that fundamental question of how your
> wide-int is _supposed_ to work?  Ok, maybe I shouldn't have distracted
> you with the bits before this.
sorry, i missed this question by accident.

we did not do infinite precision by design.   We looked at the set of 
programming languages that gcc either compiles or might compile and we 
looked at the set of machines that gcc targets and neither of these two 
sets of entities defines its operations in terms of infinite precision 
math.   They always do math within a particular precision.    Scripting 
languages do use infinite precision but gcc does not compile any of 
them.   So unless you are going to restrict your set of operations to 
those that satisfy the properties of a ring, it is generally not 
strictly safe to do your math in infinite precision.

we do all of the math in the precision defined by the types or modes 
that are passed in.

constants are defined to be the size of the precision unless they can be 
compressed.   The len field tells how many words are actually needed to 
exactly represent the constant if (len-1)*HOST_BITS_PER_WIDE_INT is 
less than the precision. The decompression process is to add a number of 
words to the high order side of the number.   These words must contain a 
zero if the highest represented bit is 0 and -1 if the highest 
represented bit is 1.

i.e. this looks a lot like sign extension, but the numbers themselves
are not inherently signed or unsigned as in your representation.  It
is up to the operators to imply the signedness: the unsigned multiply
operation is different from the signed multiply operation, but both
are applied to the bits without an interpretation of their sign.
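
For concreteness, the decompression rule just described in code (a
sketch, not the actual wide-int implementation):

typedef long long HOST_WIDE_INT;   /* stand-in for gcc's typedef */

/* Expand canonical VAL[0..LEN-1] to FULL_LEN words: the missing
   high-order words are 0 if the highest represented bit is 0 and
   -1 if it is 1.  Within-word values are assumed sign-extended, so
   testing val[len - 1] < 0 reads the highest represented bit.  */
static void
decompress (const HOST_WIDE_INT *val, unsigned int len,
            HOST_WIDE_INT *out, unsigned int full_len)
{
  HOST_WIDE_INT ext = val[len - 1] < 0 ? (HOST_WIDE_INT) -1 : 0;
  for (unsigned int i = 0; i < full_len; i++)
    out[i] = i < len ? val[i] : ext;
}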

> Richard.
>
>>> Thanks,
>>> Richard.
>>>
>>>> On 04/02/2013 11:04 AM, Richard Biener wrote:
>>>>> On Wed, Feb 27, 2013 at 2:59 AM, Kenneth Zadeck
>>>>> <zadeck@naturalbridge.com> wrote:
>>>>>> This patch contains a large number of the changes requested by Richi.
>>>>>> It
>>>>>> does not contain any of the changes that he requested to abstract the
>>>>>> storage layer.   That suggestion appears to be quite unworkable.
>>>>> I of course took this claim as a challenge ... with the following
>>>>> result.
>>>>> It is
>>>>> of course quite workable ;)
>>>>>
>>>>> The attached patch implements the core wide-int class and three storage
>>>>> models (fixed size for things like plain HWI and double-int, variable
>>>>> size
>>>>> similar to how your wide-int works and an adaptor for the double-int as
>>>>> contained in trees).  With that you can now do
>>>>>
>>>>> HOST_WIDE_INT
>>>>> wi_test (tree x)
>>>>> {
>>>>>      // template argument deduction doesn't do the magic we want it to do
>>>>>      // to make this kind of implicit conversions work
>>>>>      // overload resolution considers this kind of conversions so we
>>>>>      // need some magic that combines both ... but seeding the overload
>>>>>      // set with some instantiations doesn't seem to be possible :/
>>>>>      // wide_int<> w = x + 1;
>>>>>      wide_int<> w;
>>>>>      w += x;
>>>>>      w += 1;
>>>>>      // template argument deduction doesn't deduce the return value type,
>>>>>      // not considering the template default argument either ...
>>>>>      // w = wi (x) + 1;
>>>>>      // we could support this by providing rvalue-to-lvalue promotion
>>>>>      // via a traits class?
>>>>>      // otoh it would lead to sub-optimal code anyway so we should
>>>>>      // make the result available as reference parameter and only support
>>>>>      // wide_int <> res; add (res, x, 1); ?
>>>>>      w = wi (x).operator+<wide_int<> >(1);
>>>>>      wide_int<>::add(w, x, 1);
>>>>>      return w.to_hwi ();
>>>>> }
>>>>>
>>>>> we are somewhat limited with C++ unless we want to get really fancy.
>>>>> Eventually providing operator+ just doesn't make much sense for
>>>>> generic wide-int combinations (though then the issue is its operands
>>>>> are no longer commutative which I think is the case with your wide-int
>>>>> or double-int as well - they don't support "1 + wide_int" for obvious
>>>>> reasons).
>>>>>
>>>>> So there are implementation design choices left undecided.
>>>>>
>>>>> Oh, and the operation implementations are crap (they compute nonsense).
>>>>>
>>>>> But you should get the idea.
>>>>>
>>>>> Richard.
>>>>
Richard Biener April 4, 2013, 9:34 a.m. UTC | #8
On Wed, Apr 3, 2013 at 6:16 PM, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
> On 04/03/2013 09:53 AM, Richard Biener wrote:
>>
>> On Wed, Apr 3, 2013 at 2:05 PM, Kenneth Zadeck <zadeck@naturalbridge.com>
>> wrote:
>>>
>>> On 04/03/2013 05:17 AM, Richard Biener wrote:
>>>
>>>> In the end you will have a variable-size storage in TREE_INT_CST thus
>>>> you will have at least to emit _code_ copying over meta-data and data
>>>> from the tree representation to the wide-int (similar for RTX
>>>> CONST_DOUBLE/INT).
>>>> I'm objecting to the amount of code you emit and agree that the runtime
>>>> cost is copying the meta-data (hopefully optimizable via CSE / SRA)
>>>> and in most cases one (or two) iterations of the loop copying the data
>>>> (not optimizable).
>>>
>>> i did get rid of the bitsize in the wide-int patch so at this point the
>>> meta
>>> data is the precision and the len.
>>> not really a lot here.   As usual we pay a high price in gcc for not
>>> pushing
>>> the tree rep down into the rtl level; had we done so, it would have been
>>> acceptable to have the tree type bleed into the wide-int code.
>>>
>>>
>>>
>>>>> 2)  You present this as if the implementor actually should care about
>>>>> the
>>>>> implementation and you give 3 alternatives:  the double_int, the
>>>>> current
>>>>> one, and HWI.     We have tried to make it so that the client should
>>>>> not
>>>>> care.   Certainly in my experience here, I have not found a place to
>>>>> care.
>>>>
>>>> Well, similar as for the copying overhead for tree your approach
>>>> requires
>>>> overloading operations for HOST_WIDE_INT operands to be able to
>>>> say wi + 1 (which is certainly desirable), or the overhead of using
>>>> wide_int_one ().
>>>>
>>>>> In my opinion double_int needs to go away.  That is the main thrust of
>>>>> my
>>>>> patches.   There is no place in a compiler for an abi that depends on
>>>>> constants fitting into two words whose size is defined by the host.
>>>>
>>>> That's true.  I'm not arguing to preserve "double-int" - I'm arguing to
>>>> preserve a way to ask for an integer type on the host with (at least)
>>>> N bits.  Almost all double-int users really ask for an integer type on
>>>> the
>>>> host that has at least as many bits as the pointer representation (or
>>>> word_mode) on
>>>> the target (we do have HOST_WIDEST_INT == 32bits for 64bit pointer
>>>> targets).  No double-int user specifically wants 2 * HOST_WIDE_INT
>>>> precision - that is just what happens to be there.  Thus I am providing
>>>> a way to say get me a host integer with at least N bits (VRP asks for
>>>> this, for example).
>>>>
>>>> What I was asking for is that whatever can provide the above should
>>>> share
>>>> the functional interface with wide-int (or the other way around).  And
>>>> I
>>>> was claiming that wide-int is too fat, because current users of
>>>> double-int
>>>> eventually store double-ints permanently.
>>>
>>> The problem is that, in truth, double int is too fat. 99.something% of
>>> all constants fit in 1 hwi and that is likely to be true forever (i
>>> understand that tree vrp may need some thought here).  The rtl level
>>> has, for as long as i have known it, had 2 reps for integer constants,
>>> so it was relatively easy to slide the CONST_WIDE_INT in.  It seems like
>>> the right trickery here, rather than adding a storage model for
>>> wide-ints, might be to use the c++ to invisibly support several (and by
>>> "several" i really mean 2) classes of TREE_CSTs.
>>
>> The truth is that _now_ TREE_INT_CSTs use double-ints and we have
>> CONST_INT and CONST_DOUBLE.  What I (and you) propose would
>> get us to use variable-size storage for both, allowing to just use a
>> single
>> HOST_WIDE_INT in the majority of cases.  In my view the constant
>> length of the variable-size storage for TREE_INT_CSTs is determined
>> by its type (thus, it doesn't have "optimized" variable-size storage
>> but "unoptimized" fixed-size storage based on the maximum storage
>> requirement for the type).  Similar for RTX CONST_INT which would
>> have fixed-size storage based on the mode-size of the constant.
>> Using optimized space (thus using the encoding properties) requires you
>> to fit the 'short len' somewhere which possibly will not pay off in the
>> end
>> (for tree we do have that storage available, so we could go with optimized
>> storage for it, not sure with RTL, I don't see available space there).
>
> There are two questions here:   one is that you object to our
> representing small constants efficiently

Huh?  Where do I object to that?  I question whether, for the storage in tree
and RTX, the encoding trick pays off if you need another HWI-aligned
word to store the len.  But see below.

> and the second is that we take
> advantage of the fact that fixed size stack allocation is effectively free
> for short lived objects like wide-ints (as i use them).

I don't question that and I am not asking you to change that.  As part of
what I ask for, a more optimal (smaller) stack allocation would be _possible_
(but not required).

> At the rtl level your idea does not work.   rtl constants do not have a mode
> or type.    So if you do not compress, how are you going to determine how
> many words you need for the constant 1?   I would love to have a rep that
> had the mode in it.    But it is a huge change that requires a lot of
> hacking to every port.

Quoting from your RTL parts patch:

+struct GTY((variable_size)) hwivec_def {
+  int num_elem;                /* number of elements */
+  HOST_WIDE_INT elem[1];
+};
+
+#define HWI_GET_NUM_ELEM(HWIVEC)       ((HWIVEC)->num_elem)
+#define HWI_PUT_NUM_ELEM(HWIVEC, NUM)  ((HWIVEC)->num_elem = (NUM))
+
 /* RTL expression ("rtx").  */

 struct GTY((chain_next ("RTX_NEXT (&%h)"),
@@ -343,6 +352,7 @@ struct GTY((chain_next ("RTX_NEXT (&%h)"),
     struct block_symbol block_sym;
     struct real_value rv;
     struct fixed_value fv;
+    struct hwivec_def hwiv;
   } GTY ((special ("rtx_def"), desc ("GET_CODE (&%0)"))) u;
 };

that 'wastes' one HOST_WIDE_INT.  So the most efficient encoding
of 0 will require two HOST_WIDE_INT size storage.  Same as for
double-int currently.  99.9% of all constants will fit into two
HOST_WIDE_INTs, thus having the variable size encoding for
the storage in RTX (as opposed to having num_elem == GET_MODE_BITSIZE
/ HOST_BITS_PER_WIDE_INT) is a waste of time.  num_elem
for a non-optimized encoding is available from the mode stored in
the integer constant RTX...
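
The padding is easy to check with a toy program (assuming 64-bit HWIs
on an LP64 host; hwivec_def copied from the quoted patch):

#include <cstdio>

typedef long long HOST_WIDE_INT;   /* stand-in for gcc's typedef */

struct hwivec_def {
  int num_elem;                    /* number of elements */
  HOST_WIDE_INT elem[1];
};

int
main ()
{
  /* Alignment pads num_elem out to a full HWI, so even num_elem == 1
     (e.g. the constant 0) occupies two HWIs of storage.  */
  printf ("%zu\n", sizeof (struct hwivec_def));   /* prints 16 */
  return 0;
}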

You say

> At the rtl level your idea does not work.   rtl constants do not have a mode
> or type.

Which is not true and does not matter.  I tell you why.  Quote:

+#if TARGET_SUPPORTS_WIDE_INT
+
+/* Match CONST_*s that can represent compile-time constant integers.  */
+#define CASE_CONST_SCALAR_INT \
+   case CONST_INT: \
+   case CONST_WIDE_INT

which means you are only replacing CONST_DOUBLE with wide-int.
And _all_ CONST_DOUBLE have a mode.  Otherwise you'd have no
way of creating the wide-int in the first place.  Which means any
future transition of CONST_INT to the wide-int code (which is desirable!)
will need to fix the fact that CONST_INTs do not have a mode.
Or even simpler - just retain CONST_INT behavior!  CONST_INTs
need a single HWI storage (moving that to wide-int with a len field
regresses here) and CONST_INTs are always sign-extended.
Preserve that!  Make VOIDmode integer constants have a single
HWI as storage, sign-extended.

> I understand that this makes me vulnerable to the argument that we should
> not let the rtl level ever dictate anything about the tree level, but the
> truth is that a variable len rep is almost always used for big integers.
> In our code, most constants of large types are small numbers.   (Remember i
> got into this because the tree constant prop thinks that left shifting any
> number by anything greater than 128 is always 0; i discovered that that was
> just the tip of the iceberg.) But mostly i support the decision to
> canonicalize numbers to the smallest number of HWIs because most of the
> algorithms to do the math can be short circuited.    I admit that if i had
> to effectively unpack most numbers to do the math, the canonicalization
> would be a waste.  However, this is not really relevant to this
> conversation.   Yes, you could get rid of the len, but this is such a small
> part of the picture.

Getting rid of 'len' in the RTX storage was only a question of whether it
is an efficient way to go forward.  And when considering unifying
CONST_INTs and CONST_WIDE_INTs it is not.  And even for CONST_WIDE_INTs
(which most of the time would be 2 HWI storage, as otherwise you'd use
a CONST_INT) it would be an improvement.

> Furthermore, I am constrained at the rtl level because it is just too dirty
> to share the storage.   We tried that and the amount of whack a mole we were
> playing was killing us.

The above argument (not encoding len explicitly in the RTX) is unrelated
to "sharing the storage" (whatever you exactly refer to here).

> I am comfortable making big changes at the portable level because i can
> easily test them, but changes to fundamental data structures inside every
> port is more than i am willing to do.   If you are going to do that, then
> you are going to have start giving rtl constants modes and that is a huge
> change (that while desirable i cannot do).

All CONST_DOUBLEs have modes.  You do not touch CONST_INTs.
VOIDmode CONST_WIDE_INTs seem perfectly possible to me (see above).
So nothing to fix, even if that is desirable as you say and as I agree.

> At the tree level you could share the storage, I admit that that is easy, i
> just do not think that it is desirable.  The stack allocation inside the
> wide-int is very cheap and the fact that the number that comes out is almost
> always one hwi makes this very efficient.   Even if I was starting from
> scratch this would be a strong contender to be the right representation.
>
> Fundamentally, i do not believe that the amount of copying that i am
> proposing at the tree level will be significant.  Yes, if you force me to
> use an uncompressed format you can make what i propose expensive.

An uncompressed format at tree / RTX will in 99.9% of all cases be
one or two HWIs.  Realize that all my arguments and requests are
really independent (even though I make them together).

As we now have the power of C++ and templates I see no good reason
to not do the optimization of eliding the copying from the RTX / tree
representation.  It can be done as a followup I guess (with the disadvantage
of leaving the possibility to "optimize" existing converted client code).

Btw, is the current state of the patches accessible as a branch somewhere?

>>>>> This is not a beauty contest argument; we have public ports that are
>>>>> beginning
>>>>> to
>>>>> use modes that are larger than two x86-64 HWIs and i have a private
>>>>> port
>>>>> that has such modes and it is my experience that any pass that uses
>>>>> this
>>>>> interface has one of three behaviors: it silently gets the wrong
>>>>> answer,
>>>>> it
>>>>> ices, or it fails to do the transformation.  If we leave double_int as
>>>>> an
>>>>> available option, then any use of it potentially will have one of these
>>>>> three behaviors.  And so one of my strong objections to this direction
>>>>> is
>>>>> that i do not want to fight this kind of bug for the rest of my life.
>>>>> Having a single storage model that just always works is in my opinion a
>>>>> highly desirable option.  What you have never answered in a concrete
>>>>> manner
>>>>> is, if we decide to provide this generality, what it would be used for.
>>>>> There is no place in a portable compiler where the right answer for
>>>>> every
>>>>> target is two HOST wide integers.
>>>>>
>>>>> However, i will admit that the HWI option has some merits.   We try to
>>>>> address this in our implementation by dividing what is done inline in
>>>>> wide-int.h to the cases that fit in an HWI and then only drop into the
>>>>> heavy
>>>>> code in wide-int.c if mode is larger (which it rarely will be).
>>>>> However, a
>>>>> case could be made that for certain kinds of things like string lengths
>>>>> and
>>>>> such, we could use another interface or as you argue, a different
>>>>> storage
>>>>> model with the same interface.   I just do not see that the cost of the
>>>>> conversion code is really going to show up on anyone's radar.
>>>>
>>>> What's the issue with abstracting away the model so a fixed-size 'len'
>>>> is possible?  (leaving aside the argument that this would easily allow an
>>>> adaptor to tree)
>>>
>>> I have a particularly pessimistic perspective because i have already
>>> written
>>> most of this patch.   It is not that i do not want to change that code,
>>> it
>>> is that i have seen a certain set of mistakes that were made and i do not
>>> want to fix them more than once.   At the rtl level you can see the
>>> transition from only supporting 32 bit ints to supporting 64 bit its to
>>> finally supporting two HWIs and that transition code is not pretty.  My
>>> rtl
>>> patch fixes the places where an optimization was only made if the data
>>> type
>>> was 32 bits or smaller as well as the places where the optimization was
>>> made
>>> only if the data type is smaller than 64 bits (in addition to fixing all
>>> of
>>> the places where the code ices or simply gets the wrong answer if it is
>>> larger than TImode.)  The tree level is only modestly better, I believe
>>> only
>>> because it is newer. I have not seen any 32 bit only code, but it is
>>> littered with transformations that only work for 64 bits.   What is that
>>> 64
>>> bit only code going to look like in 5 years?
>>
>> The user interface of wide-int does not depend on whether a storage model
>> is abstracted or not.  If you take advantage of the storage model by
>> making its interface leaner then it will.  But I guess converting
>> everything
>> before settling on the wide-int interface may not have been the wisest
>> choice in the end (providing a wide-int that can literally replace
>> double-int
>> would have got you testing coverage without any change besides
>> double-int.[ch] and wide-int.[ch]).
>
> I think that it was exactly the correct decision.   We have made significant
> changes to the structure as we have gone along.   I have basically done most
> of what you have suggested (the interface is completely functional, i got
> rid of the bitsize...)   What you are running into is that mike stump,
> richard sandiford and myself actually believe that the storage model is a
> fundamentally bad idea.

Well, that must be because of C++ ignorance.  Let's keep that issue
aside for now.  Be sure I'll come back to it.

>>> I want to make it easier to write the general code than to write the code
>>> that only solves the problem for the size port that the implementor is
>>> currently working on.   So I perceive the storage model as a way to keep
>>> having to fight this battle forever because it will allow the implementor
>>> to
>>> make a decision that the optimization only needs to be done for a
>>> particular
>>> sized integer.
>>>
>>> However, i get the fact that from your perspective, what you really want
>>> is
>>> a solution to the data structure problem in tree-vrp.
>>
>> No, that's just a convenient example.  What I really want is a wide-int
>> that is less visibly a replacement for CONST_DOUBLE.
>
> I think that that is unfair.   Variable length reps are the standard
> technique for doing wide math.  I am just proposing using data structures
> that are common best practices in the rest of the world and adapting them so
> that they match the gcc world better than just hacking in a gmp interface.

I don't object to variable-length reps.  I question the need of 'bitsize'
(gone! yay!) and now 'precision' in class wide_int.  I question the need
and efficiency of encoding 'len' in RTX CONST_WIDE_INT.

>>>   My patch for tree vrp
>>> scans the entire function to find the largest type used in the function
>>> and
>>> then does all of the math at 2x that size.  But i have to admit that i
>>> thought it was weird that you use tree cst as your long term storage.
>>> If
>>> this were my pass, i would have allocated a 2d array of some type that
>>> was
>>> as large as the function in 1d and twice as large as the largest int used
>>> in
>>> the other dimension and not overloaded tree-cst and then had a set of
>>> friend
>>> functions in double int to get in and out. Of course you do not need
>>> friends
>>> in double int because the rep is exposed, but in wide-int that is now
>>> hidden
>>> since it now is purely functional.
>>>
>>> I just have to believe that there is a better way to do tree-vrp than
>>> messing up wide-int for the rest of the compiler.
>>
>> It's not "messing up", it's making wide-int a generally useful thing and
>> not tying it so closely to RTL.
>
> Again, this is unfair.
>
>>
>>>>> 3) your trick will work at the tree level, but not at the rtl level.
>>>>> The
>>>>> wide-int code cannot share storage with the CONST_INTs.    We tried
>>>>> this,
>>>>> and there are a million bugs that would have to be fixed to make it
>>>>> work.
>>>>> It could have worked if CONST_INTs had carried a mode around, but since
>>>>> they
>>>>> do not, you end up with the same CONST_INT sharing the rep for several
>>>>> different types and that just did not work unless you are willing to do
>>>>> substantial cleanups.
>>>>
>>>> I don't believe you.  I am only asking for the adaptors to tree and RTL
>>>> to
>>>> work in an RVALUE-ish way (no modification, as obviously RTL and tree
>>>> constants may be shared).  I think your claim is because you have that
>>>> precision and bitsize members in your wide-int which I believe is a
>>>> design red herring.  I suppose we should concentrate on addressing that
>>>> one first.  Thus, let me repeat a few questions on your design to
>>>> eventually
>>>> let you understand my concerns:
>>>>
>>>> Given two wide-ints, a and b, what precision will the result of
>>>>        a + b
>>>> have?  Consider a having precision 32 and b having precision 64
>>>> on a 32-bit HWI host.
>>>>
>>>> You define wide-int to have no sign information:
>>>>
>>>> +   The representation does not contain any information about
>>>> +   signedness of the represented value, so it can be used to represent
>>>> +   both signed and unsigned numbers.  For operations where the results
>>>> +   depend on signedness (division, comparisons), the signedness must
>>>> +   be specified separately.  For operations where the signedness
>>>> +   matters, one of the operands to the operation specifies either
>>>> +   wide_int::SIGNED or wide_int::UNSIGNED.
>>>>
>>>> but the result of operations on mixed precision operands _does_ depend
>>>> on the sign, nevertheless most operations do not get a signedness
>>>> argument.
>>>> Nor would that help, because it depends on the signedness of the
>>>> individual
>>>> operands!
>>>>
>>>> double-int gets around this by having a common "precision" to which
>>>> all smaller precision constants have to be sign-/zero-extended.  So
>>>> does CONST_INT and CONST_DOUBLE.
>>>>
>>>> Note that even with same precision you have introduced the same problem
>>>> with the variable len.
>>>>
>>>> My proposal is simple - all wide-ints are signed!  wide-int is basically
>>>> an arbitrary precision signed integer format.  The sign is encoded in
>>>> the most significant bit of the last active HWI of the representation
>>>> (val[len - 1] & (1 << HOST_BITS_PER_WIDE_INT - 1)).  All values
>>>> with less precision than len * HOST_BITS_PER_WIDE_INT are
>>>> properly sign-/zero-extended to precision len * HOST_BITS_PER_WIDE_INT.
>>>>
>>>> This lets you define mixed len operations by implicitly
>>>> sign-/zero-extending
>>>> the operands to whatever len is required for the operation.
>>>>
>>>> Furthermore it allows you to get rid of the precision member (and
>>>> bitsize anyway).
>>>> Conversion from tree / RTL requires information on the signedness of the
>>>> input (trivial for tree, for RTL all constants are signed - well,
>>>> sign-extended).
>>>> Whenever you want to transfer the wide-int to tree / RTL you have to
>>>> sign-/zero-extend according to the desired precision.  If you need sth
>>>> else
>>>> than arbitrary precision arithmetic you have to explicitly truncate /
>>>> extend
>>>> at suitable places - with overflow checking being trivial here.  For
>>>> optimization
>>>> purposes selected operations may benefit from a combined implementation
>>>> receiving a target precision and signedness.  Whatever extra meta-data
>>>> RTL requires does not belong in wide-int but in the RTX.  Trivially
>>>> a mode comes to my mind (on tree we have a type), and trivially
>>>> each RTX has a mode.  And each mode has a precision and bitsize.
>>>> It lacks a sign, so all RTL integer constants are sign-extended for
>>>> encoding efficiency purposes.  mixed-mode operations will not
>>>> occur (mixed len operations will), mixed-mode ops are exclusively
>>>> sign-/zero-extensions and truncations.
>>>>
>>>> Representation of (unsigned HOST_WIDE_INT)-1 would necessarily
>>>> be { 0, (unsigned HOST_WIDE_INT)-1 }, representation of -1 in any
>>>> precision would be { -1 }.
>>>>
>>>> That was my proposal.  Now, can you please properly specify yours?
>>
>> And you chose to not answer that fundamental question of how your
>> wide-int is _supposed_ to work?  Ok, maybe I shouldn't have distracted
>> you with the bits before this.
>
> sorry, i missed this question by accident.
>
> we did not do infinite precision by design.   We looked at the set of
> programming languages that gcc either compiles or might compile and we
> looked at the set of machines that gcc targets and neither of these two sets
> of entities defines its operations in terms of infinite precision math.
> They always do math within a particular precision.    Scripting languages do
> use infinite precision but gcc does not compile any of them.   So unless you
> are going to restrict your set of operations to those that satisfy the
> properties of a ring, it is generally not strictly safe to do your math in
> infinite precision.
>
> we do all of the math in the precision defined by the types or modes that
> are passed in.
>
> constants are defined to be the size of the precision unless they can be
> compressed.   The len field tells how many words are actually needed to
> exactly represent the constant if (len-1)*HOST_BITS_PER_WIDE_INT is
> less than the precision. The decompression process is to add a number of
> words to the high order side of the number.   These words must contain a
> zero if the highest represented bit is 0 and -1 if the highest represented
> bit is 1.

So the compressed representation is signed.  Please document that.

> i.e. this looks a lot like sign extension, but the numbers themselves are
> not inherently signed or unsigned as in your representation.  It is up to
> the operators to imply the signedness: the unsigned multiply operation is
> different from the signed multiply operation, but both are applied to
> the bits without an interpretation of their sign.

Which means that the inconsistency of ops getting a precision argument
vs. arbitrarily taking the precision of the first operand is a design bug.
Note that with your encoding scheme, as far as I understand, the "sign"
of { -1 } depends on the precision.  If the precision is less than that of HWI
then { -1 } is signed, if it is exactly the precision of HWI the sign is
unspecified, it could be -1U or -1 (as far as I understand it is not an
invalid encoding of -1U?), if it has bigger precision than that of HWI
then the sign is unspecified (it could be -1U or -1 in larger precision)?

"Unspecified sign" means that you cannot combine two operands with
different 'len' / 'precision' without knowing the sign of the operands you
need to extend.  Note that knowing the sign of the operation is not enough.

So I believe that for correctness you need to assert at least
gcc_assert that operands have the same precision - as you assume
here for example:

+wide_int
+wide_int::operator + (const wide_int &op1) const
+{
+  wide_int result;
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      result.len = 1;
+      result.bitsize = bitsize;
+      result.precision = precision;
+      result.val[0] = val[0] + op1.val[0];
+      if (precision < HOST_BITS_PER_WIDE_INT)
+       result.val[0] = sext_hwi (result.val[0], precision);
+    }

which is clearly wrong if op1.precision != precision.  It also explains
why wide_int::one needs a precision ... (I hope you got rid of the
overloads)
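
A self-contained toy shows the asymmetry (toy_int, sext and add are
made up; only the shape of the quoted operator+ is copied):

#include <cstdio>
#include <cstdint>

struct toy_int { int64_t val; unsigned int precision; };

/* Sign-extend X from PREC bits (PREC < 64), like sext_hwi above.  */
static int64_t
sext (int64_t x, unsigned int prec)
{
  int64_t m = (int64_t) 1 << (prec - 1);
  x &= ((int64_t) 1 << prec) - 1;
  return (x ^ m) - m;
}

/* Like the quoted operator+, the result takes the first operand's
   precision.  */
static toy_int
add (toy_int a, toy_int b)
{
  toy_int r = { a.val + b.val, a.precision };
  if (r.precision < 64)
    r.val = sext (r.val, r.precision);
  return r;
}

int
main ()
{
  toy_int a = { 0x7fffffff, 32 };   /* INT32_MAX with precision 32 */
  toy_int b = { 1, 64 };            /* 1 with precision 64 */
  /* add (a, b) wraps to -2147483648 while add (b, a) yields
     2147483648: mixed-precision '+' is not even commutative.  */
  printf ("%lld vs %lld\n", (long long) add (a, b).val,
          (long long) add (b, a).val);
  return 0;
}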

I'm not sure to what extent you got rid of the precision/bitsize taking
functions like

+  inline wide_int lshift (unsigned int y, ShiftOp z, unsigned int bitsize,
+                         unsigned int precision) const;
+  wide_int popcount (unsigned int bitsize, unsigned int prec) const;

already.  They don't make sense as *this already has a precision.

Btw, my proposal would be to get rid of 'precision', as 'precision' is
not needed to interpret the encoding as I would define it (the MSB
would _always_ be a sign-bit) and as you say the operation
has a sign.  Truncating to a desired target precision can be done
after the fact or as part of the operation.  You already have to
deal with extension, as even operands with equal precision can
have a different len.

I'd like to see the current state of wide-int somewhere btw., given
you've done significant changes since the last post.

Thanks,
Richard.

>
>> Richard.
>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> On 04/02/2013 11:04 AM, Richard Biener wrote:
>>>>>>
>>>>>> On Wed, Feb 27, 2013 at 2:59 AM, Kenneth Zadeck
>>>>>> <zadeck@naturalbridge.com> wrote:
>>>>>>>
>>>>>>> This patch contains a large number of the changes requested by Richi.
>>>>>>> It
>>>>>>> does not contain any of the changes that he requested to abstract the
>>>>>>> storage layer.   That suggestion appears to be quite unworkable.
>>>>>>
>>>>>> I of course took this claim as a challenge ... with the following
>>>>>> result.
>>>>>> It is
>>>>>> of course quite workable ;)
>>>>>>
>>>>>> The attached patch implements the core wide-int class and three
>>>>>> storage
>>>>>> models (fixed size for things like plain HWI and double-int, variable
>>>>>> size
>>>>>> similar to how your wide-int works and an adaptor for the double-int
>>>>>> as
>>>>>> contained in trees).  With that you can now do
>>>>>>
>>>>>> HOST_WIDE_INT
>>>>>> wi_test (tree x)
>>>>>> {
>>>>>>      // template argument deduction doesn't do the magic we want it to
>>>>>> do
>>>>>>      // to make this kind of implicit conversions work
>>>>>>      // overload resolution considers this kind of conversions so we
>>>>>>      // need some magic that combines both ... but seeding the
>>>>>> overload
>>>>>>      // set with some instantiations doesn't seem to be possible :/
>>>>>>      // wide_int<> w = x + 1;
>>>>>>      wide_int<> w;
>>>>>>      w += x;
>>>>>>      w += 1;
>>>>>>      // template argument deduction doesn't deduce the return value
>>>>>> type,
>>>>>>      // not considering the template default argument either ...
>>>>>>      // w = wi (x) + 1;
>>>>>>      // we could support this by providing rvalue-to-lvalue promotion
>>>>>>      // via a traits class?
>>>>>>      // otoh it would lead to sub-optimal code anyway so we should
>>>>>>      // make the result available as reference parameter and only
>>>>>> support
>>>>>>      // wide_int <> res; add (res, x, 1); ?
>>>>>>      w = wi (x).operator+<wide_int<> >(1);
>>>>>>      wide_int<>::add(w, x, 1);
>>>>>>      return w.to_hwi ();
>>>>>> }
>>>>>>
>>>>>> we are somewhat limited with C++ unless we want to get really fancy.
>>>>>> Eventually providing operator+ just doesn't make much sense for
>>>>>> generic wide-int combinations (though then the issue is its operands
>>>>>> are no longer commutative which I think is the case with your wide-int
>>>>>> or double-int as well - they don't support "1 + wide_int" for obvious
>>>>>> reasons).
>>>>>>
>>>>>> So there are implementation design choices left undecided.
>>>>>>
>>>>>> Oh, and the operation implementations are crap (they compute
>>>>>> nonsense).
>>>>>>
>>>>>> But you should get the idea.
>>>>>>
>>>>>> Richard.
>>>>>
>>>>>
>
Kenneth Zadeck April 5, 2013, 12:34 p.m. UTC | #9
Richard,

There has been something that has bothered me about your proposal for the 
storage manager and i think i can now characterize that problem.  Say i 
want to compute the expression

(a + b) / c

converting from tree values, using wide-int as the engine and then 
storing the result in a tree.   (A very common operation for the various 
simplifiers in gcc.)

in my version of wide-int, where there is only the stack allocated fixed
size allocation for the data, the compiler arranges for 6 instances of
wide-int that are "statically" allocated on the stack when the function
is entered.  There would be 3 copies of the precision and data to get
things started, one allocation of a variable sized object at the end when
the INT_CST is built, and one copy to put it back.  As i have argued,
these copies are of negligible size.

In your world, to get things started, you would do 3 pointer copies to
get the values out of the tree to set the expression leaves, but then you
will call the allocator 3 times to get space to hold the intermediate
nodes before you get to pointer-copy the result back into the result cst,
which still needs an allocation to build it.  I am assuming that we can 
play the same game at the tree level that we do at the rtl level where 
we do 1 variable sized allocation to get the entire INT_CST rather than 
doing 1 fixed sized allocation and 1 variable sized one.

even if we take the simpler example of a + b, you still lose.  The
cost of the extra allocation and its subsequent recovery is more than
my copies.  In fact, even in the simplest case of someone going from a
HWI thru wide_int into tree, you have 2 allocations vs my 1.
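
A toy model makes the accounting concrete (illustrative types only;
neither struct is the real wide-int nor the proposed template storage,
and the INT_CST allocation that both schemes need at the end is left
out):

#include <cstdio>
#include <cstdlib>

static int heap_allocs;   /* count allocator calls in the toy model */

/* Fixed-size, stack-allocated value: temporaries are free to create.  */
struct fixed_int { long val[4]; int len; };

static fixed_int make_fixed (long v)
{ fixed_int r; r.val[0] = v; r.len = 1; return r; }
static fixed_int operator+ (fixed_int a, fixed_int b)
{ a.val[0] += b.val[0]; return a; }
static fixed_int operator/ (fixed_int a, fixed_int b)
{ a.val[0] /= b.val[0]; return a; }

/* Variable-size value: leaves borrow existing (tree) storage via a
   pointer copy, but every intermediate result needs an allocation
   (the sketch leaks them; cleanup is beside the point here).  */
struct alloc_int { long *val; int len; };

static alloc_int borrow (long *storage)
{ alloc_int r; r.val = storage; r.len = 1; return r; }
static alloc_int fresh (long v)
{
  alloc_int r;
  r.val = (long *) malloc (sizeof (long));
  heap_allocs++;
  r.val[0] = v;
  r.len = 1;
  return r;
}
static alloc_int operator+ (alloc_int a, alloc_int b)
{ return fresh (a.val[0] + b.val[0]); }
static alloc_int operator/ (alloc_int a, alloc_int b)
{ return fresh (a.val[0] / b.val[0]); }

int
main ()
{
  /* (a + b) / c with stack allocation: no heap traffic at all.  */
  fixed_int r = (make_fixed (30) + make_fixed (10)) / make_fixed (4);

  /* The same expression with heap-backed intermediates; the longs
     stand in for tree-backed storage.  */
  long ta = 30, tb = 10, tc = 4;
  alloc_int r2 = (borrow (&ta) + borrow (&tb)) / borrow (&tc);

  printf ("%ld %ld, heap allocations: %d\n",
          r.val[0], r2.val[0], heap_allocs);   /* 10 10, 2 */
  return 0;
}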

I just do not see the cost savings, and if there are no cost savings, you 
certainly cannot say that having these templates is simpler than not 
having the templates.

Kenny

On 04/02/2013 11:04 AM, Richard Biener wrote:
> On Wed, Feb 27, 2013 at 2:59 AM, Kenneth Zadeck
> <zadeck@naturalbridge.com> wrote:
>> This patch contains a large number of the changes requested by Richi.   It
>> does not contain any of the changes that he requested to abstract the
>> storage layer.   That suggestion appears to be quite unworkable.
> I of course took this claim as a challenge ... with the following result.  It is
> of course quite workable ;)
>
> The attached patch implements the core wide-int class and three storage
> models (fixed size for things like plain HWI and double-int, variable size
> similar to how your wide-int works and an adaptor for the double-int as
> contained in trees).  With that you can now do
>
> HOST_WIDE_INT
> wi_test (tree x)
> {
>    // template argument deduction doesn't do the magic we want it to do
>    // to make this kind of implicit conversions work
>    // overload resolution considers this kind of conversions so we
>    // need some magic that combines both ... but seeding the overload
>    // set with some instantiations doesn't seem to be possible :/
>    // wide_int<> w = x + 1;
>    wide_int<> w;
>    w += x;
>    w += 1;
>    // template argument deduction doesn't deduce the return value type,
>    // not considering the template default argument either ...
>    // w = wi (x) + 1;
>    // we could support this by providing rvalue-to-lvalue promotion
>    // via a traits class?
>    // otoh it would lead to sub-optimal code anyway so we should
>    // make the result available as reference parameter and only support
>    // wide_int <> res; add (res, x, 1); ?
>    w = wi (x).operator+<wide_int<> >(1);
>    wide_int<>::add(w, x, 1);
>    return w.to_hwi ();
> }
>
> we are somewhat limited with C++ unless we want to get really fancy.
> Eventually providing operator+ just doesn't make much sense for
> generic wide-int combinations (though then the issue is its operands
> are no longer commutative which I think is the case with your wide-int
> or double-int as well - they don't support "1 + wide_int" for obvious reasons).
>
> So there are implementation design choices left undecided.
>
> Oh, and the operation implementations are crap (they compute nonsense).
>
> But you should get the idea.
>
> Richard.
Kenneth Zadeck April 7, 2013, 5:16 p.m. UTC | #10
Richard,

You advocate that I should be using an infinite precision
representation and I advocate a finite precision representation where
the precision is taken from the context.  I would like to make the
case for my position here, in a separate thread, because the other
thread is just getting too messy.

At both the tree level and the rtl level you have a type (mode is just
a bad rep for types) and both of those explicitly have precisions.  The
semantics of the programming languages that we implement define, or at
least recommend, that most operations be done in a precision that is
implementation dependent (or, like java, a particular machine
independent precision).  Each hardware platform specifies exactly how
every operation is done.  I will admit that infinite precision is more
esthetically pleasing than what i have done, but exact precision
matches the needs of these clients.  The problem is that the results
from infinite precision arithmetic differ in many significant ways
from finite precision math.  And the number of places where you have
to inject a precision to get the expected answer ultimately makes the
infinite precision representation unattractive.

As I said on Thursday, whenever you do operations that do not satisfy
the requirements of a mathematical ring (add sub and mul are in a
ring, divide, shift and comparisons are not) you run the risk of
getting a result that is not what would have been obtained with either
a strict interpretation of the semantics or the machine. Intuitively
any operation that looks at the bits above the precision does not
qualify as an operation that works in a ring.

The poster child for operations that do not belong to a ring is division.
For my example, I am using 4 bit integers because it makes the
examples easy, but similar examples exist for any fixed precision.

Consider 8 * 10 / 4

in an infinite precision world the result is 20, but in a 4 bit
precision world the answer is 0.

another example is to ask if

-10 * 10 is less than 0?

again you get a different answer with infinite precision.   I would argue
that if i declare a variable of type uint32 and scale my examples i have
the right to expect the compiler to produce the same result as the
machine would.
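
For concreteness, here is a minimal sketch of the division example,
emulating the 4-bit truncation with a mask (the MASK4 helper and the
printf scaffolding are illustrative, not part of any proposed
interface):

#include <cstdio>

#define MASK4(x) ((x) & 0xf)   /* reduce a value to 4-bit precision */

int main ()
{
  int infinite = 8 * 10 / 4;               /* 20 in infinite precision */
  int truncated = MASK4 (infinite);        /* 4 if only the final result
                                              is reduced to 4 bits */
  int finite = MASK4 (MASK4 (8 * 10) / 4); /* 0 if every operation is
                                              done in 4 bits */
  std::printf ("%d %d %d\n", infinite, truncated, finite);  /* 20 4 0 */
  return 0;
}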

While C and C++ may have enough wiggle room in their standards so that
this is just an unexpected, but legal, result as opposed to being wrong,
everyone will hate you (us) if we do this.  Furthermore, Java explicitly 
does
not allow this (not that anyone actually uses gcj).  I do not know
enough about go, ada and fortran to say how it would affect them.

In looking at the double-int class, the only operation that does not
fit in a ring that is done properly is shifting.  There we explicitly
pass in the precision.

The reason that we rarely see this kind of problem even though
double-int implements 128 bit infinite precision is that currently
very little of the compiler actually uses infinite precision in a
robust way.   In a large number of places, the code looks like:

if (TYPE_PRECISION (TREE_TYPE (...)) < HOST_BITS_PER_WIDE_INT)
    do something using inline operators.
else
    either do not do something or use const-double,

such code clears out most of these issues before the two passes that
embrace infinite precision get a chance to do much damage.  However,
my patch at the rtl level gets rid of most of this kind of code and
replaces it with calls to wide-int that currently uses only operations
within the precision.  I assume that if i went down the infinite
precision road at the tree level, that all of this would come to the
surface very quickly.  I prefer to not change my rep and not have to
deal with this later.

Add, subtract, multiply and the logicals are all safe.  But divide,
remainder, and all of the comparisons need explicit precisions.  In
addition, operations like clz, ctz and clrsb need precisions.  In total
about half of the functions would need a precision passed in.  My
point is that once you have to start passing in the precision for all
of those operations, it seems to be cleaner to get the precision from
the leaves of the tree as I currently do.
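
As a small illustration of why an operation like clz needs a
precision, a sketch using the GCC builtin (the 4-bit count is derived
by hand):

#include <cstdio>

int main ()
{
  unsigned x = 1;
  int clz32 = __builtin_clz (x);   /* 31 when int is 32 bits wide */
  int clz4 = clz32 - (32 - 4);     /* 3 when the same value is viewed
                                      as a 4-bit quantity */
  std::printf ("%d %d\n", clz32, clz4);
  return 0;
}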

Once you buy into the math in a particular precision world, a lot of
the other issues that you raise are just settled.  Asking how to extend
a value beyond its precision is like asking what the universe was like
before the big bang.  It is just something you do not need to know.

I understand that you would like to have functions like x + 1 work,
and so do I. I just could not figure out how to make them have
unsurprising semantics.  In particular, g++ did not seem to be happy
with me defining two plus operators, one for each of signed and
unsigned HWIs.  It seems like if someone explicitly added a wide_int
and an unsigned HWI that they had a right to have the unsigned hwi not
be sign extended.  But if you can show me how to do this, i am happy
to go down that road.

Kenny
Florian Weimer April 8, 2013, 8:56 a.m. UTC | #11
On 04/07/2013 07:16 PM, Kenneth Zadeck wrote:
> The poster child for operations that do not belong to a ring is division.
> For my example, I am using 4 bit integers because it makes the
> examples easy, but similar examples exist for any fixed precision.
>
> Consider 8 * 10 / 4
>
> in an infinite precision world the result is 20, but in a 4 bit
> precision world the answer is 0.

I think you mean "4" instead of "20".

> another example is to ask if
>
> -10 * 10 is less than 0?
>
> again you get a different answer with infinite precision.

Actually, for C/C++, you don't, because of undefined signed overflow
(at least with default compiler flags).  But similar examples with 
unsigned types exist, so this point isn't too relevant.
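
One such unsigned example, as a sketch; the product wraps (which is
well defined for unsigned types), and the wrapped comparison disagrees
with the infinite-precision one:

#include <cstdint>

bool product_less_than_operand ()
{
  uint32_t a = 3000000000u;
  uint32_t p = a * 3u;   /* wraps: 9000000000 mod 2^32 == 410065408 */
  return p < a;          /* true in 32 bits, false in infinite precision */
}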

> I would argue
> that if i declare a variable of type uint32 and scale my examples i have
> the right to expect the compiler to produce the same result as the
> machine would.

In my very, very limited experience, the signed/unsigned mismatch is 
more confusing.  With infinite precision, this confusion would not arise 
(but adjustment would be needed to get limited-precision results, as you 
write).  With finite precision, you either need separate types for 
signed/unsigned, or separate operations.

> While C and C++ may have enough wiggle room in their standards so that
> this is just an unexpected, but legal, result as opposed to being wrong,
> everyone will hate you (us) if we do this.  Furthermore, Java explicitly
> does
> not allow this (not that anyone actually uses gcj).  I do not know
> enough about go,

Go specifies two's-complement signed arithmetic and does not 
automatically promote to int (i.e., it performs arithmetic in the type, 
and mixed arguments are not supported).

Go constant arithmetic is infinite precision.

 > ada and fortran to say how it would affect them.

Ada requires trapping arithmetic for signed integers.  Currently, this 
is implemented in the front end.  Arithmetic happens in the base range 
of a type (which is symmetric around zero and chosen to correspond to a 
machine type).  Ada allows omitting intermediate overflow checks as long 
as you produce the infinite precision result (or raise an overflow 
exception).

I think this applies to Ada constant arithmetic as well.

(GNAT has a mode where comparisons are computed with infinite precision, 
which is extremely useful for writing bounds checking code.)

Considering the range of different arithmetic operations we need to 
support, I'm not convinced that the ring model is appropriate.
Richard Biener April 8, 2013, 10:32 a.m. UTC | #12
On Fri, Apr 5, 2013 at 2:34 PM, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
> Richard,
>
> There has been something that has bothered me about your proposal for the
> storage manager and i think i can now characterize that problem.  Say i want
> to compute the expression
>
> (a + b) / c
>
> converting from tree values, using wide-int as the engine and then storing
> the result in a tree.   (A very common operation for the various simplifiers
> in gcc.)
>
> in my version of wide-int where there is only the stack allocated fixed size
> allocation for the data, the compiler arranges for 6 instances of wide-int
> that are "statically" allocated on the stack when the function is entered.
> There would be 3 copies of the precision and data to get things started and
> one allocation of a variable sized object at the end when the INT_CST is built
> and one copy to put it back.   As i have argued, these copies are of
> negligible size.
>
> In your world, to get things started, you would do 3 pointer copies to get
> the values out of the tree to set the expression leaves but then you will
> call the allocator 3 times to get space to hold the intermediate nodes
> before you get to pointer copy the result back into the result cst which
> still needs an allocation to build it. I am assuming that we can play the
> same game at the tree level that we do at the rtl level where we do 1
> variable sized allocation to get the entire INT_CST rather than doing 1
> fixed sized allocation and 1 variable sized one.
>
> even if we take the simpler example of a + b, you still lose.   The cost of
> the extra allocation and its subsequent recovery is more than my copies.
> In fact, even in the simplest case of someone going from a HWI thru wide_int
> into tree, you have 2 allocations vs my 1.

Just to clarify, my code wouldn't handle

  tree a, b, c;
  tree res = (a + b) / c;

transparently.  The most "complex" form of the above that I think would
be reasonable to handle would be

  tree a, b, c;
  wide_int wires = (wi (a) + b) / c;
  tree res = build_int_cst (TREE_TYPE (a), wires);

and the code as posted would even require you to specify the
return type of operator+ and operator/ explicitly like

 wide_int<> wires = (wi (a).operator+<wi_embed_var>
(b)).operator/<wi_embed_var> (c);

but as I said I just didn't bother to "decide" that the return type is
always of wide_int <variable-len-storage> kind.

Now, the only real allocation that happens is done by build_int_cst.
There is one wide_int on the stack to hold the a + b result and one
separate wide_int to hold wires (it's literally written in the code).
There are no "pointer copies" involved in the end - the result from
converting a tree to a wide_int<tree-storage> is the original 'tree'
pointer itself, thus a register.

> I just do not see the cost savings and if there are no cost savings, you
> certainly cannot say that having these templates is simpler than not having
> the templates.

I think you are missing the point - by abstracting away the storage
you don't necessarily need to add the templates.  But you open up
a very easy route for doing so and you make the operations _trivially_
work on the tree / RTL storage with no overhead in generated code
and minimal overhead in the amount of code in GCC itself.  In my
prototype the overhead of adding 'tree' support is to place

class wi_tree_int_cst
{
  tree cst;
public:
  void construct (tree c) { cst = c; }
  const HOST_WIDE_INT *storage() const
    { return reinterpret_cast <HOST_WIDE_INT *> (&TREE_INT_CST (cst)); }
  unsigned len() const { return 2; }
};

template <>
class wi_traits <tree>
{
public:
    typedef wide_int <wi_tree_int_cst> wi_t;
    wi_traits(tree t)
  {
    wi_tree_int_cst ws;
    ws.construct (t);
    w.construct (ws);
  }
    wi_t* operator->() { return &w; }
private:
    wi_t w;
};

into tree.h.

Richard.

Richard Biener April 8, 2013, 10:46 a.m. UTC | #13
On Sun, Apr 7, 2013 at 7:16 PM, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
> Richard,
>
> [...]

I advocate the infinite precision signed representation as one solution
to avoid the issues that come up with your implementation (as I currently
have access to) which has a representation with N bits of precision
encoded with M <= N bits and no sign information.  That obviously
leaves operations on numbers of that representation with differing
N undefined.  You define it by having coded the operations which as far
as I can see simply assume N is equal for any two operands and
the effective sign for extending the M-bits encoding to the common
N-bits precision is "available".  A thorough specification of both
the encoding scheme and the operation semantics is missing.
I can side-step both of these issues nicely by simply using
an infinite precision signed representation and requiring the client to
explicitly truncate / extend to a specific precision when required.
I also leave open the possibility to have the _encoding_ be always
the same as an infinite precision signed representation but to always
require an explicitly specified target precision for each operation
(which rules out the use of operator overloading).

Citing your example:

  8 * 10 / 4

and transforming it slightly into a commonly used pattern:

  (byte-size * 8 + bit-size) / 8

then I argue that what people want here is this carried out in
_infinite_ precision!  Even if byte-size happens to come from
a sizetype TREE_INT_CST with 64bit precision.  So either
choice - having a fixed-precision representation or an
infinite-precision representation - can and will lead to errors
done by the programmer.  And as you can easily build a
finite precision wrapper around an infinite precision implementation
but not the other way around it's obvious to me what the
implementation should provide.
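
To illustrate the wrapper direction, a minimal sketch, assuming GCC's
__int128 as a stand-in for the infinite-precision representation (a
real implementation would use a bignum) and GCC's arithmetic right
shift on signed types:

#include <cstdint>

typedef __int128 infint;   /* stand-in for an infinite-precision rep */

struct fixed_int
{
  unsigned precision;      /* taken from the context once, at the leaf */
  infint val;

  /* Reduce an "infinite" value to PREC bits, sign extending;
     assumes 0 < prec <= 128.  */
  static infint reduce (infint v, unsigned prec)
  {
    int shift = 128 - prec;
    return ((infint) ((unsigned __int128) v << shift)) >> shift;
  }

  fixed_int operator+ (const fixed_int &b) const
  {
    fixed_int r = { precision, reduce (val + b.val, precision) };
    return r;
  }
};

In this sketch only reduce () ever looks at the precision; the
arithmetic itself is done on the underlying infinite-precision value.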

Richard.

Kenneth Zadeck April 8, 2013, 12:43 p.m. UTC | #14
On 04/08/2013 06:46 AM, Richard Biener wrote:
> On Sun, Apr 7, 2013 at 7:16 PM, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
>> [...]
> I advocate the infinite precision signed representation as one solution
> to avoid the issues that come up with your implementation (as I currently
> have access to) which has a representation with N bits of precision
> encoded with M <= N bits and no sign information.  That obviously
> leaves operations on numbers of that representation with differing
> N undefined.  You define it by having coded the operations which as far
> as I can see simply assume N is equal for any two operands and
> the effective sign for extending the M-bits encoding to the common
> N-bits precision is "available".  A thorough specification of both
> the encoding scheme and the operation semantics is missing.
This is a perfectly reasonable request.
> I can side-step both of these issues nicely by simply using
> an infinite precision signed representation and requiring the client to
> explicitly truncate / extend to a specific precision when required.
> I also leave open the possibility to have the _encoding_ be always
> the same as an infinite precision signed representation but to always
> require an explicitly specified target precision for each operation
> (which rules out the use of operator overloading).
>
> Citing your example:
>
>    8 * 10 / 4
>
> and transforming it slightly into a commonly used pattern:
>
>    (byte-size * 8 + bit-size) / 8
Patterns like this are generally not encoded in either double int or 
wide-int.   I do not think that anyone has taken the position that all 
math be done in wide math, only the math on the variables that appear 
in the programs.


> then I argue that what people want here is this carried out in
> _infinite_ precision!  Even if byte-size happens to come from
> a sizetype TREE_INT_CST with 64bit precision.  So either
> choice - having a fixed-precision representation or an
> infinite-precision representation - can and will lead to errors
> done by the programmer.  And as you can easily build a
> finite precision wrapper around an infinite precision implementation
> but not the other way around it's obvious to me what the
> implementation should provide.
>
> Richard.
Infinite precision does get rid of the issue that you have to specify 
the precision at the leaves.    However, for the vast majority of 
expression trees that get built, the leaves already have their precision 
in the type or the mode.   On the other hand, to keep the math sane, the 
infinite precision requires either external truncation (ugly) or 
specification of the precision at many interior nodes of the expression 
tree (just as ugly as my putting it at the constant leaves).

The other problem, which i invite you to use the full power of your c++ 
sorcery on, is the one where defining an operator so that wide-int + 
unsigned hwi is either rejected or properly zero extended.    If you can 
do this, I will go along with your suggestion that the internal rep 
should be sign extended.  Saying that constants are always sign extended 
seems ok, but there are a huge number of places where we convert 
unsigned hwis as the second operand and i do not want that to be a 
trap.  I went thru a round of this, where i did not post the patch 
because i could not make this work.   And the number of places where you 
want to use an hwi as the second operand dwarfs the number of places 
where you want to use a small integer constant.
Robert Dewar April 8, 2013, 1:03 p.m. UTC | #15
It may be interesting to look at what we have done in
Ada with regard to overflow in intermediate expressions.
Briefly, we allow specification of three modes:

all intermediate arithmetic is done in the base type,
with overflow signalled if an intermediate value is
outside this range.

all intermediate arithmetic is done in the widest
integer type, with overflow signalled if an intermediate
value is outside this range.

all intermediate arithmetic uses an infinite precision
arithmetic package built for this purpose.

In the second and third cases we do range analysis that
allows smaller intermediate precision if we know it's
safe.

We also allow separate specification of the mode inside
and outside assertions (e.g. preconditions and postconditions)
since in the latter you often want to regard integers as
mathematical, not subject to intermediate overflow.
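
As a rough C++ analogue of the second mode (the names and the use of
an exception are illustrative): do the intermediate arithmetic in a
wider type and check the final value against the base range.

#include <cstdint>
#include <stdexcept>

int32_t checked_madd (int32_t a, int32_t b, int32_t c)
{
  int64_t wide = (int64_t) a * b + c;   /* widest-type intermediates */
  if (wide < INT32_MIN || wide > INT32_MAX)
    throw std::overflow_error ("result outside the base range");
  return (int32_t) wide;
}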
Kenneth Zadeck April 8, 2013, 1:15 p.m. UTC | #16
On 04/08/2013 04:56 AM, Florian Weimer wrote:
> On 04/07/2013 07:16 PM, Kenneth Zadeck wrote:
>> The poster child for operations that do not belong to a ring is 
>> division.
>> For my example, I am using 4 bit integers because it makes the
>> examples easy, but similar examples exist for any fixed precision.
>>
>> Consider 8 * 10 / 4
>>
>> in an infinite precision world the result is 20, but in a 4 bit
>> precision world the answer is 0.
>
> I think you mean "4" instead of "20".
>
oops
>> another example is to ask if
>>
>> -10 * 10 is less than 0?
>>
>> again you get a different answer with infinite precision.
>
> Actually, for C/C++ ,you don't—because of undefined signed overflow
> (at least with default compiler flags).  But similar examples with 
> unsigned types exist, so this point isn't too relevant.
>
>> I would argue
>> that if i declare a variable of type uint32 and scale my examples i have
>> the right to expect the compiler to produce the same result as the
>> machine would.
>
> In my very, very limited experience, the signed/unsigned mismatch is 
> more confusing.  With infinite precision, this confusion would not 
> arise (but adjustment would be needed to get limited-precision 
> results, as you write).  With finite precision, you either need 
> separate types for signed/unsigned, or separate operations.
>
I come from a world where people write code expecting full 
control of the horizontal and the vertical when they program.   Hank Warren, 
the author of Hacker's Delight, is in my group and a huge number of those 
tricks require understanding what is going on in the machine.  If the 
compiler decides that it wants to do things differently, you are dead.
>> While C and C++ may have enough wiggle room in their standards so that
>> this is just an unexpected, but legal, result as opposed to being wrong,
>> everyone will hate you (us) if we do this.  Furthermore, Java explicitly
>> does
>> not allow this (not that anyone actually uses gcj).  I do not know
>> enough about go,
>
> Go specified two's-complement signed arithmetic and does not 
> automatically promote to int (i.e., it performs arithmetic in the 
> type, and mixed arguments are not supported).
>
> Go constant arithmetic is infinite precision.
>
> > ada and fortran to say how it would affect them.
>
> Ada requires trapping arithmetic for signed integers.  Currently, this 
> is implemented in the front end.  Arithmetic happens in the base range 
> of a type (which is symmetric around zero and chosen to correspond to 
> a machine type).  Ada allows omitting intermediate overflow checks as 
> long as you produce the infinite precision result (or raise an 
> overflow exception).
>
> I think this applies to Ada constant arithmetic as well.
>
> (GNAT has a mode where comparisons are computed with infinite 
> precision, which is extremely useful for writing bounds checking code.)
>
> Considering the range of different arithmetic operations we need to 
> support, I'm not convinced that the ring model is appropriate.
>
I will answer this in my reply to Robert's email.
Robert Dewar April 8, 2013, 1:19 p.m. UTC | #17
On 4/8/2013 9:15 AM, Kenneth Zadeck wrote:

>> I think this applies to Ada constant arithmetic as well.

Ada constant arithmetic (at compile time) is always infinite
precision (for float as well as for integer).
Kenneth Zadeck April 8, 2013, 1:23 p.m. UTC | #18
On 04/08/2013 09:19 AM, Robert Dewar wrote:
> On 4/8/2013 9:15 AM, Kenneth Zadeck wrote:
>
>>> I think this applies to Ada constant arithmetic as well.
>
> Ada constant arithmetic (at compile time) is always infinite
> precision (for float as well as for integer).
>
What do you mean when you say "constant arithmetic"?    Do you mean 
places where there is an explicit 8 * 6 in the source or do you mean any 
arithmetic that a compiler, using the full power of interprocedural 
constant propagation can discover?
Kenneth Zadeck April 8, 2013, 1:24 p.m. UTC | #19
On 04/08/2013 09:03 AM, Robert Dewar wrote:
> It may be interesting to look at what we have done in
> Ada with regard to overflow in intermediate expressions.
> Briefly we allow specification of three modes
>
> all intermediate arithmetic is done in the base type,
> with overflow signalled if an intermediate value is
> outside this range.
>
> all intermediate arithmetic is done in the widest
> integer type, with overflow signalled if an intermediate
> value is outside this range.
>
> all intermediate arithmetic uses an infinite precision
> arithmetic package built for this purpose.
>
> In the second and third cases we do range analysis that
> allows smaller intermediate precision if we know it's
> safe.
>
> We also allow separate specification of the mode inside
> and outside assertions (e.g. preconditions and postconditions)
> since in the latter you often want to regard integers as
> mathematical, not subject to intermediate overflow.
So then how does a language like ada work in gcc?   My assumption is 
that most of what you describe here is done in the front end and by the 
time you get to the middle end of the compiler, you have chosen types 
in which you are comfortable having any remaining math done, along 
with explicit checks for overflow where the programmer asked for them.

Otherwise, how could ada have ever worked with gcc?

kenny
Robert Dewar April 8, 2013, 1:36 p.m. UTC | #20
On 4/8/2013 9:24 AM, Kenneth Zadeck wrote:

> So then how does a language like ada work in gcc?   My assumption is
> that most of what you describe here is done in the front end and by the
> time you get to the middle end of the compiler, you have chosen types
> in which you are comfortable having any remaining math done, along
> with explicit checks for overflow where the programmer asked for them.

That's right, the front end does all the promotion of types
>
> Otherwise, how could ada have ever worked with gcc?

Sometimes we do have to make changes to gcc to accommodate Ada
specific requirements, but this was not one of those cases. Of
course the back end would do a better job of the range analysis
to remove some unnecessary use of infinite precision, but the
front end in practice does a good enough job.
Robert Dewar April 8, 2013, 1:52 p.m. UTC | #21
On 4/8/2013 9:23 AM, Kenneth Zadeck wrote:
> On 04/08/2013 09:19 AM, Robert Dewar wrote:
>> On 4/8/2013 9:15 AM, Kenneth Zadeck wrote:
>>
>>>> I think this applies to Ada constant arithmetic as well.
>>
>> Ada constant arithmetic (at compile time) is always infinite
>> precision (for float as well as for integer).
>>
> What do you mean when you say "constant arithmetic"?    Do you mean
> places where there is an explicit 8 * 6 in the source or do you mean any
> arithmetic that a compiler, using the full power of interprocedural
> constant propagation can discover?

Somewhere between the two. Ada has a very well defined notion of
what is and what is not a "static expression", it definitely does not
include everything the compiler can discover, but it goes beyond just
explicit literal arithmetic, e.g. declared constants

    X : Integer := 75;

are considered static. It is static expressions that must be computed
with full precision at compile time. For expressions the compiler can
tell are constant even though not officially static, it is fine to
compute at compile time for integer, but NOT for float, since you want
to use target precision for all non-static float-operations.
Kenneth Zadeck April 8, 2013, 1:58 p.m. UTC | #22
On 04/08/2013 09:52 AM, Robert Dewar wrote:
> On 4/8/2013 9:23 AM, Kenneth Zadeck wrote:
>> On 04/08/2013 09:19 AM, Robert Dewar wrote:
>>> On 4/8/2013 9:15 AM, Kenneth Zadeck wrote:
>>>
>>>>> I think this applies to Ada constant arithmetic as well.
>>>
>>> Ada constant arithmetic (at compile time) is always infinite
>>> precision (for float as well as for integer).
>>>
>> What do you mean when you say "constant arithmetic"?    Do you mean
>> places where there is an explicit 8 * 6 in the source or do you mean any
>> arithmetic that a compiler, using the full power of interprocedural
>> constant propagation can discover?
>
> Somewhere between the two. Ada has a very well defined notion of
> what is and what is not a "static expression", it definitely does not
> include everything the compiler can discover, but it goes beyond just
> explicit literal arithmetic, e.g. declared constants
>
>    X : Integer := 75;
>
I actually guessed that it was something like this but i did not want to 
spend the time trying to figure this bit of ada syntax out.
> are considered static. It is static expressions that must be computed
> with full precision at compile time. For expressions the compiler can
> tell are constant even though not officially static, it is fine to
> compute at compile time for integer, but NOT for float, since you want
> to use target precision for all non-static float-operations.
>>
>
yes but the relevant question for the not officially static integer 
constants is "in what precision are those operations to be performed 
in?    I assume that you choose gcc types for these operations and you 
expect the math to be done within that type, i.e. exactly the way you 
expect the machine to perform.
Robert Dewar April 8, 2013, 2:12 p.m. UTC | #23
On 4/8/2013 9:58 AM, Kenneth Zadeck wrote:

> yes but the relevant question for the not officially static integer
> constants is "in what precision are those operations to be performed
> in?    I assume that you choose gcc types for these operations and you
> expect the math to be done within that type, i.e. exactly the way you
> expect the machine to perform.

As I explained in an earlier message, *within* a single expression, we
are free to use higher precision, and we provide modes that allow this
up to and including the use of infinite precision. That applies not
just to constant expressions but to all expressions.
Kenneth Zadeck April 8, 2013, 2:26 p.m. UTC | #24
On 04/08/2013 10:12 AM, Robert Dewar wrote:
> On 4/8/2013 9:58 AM, Kenneth Zadeck wrote:
>
>> yes but the relevant question for the not officially static integer
>> constants is "in what precision are those operations to be performed
>> in?    I assume that you choose gcc types for these operations and you
>> expect the math to be done within that type, i.e. exactly the way you
>> expect the machine to perform.
>
> As I explained in an earlier message, *within* a single expression, we
> are free to use higher precision, and we provide modes that allow this
> up to and including the use of infinite precision. That applies not
> just to constant expressions but to all expressions.
>>
>
My confusion is what you mean by "we"?   Do you mean "we" the writer of 
the program, "we" the person invoking the compiler by the use command 
line options or "we", your company's implementation of ada?

My interpretation of your first email was that it was possible for the 
programmer to do something equivalent to adding attributes surrounding a 
block in the program to control the precision and overflow detection of 
the expressions in the block.   And if this is so, then by the time the 
expression is seen by the middle end of gcc, those attributes will have 
been converted into tree code that will evaluate the code in a well defined 
way by both the optimization passes and the target machine.

Kenny
Richard Biener April 8, 2013, 2:34 p.m. UTC | #25
On Mon, Apr 8, 2013 at 2:43 PM, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
>
> On 04/08/2013 06:46 AM, Richard Biener wrote:
>>
>> On Sun, Apr 7, 2013 at 7:16 PM, Kenneth Zadeck <zadeck@naturalbridge.com>
>> wrote:
>>> [...]
>>
>> I advocate the infinite precision signed representation as one solution
>> to avoid the issues that come up with your implementation (as I currently
>> have access to) which has a representation with N bits of precision
>> encoded with M <= N bits and no sign information.  That obviously
>> leaves operations on numbers of that representation with differing
>> N undefined.  You define it by having coded the operations which as far
>> as I can see simply assume N is equal for any two operands and
>> the effective sign for extending the M-bits encoding to the common
>> N-bits precision is "available".  A thorough specification of both
>> the encoding scheme and the operation semantics is missing.
>
> This is a perfectly reasonable request.
>
>> I can side-step both of these issues nicely by simply using
>> an infinite precision signed representation and requiring the client to
>> explicitly truncate / extend to a specific precision when required.
>> I also leave open the possibility to have the _encoding_ be always
>> the same as an infinite precision signed representation but to always
>> require an explicitly specified target precision for each operation
>> (which rules out the use of operator overloading).
>>
>> Citing your example:
>>
>>    8 * 10 / 4
>>
>> and transforming it slightly into a commonly used pattern:
>>
>>    (byte-size * 8 + bit-size) / 8
>
> Patterns like this are generally not encoded in either double int or
> wide-int.   I do not think that anyone has taken the position that all math
> be done in wide math, only the math on the variables that appear in the
> programs.

Especially sizetype arithmetic - which appears in programs - is generally
done in double_int _exactly_ to cater for the above issue.  And that
assumes (usually without checking) that a double-int can encode
sizeof(size_t) * 8 + 3 bits in precision.  Which would not be true for
a 32bit HWI host and a 64bit target (which is why we have need_hwint64).
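
A sketch of that failure mode, with __int128 standing in for the wider
arithmetic (the values are chosen to make the wrap visible):

#include <cstdint>

void sizetype_example ()
{
  uint64_t byte_size = UINT64_MAX >> 1;   /* 2^63 - 1, a valid size */
  uint64_t bit_size = 3;

  /* 64-bit arithmetic: byte_size * 8 wraps, so this yields 2^61 - 1 */
  uint64_t wrapped = (byte_size * 8 + bit_size) / 8;

  /* wider arithmetic yields the intended 2^63 - 1 */
  unsigned __int128 wide =
    ((unsigned __int128) byte_size * 8 + bit_size) / 8;

  (void) wrapped;
  (void) wide;
}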

>> then I argue that what people want here is this carried out in
>> _infinite_ precision!  Even if byte-size happens to come from
>> a sizetype TREE_INT_CST with 64bit precision.  So either
>> choice - having a fixed-precision representation or an
>> infinite-precision representation - can and will lead to errors
>> done by the programmer.  And as you can easily build a
>> finite precision wrapper around an infinite precision implementation
>> but not the other way around it's obvious to me what the
>> implementation should provide.
>>
>> Richard.
>
> Infinite precision does get rid of the issue that you have to specify the
> precision at the leaves.    However, for the vast majority of expression trees
> that get built, the leaves already have their precision in the type or the
> mode.   On the other hand, to keep the math sane, the infinite precision
> requires either external truncation (ugly) or specification of the precision
> at many interior nodes of the expression tree (just as ugly as my putting
> it at the constant leaves).

Indeed the only "pretty" operation is with fully infinite precision computes.
But as you most of the time have a single operation per stmt, performing
one truncation/extension for the result isn't that bad.  And only supporting
explicit truncation/extension is the 2nd most "pretty" option.  Certainly
the need to provide a precision when constructing wide_int::one () isn't
pretty either ...

> The other problem, which i invite you to use the full power of your c++
> sorcery on, is the one where defining an operator so that wide-int +
> unsigned hwi is either rejected or properly zero extended.    If you can do
> this, I will go along with your suggestion that the internal rep should be
> sign extended.

That's very easy with template specialization (not with overloading as
you might have figured out).

#define HOST_WIDE_INT long   /* a macro, as in GCC's hwint.h, so that
                                "unsigned HOST_WIDE_INT" stays legal */
template <typename T>
struct to_shwi
{
  /* the default: one (sign-extended) HWI is enough */
  static void set (HOST_WIDE_INT *s, T x)
    { s[0] = x; }
};

template <>
struct to_shwi<unsigned HOST_WIDE_INT>
{
  /* unsigned HWIs get an extra zero word so they are not sign extended */
  static void set (HOST_WIDE_INT *s, unsigned HOST_WIDE_INT x)
    { s[0] = x; s[1] = 0; }
};

gives a method to properly initialize the sign-extended HWI representation
from any integer type.  Needs additional specializations if init from
long long is supported and long long is bigger than HOST_WIDE_INT.

That is, I suppose your issue was stuff like

   wide_int operator+(unsigned HOST_WIDE_INT);
   wide_int operator+(HOST_WIDE_INT);

then you'd write that as

   template <class T>
   wide_int operator+(T);

and in the implementation use the to_shwi class (or a class providing
exactly what you need) that is specialized for the unsigned HOST_WIDE_INT
case (smaller operands automatically are promoted in the right way to
signed HOST_WIDE_INT).

>  Saying that constants are always sign extended seems ok, but
> there are a huge number of places where we convert unsigned hwis as the
> second operand and i do not want that to be a trap.  I went thru a round of
> this, where i did not post the patch because i could not make this work.
> And the number of places where you want to use an hwi as the second operand
> dwarfs the number of places where you want to use a small integer constant.

See above - this isn't an issue if you use templates for the "overloads".

Richard.

Robert Dewar April 8, 2013, 2:35 p.m. UTC | #26
On 4/8/2013 10:26 AM, Kenneth Zadeck wrote:

> My confusion is what you mean by "we"?   Do you mean "we" the writer of
> the program, "we" the person invoking the compiler by the use command
> line options or "we", your company's implementation of ada?

Sorry, bad usage. The gcc implementation of Ada allows the user to
specify by pragmas how intermediate overflow is handled.
>
> My interpretation of your first email was that it was possible for the
> programmer to do something equivalent to adding attributes surrounding a
> block in the program to control the precision and overflow detection of
> the expressions in the block.   And if this is so, then by the time the
> expression is seen by the middle end of gcc, those attributes will have
> been converted into tree code that will evaluate the code in a well defined
> way by both the optimization passes and the target machine.

Yes, that's a correct understanding
Lawrence Crowl April 8, 2013, 8:39 p.m. UTC | #27
On 4/8/13, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
> The other problem, which i invite you to use the full power of
> your c++ sorcery on, is the one where defining an operator so
> that wide-int + unsigned hwi is either rejected or properly
> zero extended.  If you can do this, I will go along with
> your suggestion that the internal rep should be sign extended.
> Saying that constants are always sign extended seems ok, but there
> are a huge number of places where we convert unsigned hwis as
> the second operand and i do not want that to be a trap.  I went
> thru a round of this, where i did not post the patch because i
> could not make this work.  And the number of places where you
> want to use an hwi as the second operand dwarfs the number of
> places where you want to use a small integer constant.

You can use overloading, as in the following, which actually ignores
handling the sign in the representation.

class number {
        int representation;
public:
        number(int arg) : representation(arg) {}
        number(unsigned int arg) : representation(arg) {}
        friend number operator+(number, int);
        friend number operator+(number, unsigned int);
        friend number operator+(int, number);
        friend number operator+(unsigned int, number);
};

number operator+(number n, int si) {
    return n.representation + si;
}

number operator+(number n, unsigned int ui) {
    return n.representation + ui;
}

number operator+(int si, number n) {
    return n.representation + si;
}

number operator+(unsigned int ui, number n) {
    return n.representation + ui;
}

If the argument type is a template type parameter, then
you can test the template type via

    if (std::is_signed<T>::value)
      .... // sign extend
    else
      .... // zero extend

See http://www.cplusplus.com/reference/type_traits/is_signed/.

If you want to handle non-builtin types that are signed or unsigned,
then you need to add a specialization for is_signed.
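
Putting that together, a self-contained sketch of the dispatch; the
two-word representation and the extend () helper are hypothetical:

#include <cstdint>
#include <type_traits>

template <typename T>
void extend (int64_t rep[2], T x)
{
  rep[0] = (int64_t) x;
  if (std::is_signed<T>::value)
    rep[1] = rep[0] < 0 ? -1 : 0;   /* sign extend into the high word */
  else
    rep[1] = 0;                     /* zero extend: a large unsigned HWI
                                       stays non-negative */
}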
Lawrence Crowl April 8, 2013, 9 p.m. UTC | #28
On 4/8/13, Richard Biener <richard.guenther@gmail.com> wrote:
> I advocate the infinite precision signed representation as one
> solution to avoid the issues that come up with your implementation
> (as I currently have access to) which has a representation
> with N bits of precision encoded with M <= N bits and no sign
> information.  That obviously leaves operations on numbers of
> that representation with differing N undefined.  You define it
> by having coded the operations which as far as I can see simply
> assume N is equal for any two operands and the effective sign for
> extending the M-bits encoding to the common N-bits precision is
> "available".  A thorough specification of both the encoding scheme
> and the operation semantics is missing.  I can side-step both of
> these issues nicely by simply using an infinite precision signed 
> representation and requiring the client to explicitly truncate /
> extend to a specific precision when required.  I also leave open
> the possibility to have the _encoding_ be always the same as an
> infinite precision signed representation but to always require
> an explicitly specified target precision for each operation
> (which rules out the use of operator overloading).

For efficiency, the machine representation of an infinite precision
number should allow for a compact one-word representation.

  class infinite
  {
    int length;
    union representation
    {
         int inside_word;
         int *outside_words;
    } field;
  public:
    int mod_one_word()
    {
      if (length == 1)
        return field.inside_word;
      else
        return field.outside_words[0];
    }
  };

Also for efficiency, you want to know the modulus at the time you
do the last normal operation on it, not as a subsequent operation.

> Citing your example:
>
>   8 * 10 / 4
>
> and transforming it slightly into a commonly used pattern:
>
>   (byte-size * 8 + bit-size) / 8
>
> then I argue that what people want here is this carried out in
> _infinite_ precision!

But what people want isn't really relevant; what is relevant is
what the language and/or compatibility requires.  Ideally, gcc
should accurately represent languages with both finite size and
infinite size.

> Even if byte-size happens to come from
> a sizetype TREE_INT_CST with 64bit precision.  So either
> choice - having a fixed-precision representation or an
> infinite-precision representation - can and will lead to errors
> done by the programmer.  And as you can easily build a
> finite precision wrapper around an infinite precision implementation
> but not the other way around it's obvious to me what the
> implementation should provide.

IIUC, the issue here is not the logical chain of implementation, but
the interface that is most helpful to the programmers in getting to
performant, correct code.  I expect we need the infinite precision
forms, but also that having more concise coding for fixed-precision
would be helpful.

For mixed operations, all the languages that I know of promote
smaller operands to larger operands, so I think a reasonable
definition is possible here.
Lawrence Crowl April 8, 2013, 9:12 p.m. UTC | #29
On 4/8/13, Robert Dewar <dewar@adacore.com> wrote:
> On 4/8/2013 10:26 AM, Kenneth Zadeck wrote:
> > On 04/08/2013 10:12 AM, Robert Dewar wrote:
> > > On 4/8/2013 9:58 AM, Kenneth Zadeck wrote:
> > > > yes but the relevant question for the not officially
> > > > static integer constants is "in what precision are those
> > > > operations to be performed?  I assume that you choose
> > > > gcc types for these operations and you expect the math to
> > > > be done within that type, i.e. exactly the way you expect
> > > > the machine to perform.
> > >
> > > As I explained in an earlier message, *within* a single
> > > expression, we are free to use higher precision, and we provide
> > > modes that allow this up to and including the use of infinite
> > > precision. That applies not just to constant expressions but
> > > to all expressions.
> >
> > My confusion is what you mean by "we"?  Do you mean "we" the
> > writer of the program, "we" the person invoking the compiler
> > by the use command line options or "we", your company's
> > implementation of ada?
>
> Sorry, bad usage, The gcc implementation of Ada allows the user
> to specify by pragmas how intermediate overflow is handled.

Correct me if I'm wrong, but the Ada standard doesn't require any
particular maximum evaluation precision, but only that you get an
exception if the values exceed the chosen maximum.

> > My interpretation of your first email was that it was possible
> > for the programmer to do something equivalent to adding
> > attributes surrounding a block in the program to control the
> > precision and overflow detection of the expressions in the block.
> > And if this is so, then by the time the expression is seen by
> > the middle end of gcc, those attributes will have been converted
> > into tree code will evaluate the code in a well defined way by
> > both the optimization passes and the target machine.
>
> Yes, that's a correct understanding

In essence, you have moved some of the optimization from the back
end to the front end.  Correct?
Robert Dewar April 8, 2013, 9:36 p.m. UTC | #30
On 4/8/2013 5:12 PM, Lawrence Crowl wrote:

(BTW, you *really* don't need to quote entire messages, I find
it rather redundant for the entire thread to be in every message,
we all have thread following mail readers!)

> Correct me if I'm wrong, but the Ada standard doesn't require any
> particular maximum evaluation precision, but only that you get an
> exception if the values exceed the chosen maximum.

Right, that's at run-time, at compile-time for static expressions,
infinite precision is required.

But at run-time, all three of the modes we provide are
standard conforming.

> In essence, you have moved some of the optimization from the back
> end to the front end.  Correct?

Sorry, I don't quite understand that. If you are saying that the
back end could handle this widening for intermediate values, sure
it could, this is the kind of thing that can be done at various
different places.
>
Kenneth Zadeck April 8, 2013, 9:46 p.m. UTC | #31
In some sense you have to think in terms of three worlds:
1) what you call "compile-time static expressions" is one world which in 
gcc is almost always done by the front ends.
2) the second world is what the optimizers can do.   This is not 
compile-time static expressions because that is what the front end has 
already done.
3) there is run time.

My view on this is that optimization is just doing what is normally done 
at run time but doing it early.   From that point of view, we are if not 
required, morally obligated to do things in the same way that the
hardware would have done them.    This is why i am so against richi on 
wanting to do infinite precision.    By the time the middle or the back 
end sees the representation, all of the things that are allowed to be 
done in infinite precision have already been done.   What we are left 
with is a (mostly) strongly typed language that pretty much says exactly 
what must be done. Anything that we do in the middle end or back ends in 
infinite precision will only surprise the programmer and make them want 
to use llvm.

Kenny

On 04/08/2013 05:36 PM, Robert Dewar wrote:
> On 4/8/2013 5:12 PM, Lawrence Crowl wrote:
>
> (BTW, you *really* don't need to quote entire messages, I find
> it rather redundant for the entire thread to be in every message,
> we all have thread following mail readers!)
>
>> Correct me if I'm wrong, but the Ada standard doesn't require any
>> particular maximum evaluation precision, but only that you get an
>> exception if the values exceed the chosen maximum.
>
> Right, that's at run-time, at compile-time for static expressions,
> infinite precision is required.
>
> But at run-time, all three of the modes we provide are
> standard conforming.
>
>> In essence, you have moved some of the optimization from the back
>> end to the front end.  Correct?
>
> Sorry, I don't quite understand that. If you are saying that the
> back end could handle this widening for intermediate values, sure
> it could, this is the kind of thing that can be done at various
> different places.
>>
>
Robert Dewar April 8, 2013, 9:48 p.m. UTC | #32
On 4/8/2013 5:46 PM, Kenneth Zadeck wrote:
> In some sense you have to think in terms of three worlds:
> 1) what you call "compile-time static expressions" is one world which in
> gcc is almost always done by the front ends.
> 2) the second world is what the optimizers can do.   This is not
> compile-time static expressions because that is what the front end has
> already done.
> 3) there is run time.
>
> My view on this is that optimization is just doing what is normally done
> at run time but doing it early.   From that point of view, we are if not
> required, morally obligated to do things in the same way that the
> hardware would have done them.    This is why i am so against richi on
> wanting to do infinite precision.    By the time the middle or the back
> end sees the representation, all of the things that are allowed to be
> done in infinite precision have already been done.   What we are left
> with is a (mostly) strongly typed language that pretty much says exactly
> what must be done. Anything that we do in the middle end or back ends in
> infinite precision will only surprise the programmer and make them want
> to use llvm.

That may be so in C, in Ada it would be perfectly reasonable to use
infinite precision for intermediate results in some cases, since the
language standard specifically encourages this approach.
Mike Stump April 8, 2013, 10:34 p.m. UTC | #33
On Apr 8, 2013, at 2:48 PM, Robert Dewar <dewar@adacore.com> wrote:
> That may be so in C, in Ada it would be perfectly reasonable to use
> infinite precision for intermediate results in some cases, since the
> language standard specifically encourages this approach.

gcc lacks an infinite precision plus operator?!  :-)
Robert Dewar April 8, 2013, 10:45 p.m. UTC | #34
On 4/8/2013 6:34 PM, Mike Stump wrote:
> On Apr 8, 2013, at 2:48 PM, Robert Dewar <dewar@adacore.com> wrote:
>> That may be so in C, in Ada it would be perfectly reasonable to use
>> infinite precision for intermediate results in some cases, since the
>> language standard specifically encourages this approach.
>
> gcc lacks an infinite precision plus operator?!  :-)
>
Right, that's why we do everything in the front end in the
case of Ada. But it would be perfectly reasonable for the
back end to do this substitution.
Kenneth Zadeck April 8, 2013, 11:46 p.m. UTC | #35
On 04/08/2013 06:45 PM, Robert Dewar wrote:
> On 4/8/2013 6:34 PM, Mike Stump wrote:
>> On Apr 8, 2013, at 2:48 PM, Robert Dewar <dewar@adacore.com> wrote:
>>> That may be so in C, in Ada it would be perfectly reasonable to use
>>> infinite precision for intermediate results in some cases, since the
>>> language standard specifically encourages this approach.
>>
>> gcc lacks an infinite precision plus operator?!  :-)
>>
> Right, that's why we do everything in the front end in the
> case of Ada. But it would be perfectly reasonable for the
> back end to do this substitution.
but there is no way in the current tree language to convey which ones 
you can and which ones you cannot.
Robert Dewar April 8, 2013, 11:47 p.m. UTC | #36
On 4/8/2013 7:46 PM, Kenneth Zadeck wrote:
>
> On 04/08/2013 06:45 PM, Robert Dewar wrote:
>> On 4/8/2013 6:34 PM, Mike Stump wrote:
>>> On Apr 8, 2013, at 2:48 PM, Robert Dewar <dewar@adacore.com> wrote:
>>>> That may be so in C, in Ada it would be perfectly reasonable to use
>>>> infinite precision for intermediate results in some cases, since the
>>>> language standard specifically encourages this approach.
>>>
>>> gcc lacks an infinite precision plus operator?!  :-)
>>>
>> Right, that's why we do everything in the front end in the
>> case of Ada. But it would be perfectly reasonable for the
>> back end to do this substitution.
> but there is no way in the current tree language to convey which ones
> you can and which ones you cannot.

Well the back end has all the information to figure this out I think!
But anyway, for Ada, the current situation is just fine, and has
the advantage that the -gnatG expanded code listing clearly shows in
Ada source form, what is going on.
>
Richard Biener April 9, 2013, 9:02 a.m. UTC | #37
On Mon, Apr 8, 2013 at 10:39 PM, Lawrence Crowl <crowl@google.com> wrote:
> On 4/8/13, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
>> The other problem, which i invite you to use the full power of
>> your c++ sorcery on, is the one where defining an operator so
>> that wide-int + unsigned hwi is either rejected or properly
>> zero extended.  If you can do this, I will go along with
>> your suggestion that the internal rep should be sign extended.
>> Saying that constants are always sign extended seems ok, but there
>> are a huge number of places where we convert unsigned hwis as
>> the second operand and i do not want that to be a trap.  I went
>> thru a round of this, where i did not post the patch because i
>> could not make this work.  And the number of places where you
>> want to use an hwi as the second operand dwarfs the number of
>> places where you want to use a small integer constant.
>
> You can use overloading, as in the following, which actually ignores
> handling the sign in the representation.
>
> class number {
>         unsigned int rep1;
>         int representation;
> public:
>         number(int arg) : representation(arg) {}
>         number(unsigned int arg) : representation(arg) {}
>         friend number operator+(number, int);
>         friend number operator+(number, unsigned int);
>         friend number operator+(int, number);
>         friend number operator+(unsigned int, number);
> };
>
> number operator+(number n, int si) {
>     return n.representation + si;
> }
>
> number operator+(number n, unsigned int ui) {
>     return n.representation + ui;
> }
>
> number operator+(int si, number n) {
>     return n.representation + si;
> }
>
> number operator+(unsigned int ui, number n) {
>     return n.representation + ui;
> }

That does not work for types larger than int/unsigned int as HOST_WIDE_INT
usually is (it's long / unsigned long).  When you pass an int or unsigned int
to

number operator+(unsigned long ui, number n);
number operator+(long ui, number n)

you get an ambiguity.  You can "fix" that by relying on template argument
deduction and specialization instead of on overloading and integer conversion
rules.
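
For instance, a sketch building on the number class above (the
helpers are hypothetical, and std::is_signed needs <type_traits>):

template <typename T>
number operator+ (number n, T op)
{
  // T matches the argument type exactly, so overload resolution
  // never has to choose between int and long conversions.
  if (std::is_signed<T>::value)
    return add_sext (n, op);   // hypothetical: sign-extend op
  else
    return add_zext (n, op);   // hypothetical: zero-extend op
}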

> If the argument type is of a template type parameter, then
> you can test the template type via
>
>     if (std::is_signed<T>::value)
>       .... // sign extend
>     else
>       .... // zero extend
>
> See http://www.cplusplus.com/reference/type_traits/is_signed/.

Yes, if we want to use the standard library.  For what integer types
is std::is_signed required to be implemented in C++98 (long long?)?
Consider non-GCC host compilers.

Richard.

> If you want to handle non-builtin types that are signed or unsigned,
> then you need to add a specialization for is_signed.
>
> --
> Lawrence Crowl
Florian Weimer April 9, 2013, 9:39 a.m. UTC | #38
On 04/09/2013 01:47 AM, Robert Dewar wrote:
> Well the back end has all the information to figure this out I think!
> But anyway, for Ada, the current situation is just fine, and has
> the advantage that the -gnatG expanded code listing clearly shows in
> Ada source form, what is going on.

Isn't this a bit optimistic, considering that run-time overflow checking 
currently does not use existing hardware support?
Robert Dewar April 9, 2013, 12:41 p.m. UTC | #39
On 4/9/2013 5:39 AM, Florian Weimer wrote:
> On 04/09/2013 01:47 AM, Robert Dewar wrote:
>> Well the back end has all the information to figure this out I think!
>> But anyway, for Ada, the current situation is just fine, and has
>> the advantage that the -gnatG expanded code listing clearly shows in
>> Ada source form, what is going on.
>
> Isn't this a bit optimistic, considering that run-time overflow checking
> currently does not use existing hardware support?

Not clear what you mean here, we don't rely on the back end for run-time
overflow checking. What is over-optimistic here?

BTW, existing hardware support can be a dubious thing, you have
to be careful to evaluate costs, for instance you don't want to
use INTO on modern x86 targets!
>
Florian Weimer April 9, 2013, 2:05 p.m. UTC | #40
On 04/09/2013 02:41 PM, Robert Dewar wrote:
> On 4/9/2013 5:39 AM, Florian Weimer wrote:
>> On 04/09/2013 01:47 AM, Robert Dewar wrote:
>>> Well the back end has all the information to figure this out I think!
>>> But anyway, for Ada, the current situation is just fine, and has
>>> the advantage that the -gnatG expanded code listing clearly shows in
>>> Ada source form, what is going on.
>>
>> Isn't this a bit optimistic, considering that run-time overflow checking
>> currently does not use existing hardware support?
>
> Not clear what you mean here, we don't rely on the back end for run-time
> overflow checking. What is over-optimistic here?
>
> BTW, existing hardware support can be a dubious thing, you have
> to be careful to evaluate costs, for instance you don't want to
> use INTO on modern x86 targets!

It's not about INTO, just access to the overflow flag.

A simple function which adds its two Integer arguments compiles to this 
machine code (with -gnato -O2):

	.cfi_startproc
	movslq	%esi, %rax
	movslq	%edi, %rdx
	addq	%rax, %rdx
	movl	$2147483648, %eax
	addq	%rax, %rdx
	movl	$4294967295, %eax
	cmpq	%rax, %rdx
	ja	.L5
	leal	(%rsi,%rdi), %eax
	ret
.L5:
	pushq	%rax
	.cfi_def_cfa_offset 16
	movl	$3, %esi
	movl	$.LC0, %edi
	xorl	%eax, %eax
	call	__gnat_rcheck_10
	.cfi_endproc

While it could be like this:

	.cfi_startproc
	movl    %edi, %eax
	addl    %esi, %eax
	jo	.L5
	ret
.L5:
	pushq	%rax
	.cfi_def_cfa_offset 16
	movl	$3, %esi
	movl	$.LC0, %edi
	xorl	%eax, %eax
	call	__gnat_rcheck_10
	.cfi_endproc

Admittedly, I haven't benchmarked it, but at least from the code size 
aspect, it's a significant improvement.
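
For what it's worth, a sketch of a source-level check that maps
directly onto the overflow flag.  This uses __builtin_add_overflow,
a builtin GCC does not have at the time of this thread (it arrived
in a later release), and abort stands in for the Constraint_Error
raise:

	#include <stdlib.h>

	int
	add_checked (int a, int b)
	{
	  int sum;
	  /* Compiles to an add followed by a jo on x86.  */
	  if (__builtin_add_overflow (a, b, &sum))
	    abort ();
	  return sum;
	}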
Richard Biener April 10, 2013, 7:38 a.m. UTC | #41
On Tue, Apr 9, 2013 at 5:36 PM, Lawrence Crowl <crowl@google.com> wrote:
>
> On Apr 9, 2013 2:02 AM, "Richard Biener" <richard.guenther@gmail.com> wrote:
>>
>> On Mon, Apr 8, 2013 at 10:39 PM, Lawrence Crowl <crowl@google.com> wrote:
>> > On 4/8/13, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
>> >> The other problem, which i invite you to use the full power of
>> >> your c++ sorcery on, is the one where defining an operator so
>> >> that wide-int + unsigned hwi is either rejected or properly
>> >> zero extended.  If you can do this, I will go along with
>> >> your suggestion that the internal rep should be sign extended.
>> >> Saying that constants are always sign extended seems ok, but there
>> >> are a huge number of places where we convert unsigned hwis as
>> >> the second operand and i do not want that to be a trap.  I went
>> >> thru a round of this, where i did not post the patch because i
>> >> could not make this work.  And the number of places where you
>> >> want to use an hwi as the second operand dwarfs the number of
>> >> places where you want to use a small integer constant.
>> >
>> > You can use overloading, as in the following, which actually ignores
>> > handling the sign in the representation.
>> >
>> > class number {
>> >         unsigned int rep1;
>> >         int representation;
>> > public:
>> >         number(int arg) : representation(arg) {}
>> >         number(unsigned int arg) : representation(arg) {}
>> >         friend number operator+(number, int);
>> >         friend number operator+(number, unsigned int);
>> >         friend number operator+(int, number);
>> >         friend number operator+(unsigned int, number);
>> > };
>> >
>> > number operator+(number n, int si) {
>> >     return n.representation + si;
>> > }
>> >
>> > number operator+(number n, unsigned int ui) {
>> >     return n.representation + ui;
>> > }
>> >
>> > number operator+(int si, number n) {
>> >     return n.representation + si;
>> > }
>> >
>> > number operator+(unsigned int ui, number n) {
>> >     return n.representation + ui;
>> > }
>>
>> That does not work for types larger than int/unsigned int as HOST_WIDE_INT
>> usually is (it's long / unsigned long).  When you pass an int or unsigned
>> int
>> to
>>
>> number operator+(unsigned long ui, number n);
>> number operator+(long ui, number n)
>>
>> you get an ambiguity.  You can "fix" that by relying on template argument
>> deduction and specialization instead of on overloading and integer
>> conversion
>> rules.
>
> Ah, I hadn't quite gotten the point. This problem is being fixed in the
> standard, but that won't help GCC anytime soon.
>
>>
>> > If the argument type is of a template type parameter, then
>> > you can test the template type via
>> >
>> >     if (std::is_signed<T>::value)
>> >       .... // sign extend
>> >     else
>> >       .... // zero extend
>> >
>> > See http://www.cplusplus.com/reference/type_traits/is_signed/.
>>
>> Yes, if we want to use the standard library.  For what integer types
>> is std::is_signed required to be implemented in C++98 (long long?)?
>
> It is in C++03/TR1, which is our base requirement. Otherwise, we can test
> ~(T)0<(T)0.

Yeah, I think we want to test ~(T)0<(T)0 here.  Relying on C++03/TR1 is
too obscure if there is an easy workaround.
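
Something like the following works as a C++98 stand-in (the trait
name is made up):

template <typename T>
struct int_is_signed
{
  /* ~(T)0 is the maximum value when T is unsigned and -1 when T
     is signed, so this is a compile-time signedness test.  */
  static const bool value = ~(T) 0 < (T) 0;
};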

Richard.

>> Consider non-GCC host compilers.
>>
>> Richard.
>>
>> > If you want to handle non-builtin types that are signed or unsigned,
>> > then you need to add a specialization for is_signed.
>> >
>> > --
>> > Lawrence Crowl
Mike Stump April 10, 2013, 4:02 p.m. UTC | #42
On Apr 10, 2013, at 12:38 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> Yeah, I think we want to test ~(T)0<(T)0 here.

Thanks Lawrence, in the next version of the patch, you will discover this at the bottom if you look hard.  :-)
Kenneth Zadeck April 10, 2013, 4:03 p.m. UTC | #43
On 04/10/2013 12:02 PM, Mike Stump wrote:
> On Apr 10, 2013, at 12:38 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> Yeah, I think we want to test ~(T)0<(T)0 here.
> Thanks Lawrence, in the next version of the patch, you will discover this at the bottom if you look hard.  :-)
actually closer to the middle.
Richard Sandiford April 22, 2013, 7:39 p.m. UTC | #44
Richard Biener <richard.guenther@gmail.com> writes:
>> At the rtl level your idea does not work.   rtl constants do not have a mode
>> or type.
>
> Which is not true and does not matter.  I tell you why.  Quote:

It _is_ true, as long as you read "rtl constants" as "rtl integer constants" :-)

> +#if TARGET_SUPPORTS_WIDE_INT
> +
> +/* Match CONST_*s that can represent compile-time constant integers.  */
> +#define CASE_CONST_SCALAR_INT \
> +   case CONST_INT: \
> +   case CONST_WIDE_INT
>
> which means you are only replacing CONST_DOUBLE with wide-int.
> And _all_ CONST_DOUBLE have a mode.  Otherwise you'd have no
> way of creating the wide-int in the first place.

No, integer CONST_DOUBLEs have VOIDmode, just like CONST_INT.
Only floating-point CONST_DOUBLEs have a "real" mode.

>> I understand that this makes me vulnerable to the argument that we should
>> not let the rtl level ever dictate anything about the tree level, but the
>> truth is that a variable len rep is almost always used for big integers.
>> In our code, most constants of large types are small numbers.   (Remember i
>> got into this because the tree constant prop thinks that left shifting any
>> number by anything greater than 128 is always 0 and discovered that that was
>> just the tip of the iceberg.) But mostly i support the decision to canonize
>> numbers to the smallest number of HWIs because most of the algorithms to do
>> the math can be short circuited.    I admit that if i had to effectively
>> unpack most numbers to do the math, that the canonization would be a waste.
>> However, this is not really relevant to this conversation.   Yes, you could
>> get rid of the len, but this such a small part of picture.
>
> Getting rid of 'len' in the RTX storage was only a question of whether it
> is an efficient way to go forward.  And with considering to unify
> CONST_INTs and CONST_WIDE_INTs it is not.  And even for CONST_WIDE_INTs
> (which most of the time would be 2 HWI storage, as otherwise you'd use
> a CONST_INT) it would be an improvement.

FWIW, I don't really see any advantage in unifying CONST_INT and
CONST_WIDE_INT, for the reasons Kenny has already given.  CONST_INT
can represent a large majority of the integers and it is already a
fairly efficient representation.

It's more important that we don't pick a design that forces one
choice or the other.  And I think Kenny's patch achieves that goal,
because the choice is hidden behind macros and behind the wide_int
interface.

Thanks,
Richard
Richard Biener April 22, 2013, 8:37 p.m. UTC | #45
Richard Sandiford <rdsandiford@googlemail.com> wrote:

>Richard Biener <richard.guenther@gmail.com> writes:
>>> At the rtl level your idea does not work.   rtl constants do not
>have a mode
>>> or type.
>>
>> Which is not true and does not matter.  I tell you why.  Quote:
>
>It _is_ true, as long as you read "rtl constants" as "rtl integer
>constants" :-)
>
>> +#if TARGET_SUPPORTS_WIDE_INT
>> +
>> +/* Match CONST_*s that can represent compile-time constant integers.
> */
>> +#define CASE_CONST_SCALAR_INT \
>> +   case CONST_INT: \
>> +   case CONST_WIDE_INT
>>
>> which means you are only replacing CONST_DOUBLE with wide-int.
>> And _all_ CONST_DOUBLE have a mode.  Otherwise you'd have no
>> way of creating the wide-int in the first place.
>
>No, integer CONST_DOUBLEs have VOIDmode, just like CONST_INT.
>Only floating-point CONST_DOUBLEs have a "real" mode.

I stand corrected.  Now that's one more argument for infinite
precision constants, as the mode is then certainly provided by the
operations, just as the sign is.  That is, the mode (or size, or
precision) of 1 certainly does not matter.

>>> I understand that this makes me vulnerable to the argument that we
>should
>>> not let the rtl level ever dictate anything about the tree level,
>but the
>>> truth is that a variable len rep is almost always used for big
>integers.
>>> In our code, most constants of large types are small numbers.  
>(Remember i
>>> got into this because the tree constant prop thinks that left
>shifting any
>>> number by anything greater than 128 is always 0 and discovered that
>that was
>>> just the tip of the iceberg.) But mostly i support the decision to
>canonize
>>> numbers to the smallest number of HWIs because most of the
>algorithms to do
>>> the math can be short circuited.    I admit that if i had to
>effectively
>>> unpack most numbers to do the math, that the canonization would be a
>waste.
>>> However, this is not really relevant to this conversation.   Yes,
>you could
>>> get rid of the len, but this such a small part of picture.
>>
>> Getting rid of 'len' in the RTX storage was only a question of
>whether it
>> is an efficient way to go forward.  And with considering to unify
>> CONST_INTs and CONST_WIDE_INTs it is not.  And even for
>CONST_WIDE_INTs
>> (which most of the time would be 2 HWI storage, as otherwise you'd
>use
>> a CONST_INT) it would be an improvement.
>
>FWIW, I don't really see any advantage in unifying CONST_INT and
>CONST_WIDE_INT, for the reasons Kenny has already given.  CONST_INT
>can represent a large majority of the integers and it is already a
>fairly efficient representation.
>
>It's more important that we don't pick a design that forces one
>choice or the other.  And I think Kenny's patch achieves that goal,
>because the choice is hidden behind macros and behind the wide_int
>interface.

Not unifying const-int and double-int in the end would be odd.

Richard.

>Thanks,
>Richard
Richard Sandiford April 22, 2013, 10 p.m. UTC | #46
Richard Biener <richard.guenther@gmail.com> writes:
> Richard Sandiford <rdsandiford@googlemail.com> wrote:
>>Richard Biener <richard.guenther@gmail.com> writes:
>>>> At the rtl level your idea does not work.   rtl constants do not
>>have a mode
>>>> or type.
>>>
>>> Which is not true and does not matter.  I tell you why.  Quote:
>>
>>It _is_ true, as long as you read "rtl constants" as "rtl integer
>>constants" :-)
>>
>>> +#if TARGET_SUPPORTS_WIDE_INT
>>> +
>>> +/* Match CONST_*s that can represent compile-time constant integers.
>> */
>>> +#define CASE_CONST_SCALAR_INT \
>>> +   case CONST_INT: \
>>> +   case CONST_WIDE_INT
>>>
>>> which means you are only replacing CONST_DOUBLE with wide-int.
>>> And _all_ CONST_DOUBLE have a mode.  Otherwise you'd have no
>>> way of creating the wide-int in the first place.
>>
>>No, integer CONST_DOUBLEs have VOIDmode, just like CONST_INT.
>>Only floating-point CONST_DOUBLEs have a "real" mode.
>
> I stand corrected. Now that's one more argument for infinite precision
> constants, as the mode is then certainly provided by the operations
> similar to the sign. That is, the mode (or size, or precision) of 1
> certainly does not matter.

I disagree.  Although CONST_INT and CONST_DOUBLE don't _store_ a mode,
they are always interpreted according to a particular mode.  It's just
that that mode has to be specified separately.  That's why so many
rtl functions have (enum machine_mode, rtx) pairs.

Infinite precision seems very alien to rtl, where everything is
interpreted according to a particular mode (whether that mode is
stored in the rtx or not).

For one thing, I don't see how infinite precision could work in an
environment where signedness often isn't defined.  E.g. if you optimise
an addition of two rtl constants, you don't know (and aren't supposed
to know) whether the values involved are "signed" or "unsigned".  With
fixed-precision arithmetic it doesn't matter, because both operands must
have the same precision, and because bits outside the precision are not
significant.  With infinite precision arithmetic, the choice carries
over to the next operation.  E.g., to take a 4-bit example, you don't
know when constructing a wide_int from an rtx whether 0b1000 represents
8 or -8.  But if you have no precision to say how many bits are significant,
you have to pick one.  Which do you choose?  And why should we have to
make a choice at all?  (Note that this is a different question to
whether the internal wide_int representation is sign-extending or not,
which is purely an implementation detail.  The same implementation
principle applies to CONST_INTs: the HWI in a CONST_INT is always
sign-extended from the msb of the represented value, although of course
the CONST_INT itself doesn't tell you which bit the msb is; that has to
be determined separately.)

A particular wide_int isn't, and IMO shouldn't be, inherently signed
or unsigned.  The rtl model is that signedness is a question of
interpretation rather than representation.  I realise trees are
different, because signedness is a property of the type rather
than operations on the type, but I still think fixed precision
works with both tree and rtl whereas infinite precision doesn't
work with rtl.

I also fear there are going to be lots of bugs where we forget to
truncate the result of an N-bit operation from "infinite" precision
to N bits before using it in the next operation (as per Kenny's ring
explanation).  With finite precision, and with all-important asserts
that the operands have consistent precisions, we shouldn't have any
hidden bugs like that.

If there are parts of gcc that really want to do infinite-precision
arithmetic, mpz_t ought to be as good as anything.
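
For example (GMP is already a host build requirement), a minimal
sketch, with the function name purely illustrative:

  #include <gmp.h>

  void
  example (void)
  {
    mpz_t a, b, sum;
    mpz_init_set_si (a, 8);
    mpz_init_set_si (b, 10);
    mpz_init (sum);
    mpz_add (sum, a, b);          /* exact, at whatever width */
    gmp_printf ("%Zd\n", sum);    /* prints 18 */
    mpz_clear (a);
    mpz_clear (b);
    mpz_clear (sum);
  }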

Thanks,
Richard
diff mbox

Patch

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index f3bb168..ce4bc93 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -852,7 +852,7 @@  COMMON_TARGET_DEF_H = common/common-target-def.h \
 RTL_BASE_H = coretypes.h rtl.h rtl.def $(MACHMODE_H) reg-notes.def \
   insn-notes.def $(INPUT_H) $(REAL_H) statistics.h $(VEC_H) \
   $(FIXED_VALUE_H) alias.h $(HASHTAB_H)
-FIXED_VALUE_H = fixed-value.h $(MACHMODE_H) double-int.h
+FIXED_VALUE_H = fixed-value.h $(MACHMODE_H) double-int.h wide-int.h
 RTL_H = $(RTL_BASE_H) $(FLAGS_H) genrtl.h
 RTL_ERROR_H = rtl-error.h $(RTL_H) $(DIAGNOSTIC_CORE_H)
 READ_MD_H = $(OBSTACK_H) $(HASHTAB_H) read-md.h
@@ -864,7 +864,7 @@  INTERNAL_FN_H = internal-fn.h $(INTERNAL_FN_DEF)
 TREE_H = coretypes.h tree.h all-tree.def tree.def c-family/c-common.def \
 	$(lang_tree_files) $(MACHMODE_H) tree-check.h $(BUILTINS_DEF) \
 	$(INPUT_H) statistics.h $(VEC_H) treestruct.def $(HASHTAB_H) \
-	double-int.h alias.h $(SYMTAB_H) $(FLAGS_H) \
+	double-int.h wide-int.h alias.h $(SYMTAB_H) $(FLAGS_H) \
 	$(REAL_H) $(FIXED_VALUE_H)
 REGSET_H = regset.h $(BITMAP_H) hard-reg-set.h
 BASIC_BLOCK_H = basic-block.h $(PREDICT_H) $(VEC_H) $(FUNCTION_H) \
@@ -1453,6 +1453,7 @@  OBJS = \
 	varpool.o \
 	vmsdbgout.o \
 	web.o \
+	wide-int.o \
 	xcoffout.o \
 	$(out_object_file) \
 	$(EXTRA_OBJS) \
@@ -2674,6 +2675,7 @@  targhooks.o : targhooks.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TREE_H) \
    tree-ssa-alias.h $(TREE_FLOW_H)
 common/common-targhooks.o : common/common-targhooks.c $(CONFIG_H) $(SYSTEM_H) \
    coretypes.h $(INPUT_H) $(TM_H) $(COMMON_TARGET_H) common/common-targhooks.h
+wide-int.o: wide-int.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H)
 
 bversion.h: s-bversion; @true
 s-bversion: BASE-VER
@@ -3910,15 +3912,16 @@  CFLAGS-gengtype-parse.o += -DGENERATOR_FILE
 build/gengtype-parse.o: $(BCONFIG_H)
 
 gengtype-state.o build/gengtype-state.o: gengtype-state.c $(SYSTEM_H) \
-  gengtype.h errors.h double-int.h version.h $(HASHTAB_H) $(OBSTACK_H) \
-  $(XREGEX_H)
+  gengtype.h errors.h double-int.h wide-int.h version.h $(HASHTAB_H)    \
+  $(OBSTACK_H) $(XREGEX_H)
 gengtype-state.o: $(CONFIG_H)
 CFLAGS-gengtype-state.o += -DGENERATOR_FILE
 build/gengtype-state.o: $(BCONFIG_H)
+wide-int.h: $(GTM_H) insn-modes.h
 
 gengtype.o build/gengtype.o : gengtype.c $(SYSTEM_H) gengtype.h 	\
-  rtl.def insn-notes.def errors.h double-int.h version.h $(HASHTAB_H) \
-  $(OBSTACK_H) $(XREGEX_H)
+  rtl.def insn-notes.def errors.h double-int.h wide-int.h version.h     \
+  $(HASHTAB_H) $(OBSTACK_H) $(XREGEX_H)
 gengtype.o: $(CONFIG_H)
 CFLAGS-gengtype.o += -DGENERATOR_FILE
 build/gengtype.o: $(BCONFIG_H)
diff --git a/gcc/wide-int.c b/gcc/wide-int.c
new file mode 100644
index 0000000..191fb17
--- /dev/null
+++ b/gcc/wide-int.c
@@ -0,0 +1,3541 @@ 
+/* Operations with very long integers.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+   Contributed by Kenneth Zadeck <zadeck@naturalbridge.com>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "hwint.h"
+#include "wide-int.h"
+#include "rtl.h"
+#include "tree.h"
+#include "dumpfile.h"
+
+// using wide_int::;
+
+/* Debugging routines.  */
+
+/* This is the maximal size of the buffer needed for dump.  */
+const int MAX_SIZE = 4 * (MAX_BITSIZE_MODE_ANY_INT / 4
+		     + MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT + 32);
+
+/*
+ * Internal utilities.
+ */
+
+/* Quantities to deal with values that hold half of a wide int.  Used
+   in multiply and divide.  */
+#define HALF_INT_MASK (((HOST_WIDE_INT)1 << HOST_BITS_PER_HALF_WIDE_INT) - 1)
+
+#define BLOCK_OF(TARGET) ((TARGET) / HOST_BITS_PER_WIDE_INT)
+#define BLOCKS_NEEDED(PREC) \
+  (((PREC) + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT)
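+/* For example, with 64-bit HWIs, BLOCKS_NEEDED (64) is 1 and
+   BLOCKS_NEEDED (65) is 2.  */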
+
+/*
+ * Conversion routines in and out of wide-int.
+ */
+
+/* Convert OP0 into a wide int of BITSIZE and PRECISION.  If the
+   precision is less than HOST_BITS_PER_WIDE_INT, sign extend the
+   value of the word.  *OVERFLOW is set if the number was too
+   large to fit in the precision.  */
+
+wide_int
+wide_int::from_shwi (HOST_WIDE_INT op0, unsigned int bitsize,
+		     unsigned int precision, bool *overflow)
+{
+  wide_int result;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (precision < HOST_BITS_PER_WIDE_INT)
+    {
+      HOST_WIDE_INT t = sext_hwi (op0, precision);
+      if (t != op0)
+	*overflow = true; 
+      op0 = t;
+    }
+
+  result.val[0] = op0;
+  result.len = 1;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wh ("wide_int::from_shwi %s " HOST_WIDE_INT_PRINT_HEX ")\n",
+	      result, op0);
+#endif
+
+  return result;
+}
+
+/* Convert OP0 into a wide int of BITSIZE and PRECISION.  If the
+   precision is less than HOST_BITS_PER_WIDE_INT, zero extend the
+   value of the word.  *OVERFLOW is set if the number was too
+   large to fit in the precision.  */
+
+wide_int
+wide_int::from_uhwi (unsigned HOST_WIDE_INT op0, unsigned int bitsize, 
+		     unsigned int precision, bool *overflow)
+{
+  wide_int result;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (precision < HOST_BITS_PER_WIDE_INT)
+    {
+      unsigned HOST_WIDE_INT t = zext_hwi (op0, precision);
+      if (t != op0)
+	*overflow = true; 
+      op0 = t;
+    }
+
+  result.val[0] = op0;
+
+  /* If the top bit is a 1, we need to add another word of 0s since
+     that would not expand the right value since the infinite
+     expansion of any unsigned number must have 0s at the top.  */
+  if ((HOST_WIDE_INT)op0 < 0 && precision > HOST_BITS_PER_WIDE_INT)
+    {
+      result.val[1] = 0;
+      result.len = 2;
+    }
+  else
+    result.len = 1;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wh ("wide_int::from_uhwi %s " HOST_WIDE_INT_PRINT_HEX ")\n",
+	      result, op0);
+#endif
+
+  return result;
+}
+
+/* Create a wide_int from the array OP1 of LEN host_wide_ints.  The
+   result has BITSIZE and PRECISION.  */
+
+wide_int
+wide_int::from_array (const HOST_WIDE_INT *op1, unsigned int len, 
+		      unsigned int bitsize, unsigned int precision)
+{
+  unsigned int i;
+  wide_int result;
+  
+  result.len = len;
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  for (i=0; i < len; i++)
+    result.val[i] = op1[i];
+
+  result.canonize ();
+  return result;
+}
+
+/* Convert a double int into a wide int with bitsize BS and precision PREC.  */
+
+wide_int
+wide_int::from_double_int (double_int di, unsigned int bs, unsigned int prec)
+{
+  HOST_WIDE_INT op = di.low;
+  wide_int result;
+
+  result.bitsize = bs;
+  result.precision = prec;
+  result.len = (prec <= HOST_BITS_PER_WIDE_INT) ? 1 : 2;
+
+  if (prec < HOST_BITS_PER_WIDE_INT)
+    result.val[0] = sext_hwi (op, prec);
+  else
+    {
+      result.val[0] = op;
+      if (prec > HOST_BITS_PER_WIDE_INT)
+	{
+	  if (prec < HOST_BITS_PER_DOUBLE_INT)
+	    result.val[1] = sext_hwi (di.high, prec);
+	  else
+	    result.val[1] = di.high;
+	}
+    }
+
+  if (result.len == 2)
+    result.canonize ();
+
+  return result;
+}
+
+/* Convert an integer cst into a wide int.  */
+
+wide_int
+wide_int::from_tree (const_tree tcst)
+{
+#ifdef NEW_REP_FOR_INT_CST
+  /* This is the code once the tree level is converted.  */
+  wide_int result;
+  int i;
+
+  tree type = TREE_TYPE (tcst);
+
+  result.bitsize = GET_MODE_BITSIZE (TYPE_MODE (type));
+  result.precision = TYPE_PRECISION (type);
+  result.len = TREE_INT_CST_LEN (tcst);
+  for (i = 0; i < result.len; i++)
+    result.val[i] = TREE_INT_CST_ELT (tcst, i);
+
+  return result;
+#else
+  wide_int result;
+  tree type = TREE_TYPE (tcst);
+  unsigned int prec = TYPE_PRECISION (type);
+  HOST_WIDE_INT op = TREE_INT_CST_LOW (tcst);
+
+  result.precision = prec;
+  result.bitsize = GET_MODE_BITSIZE (TYPE_MODE (type));
+  result.len = (prec <= HOST_BITS_PER_WIDE_INT) ? 1 : 2;
+
+  if (prec < HOST_BITS_PER_WIDE_INT)
+    if (TYPE_UNSIGNED (type))
+      result.val[0] = zext_hwi (op, prec);
+    else
+      result.val[0] = sext_hwi (op, prec);
+  else
+    {
+      result.val[0] = op;
+      if (prec > HOST_BITS_PER_WIDE_INT)
+	{
+	  if (prec < HOST_BITS_PER_DOUBLE_INT)
+	    {
+	      op = TREE_INT_CST_HIGH (tcst);
+	      if (TYPE_UNSIGNED (type))
+		result.val[1] = zext_hwi (op, prec);
+	      else
+		result.val[1] = sext_hwi (op, prec);
+	    }
+	  else
+	    result.val[1] = TREE_INT_CST_HIGH (tcst);
+	}
+      else
+	result.len = 1;
+    }
+  
+  if (result.len == 2)
+    result.canonize ();
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    {
+      debug_whh ("wide_int:: %s = from_tree ("HOST_WIDE_INT_PRINT_HEX" "HOST_WIDE_INT_PRINT_HEX")\n",
+		 result, TREE_INT_CST_HIGH (tcst), TREE_INT_CST_LOW (tcst));
+    }
+#endif
+
+  return result;
+#endif
+}
+
+/* Convert an integer cst into a wide int expanded to BITSIZE and
+   PRECISION.  This call is used by tree passes like vrp that expect
+   that the math is done in an infinite precision style.  BITSIZE and
+   PRECISION are generally determined to be twice the largest type
+   seen in the function.  */
+
+wide_int
+wide_int::from_tree_as_infinite_precision (const_tree tcst, 
+					   unsigned int bitsize, 
+					   unsigned int precision)
+{
+  /* The plan here is to extend the value in the cst, using signed or
+     unsigned based on the type, from the precision in the type to
+     PRECISION and then canonize from there.  */
+  wide_int result;
+  tree type = TREE_TYPE (tcst);
+  unsigned int prec = TYPE_PRECISION (type);
+
+
+#ifdef NEW_REP_FOR_INT_CST
+  /* This is the code once the tree level is converted.  */
+  int i;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  result.len = TREE_INT_CST_LEN (tcst);
+  for (i = 0; i < result.len; i++)
+    result.val[i] = TREE_INT_CST_ELT (tcst, i);
+
+  if (TYPE_UNSIGNED (type))
+    result = result.zext (prec);
+  else
+    result = result.sext (prec);
+
+#else
+  HOST_WIDE_INT op = TREE_INT_CST_LOW (tcst);
+
+  gcc_assert (prec <= precision);
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  result.len = 1;
+
+  if (prec < HOST_BITS_PER_WIDE_INT)
+    result.val[0] = TYPE_UNSIGNED (type) 
+      ? zext_hwi (op, prec) : sext_hwi (op, prec);
+  else
+    {
+      result.val[0] = op;
+      if (prec > HOST_BITS_PER_WIDE_INT)
+	{
+	  if (prec < HOST_BITS_PER_DOUBLE_INT)
+	    {
+	      op = TREE_INT_CST_HIGH (tcst);
+	      prec -= HOST_BITS_PER_WIDE_INT;
+	      result.val[1] = TYPE_UNSIGNED (type) 
+		? zext_hwi (op, prec) : sext_hwi (op, prec);
+	      result.len = 2;
+	    }
+	  else
+	    {
+	      result.val[1] = TREE_INT_CST_HIGH (tcst);
+	      if (TYPE_UNSIGNED (type) && result.val[1] < 0)
+		{
+		  result.val[2] = 0;
+		  result.len = 3;
+		}
+	      else
+		result.len = 2;
+	    }
+	}
+      else
+	if (TYPE_UNSIGNED (type) && result.val[0] < 0)
+	  {
+	    result.val[1] = 0;
+	    result.len = 2;
+	  }
+    }
+
+#endif
+
+  return result;
+}
+
+/* Extract a constant integer from X, interpreting it in MODE.  The
+   bits of the integer are returned.  */
+
+wide_int
+wide_int::from_rtx (const_rtx x, enum machine_mode mode)
+{
+  wide_int result;
+  unsigned int prec = GET_MODE_PRECISION (mode);
+
+  gcc_assert (mode != VOIDmode);
+
+  result.bitsize = GET_MODE_BITSIZE (mode);
+  result.precision = prec;
+
+  switch (GET_CODE (x))
+    {
+    case CONST_INT:
+      if ((prec & (HOST_BITS_PER_WIDE_INT - 1)) != 0)
+	result.val[0] = sext_hwi (INTVAL (x), prec);
+      else
+	result.val[0] = INTVAL (x);
+      result.len = 1;
+      break;
+
+#if TARGET_SUPPORTS_WIDE_INT
+    case CONST_WIDE_INT:
+      {
+	int i;
+	result.len = CONST_WIDE_INT_NUNITS (x);
+	
+	for (i = 0; i < result.len; i++)
+	  result.val[i] = CONST_WIDE_INT_ELT (x, i);
+      }
+      break;
+#else
+    case CONST_DOUBLE:
+      result.len = 2;
+      result.val[0] = CONST_DOUBLE_LOW (x);
+      result.val[1] = CONST_DOUBLE_HIGH (x);
+      result.canonize ();
+      break;
+#endif
+
+    default:
+      gcc_unreachable ();
+    }
+
+  return result;
+}
+
+/*
+ * Largest and smallest values in a mode.
+ */
+
+/* Produce the largest SGNed number that is representable in PREC.  The
+   resulting number is placed in a wide int of size BITSIZE and
+   PRECISION.  */
+
+wide_int
+wide_int::max_value (unsigned int prec, 
+		     unsigned int bitsize, unsigned int precision, 
+		     SignOp sgn)
+{
+  wide_int result;
+  
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (sgn == UNSIGNED)
+    {
+      /* The unsigned max is just all ones, for which the compressed
+	 rep is just a single HWI.  */ 
+      result.len = 1;
+      result.val[0] = (HOST_WIDE_INT)-1;
+    }
+  else
+    {
+      /* The signed max is all ones except the top bit.  This must be
+	 explicitly represented.  */
+      int i;
+      int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
+      int shift = (small_prec == 0) 
+	? HOST_BITS_PER_WIDE_INT - 1 : small_prec - 1;
+
+      result.len = BLOCKS_NEEDED (prec);
+      for (i = 0; i < result.len - 1; i++)
+	result.val[i] = (HOST_WIDE_INT)-1;
+
+      result.val[result.len - 1] = ((HOST_WIDE_INT)1 << shift) - 1;
+    }
+  
+  return result;
+}
+
+/* Produce the smallest SGNed number that is representable in PREC.  The
+   resulting number is placed in a wide int of size BITSIZE and
+   PRECISION.  */
+
+wide_int
+wide_int::min_value (unsigned int prec, 
+		     unsigned int bitsize, unsigned int precision, 
+		     SignOp sgn)
+{
+  if (sgn == UNSIGNED)
+    {
+      /* The unsigned min is just all zeros, for which the compressed
+	 rep is just a single HWI.  */ 
+      wide_int result;
+      result.len = 1;
+      result.bitsize = bitsize;
+      result.precision = precision;
+      result.val[0] = 0;
+      return result;
+    }
+  else
+    {
+      /* The signed min is all zeros except the top bit.  This must be
+	 explicitly represented.  */
+      return set_bit_in_zero (prec - 1, bitsize, precision);
+    }
+}
+
+/*
+ * Public utilities.
+ */
+
+/* Check the upper HOST_WIDE_INTs of src to see if the length can be
+   shortened.  An upper HOST_WIDE_INT is unnecessary if it is all ones
+   or zeros and the top bit of the next lower word matches.
+
+   This function may change the representation of THIS, but does not
+   change the value that THIS represents.  It does not sign extend in
+   the case that the size of the mode is less than
+   HOST_BITS_PER_WIDE_INT.  */
+
+void
+wide_int::canonize ()
+{
+  int small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+  int blocks_needed = BLOCKS_NEEDED (precision);
+  HOST_WIDE_INT top;
+  int i;
+
+  if (len > blocks_needed)
+    len = blocks_needed;
+
+  /* Clean up the top bits for any mode that is not a multiple of a HWI.  */
+  if (len == blocks_needed && small_prec)
+    val[len - 1] = sext_hwi (val[len - 1], small_prec);
+
+  if (len == 1)
+    return;
+
+  top = val[len - 1];
+  if (top != 0 && top != (HOST_WIDE_INT)-1)
+    return;
+
+  /* At this point we know that the top is either 0 or -1.  Find the
+     first block that is not a copy of this.  */
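+  /* For example, with 64-bit HWIs, { 1, 0 } canonizes to { 1 }:
+     the sign bit of val[0] is 0, which matches the top block (0),
+     so the top block carries no information.  */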
+  for (i = len - 2; i >= 0; i--)
+    {
+      HOST_WIDE_INT x = val[i];
+      if (x != top)
+	{
+	  if (x >> (HOST_BITS_PER_WIDE_INT - 1) == top)
+	    {
+	      len = i + 1;
+	      return;
+	    }
+
+	  /* We need an extra block because the top bit of block i
+	     does not match the extension.  */
+	  len = i + 2;
+	  return;
+	}
+    }
+
+  /* The number is 0 or -1.  */
+  len = 1;
+}
+
+
+/* Make a copy of this.  */
+
+wide_int
+wide_int::copy () const
+{
+  wide_int result;
+  int i;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+  result.len = len;
+
+  for (i = 0; i < result.len; i++)
+    result.val[i] = val[i];
+  return result;
+}
+
+
+/* Copy THIS replacing the bitsize with BS and precision with PREC.
+   It can do any of truncation, extension or copying.  */
+
+wide_int
+wide_int::force_to_size (unsigned int bs, unsigned int prec, SignOp sgn) const
+{
+  wide_int result;
+  int blocks_needed = BLOCKS_NEEDED (prec);
+  int i;
+
+  result.bitsize = bs;
+  result.precision = prec;
+  result.len = blocks_needed < len ? blocks_needed : len;
+  for (i = 0; i < result.len; i++)
+    result.val[i] = val[i];
+
+  if (prec >= precision) 
+    {
+      /* Expanding */
+      int small_precision = precision & (HOST_BITS_PER_WIDE_INT - 1);
+
+      if (sgn == UNSIGNED)
+	{
+	  if (len == BLOCKS_NEEDED (precision)
+	      && len < blocks_needed
+	      && small_precision == 0
+	      && result.val[result.len - 1] < 0)
+	    /* We need to put the 0 block on top to keep the value
+	       from being sign extended.  */ 
+	    result.val[result.len++] = 0;
+	  /* We are unsigned, the current precision does not end on a
+	     block boundary, and the top block is explicitly
+	     represented, so we have to do an explicit zext of the
+	     top block.  */
+	  else if (small_precision && blocks_needed == len)
+	    result.val[blocks_needed-1]
+	      = zext_hwi (result.val[blocks_needed-1], small_precision);
+	}
+    }
+  else
+    {
+      /* Truncating.  */
+      int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
+      /* The only weird case we need to look at here is when we are
+         truncating within the top block.  We need to make sure that
+         everything in the block above the new precision is sign
+         extended.  Note that this is independent of the SGN.  This is
+         just to stay canonical.  */
+      if (small_prec && (blocks_needed == len))
+	result.val[blocks_needed-1]
+	  = sext_hwi (result.val[blocks_needed-1], small_prec);
+    }
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwvvs ("wide_int:: %s = force_to_size (%s, bs = %d, prec = %d %s)\n", 
+		 result, *this, bs, prec, sgn==UNSIGNED ? "U" : "S");
+#endif
+
+  return result;
+}
+
+/*
+ * public printing routines.
+ */
+
+/* Try to print the signed self in decimal to BUF if the number fits
+   in a HWI.  Otherwise print in hex.  */
+
+void 
+wide_int::print_decs (char *buf) const
+{
+  if ((precision <= HOST_BITS_PER_WIDE_INT)
+      || (len == 1 && !neg_p ()))
+      sprintf (buf, HOST_WIDE_INT_PRINT_DEC, val[0]);
+  else
+    print_hex (buf);
+}
+
+/* Try to print the signed self in decimal to FILE if the number fits
+   in a HWI.  Otherwise print in hex.  */
+
+void 
+wide_int::print_decs (FILE *file) const
+{
+  char buf[(2 * MAX_BITSIZE_MODE_ANY_INT / BITS_PER_UNIT) + 4];
+  print_decs (buf);
+  fputs (buf, file);
+}
+
+/* Try to print the unsigned self in decimal to BUF if the number fits
+   in a HWI.  Otherwise print in hex.  */
+
+void 
+wide_int::print_decu (char *buf) const
+{
+  if ((precision <= HOST_BITS_PER_WIDE_INT)
+      || (len == 1 && !neg_p ()))
+      sprintf (buf, HOST_WIDE_INT_PRINT_UNSIGNED, val[0]);
+  else
+    print_hex (buf);
+}
+
+/* Try to print the unsigned self in decimal to FILE if the number
+   fits in a HWI.  Otherwise print in hex.  */
+
+void 
+wide_int::print_decu (FILE *file) const
+{
+  char buf[(2 * MAX_BITSIZE_MODE_ANY_INT / BITS_PER_UNIT) + 4];
+  print_decu (buf);
+  fputs (buf, file);
+}
+
+void 
+wide_int::print_hex (char *buf) const
+{
+  int i = len;
+
+  if (zero_p ())
+    sprintf (buf, "0x");
+  else
+    {
+      if (neg_p ())
+	{
+	  int j;
+	  /* If the number is negative, we may need to pad the value with
+	     0xFFF...  because the leading elements may be missing and
+	     we do not print a '-' with hex.  */
+	  for (j = BLOCKS_NEEDED (precision); j > i; j--)
+	    buf += sprintf (buf, HOST_WIDE_INT_PRINT_PADDED_HEX, (HOST_WIDE_INT) -1);
+	    
+	}
+      else
+	buf += sprintf (buf, HOST_WIDE_INT_PRINT_HEX, val [--i]);
+      while (-- i >= 0)
+	buf += sprintf (buf, HOST_WIDE_INT_PRINT_PADDED_HEX, val [i]);
+    }
+}
+
+/* Print one big hex number to FILE.  Note that some assemblers may not
+   accept this for large modes.  */
+void 
+wide_int::print_hex (FILE *file) const
+{
+  char buf[(2 * MAX_BITSIZE_MODE_ANY_INT / BITS_PER_UNIT) + 4];
+  print_hex (buf);
+  fputs (buf, file);
+}
+
+/*
+ * Comparisons; note that only equality is an operator.  The other
+ * comparisons cannot be operators since they are inherently signed or
+ * unsigned and C++ has no such operators.
+ */
+
+/* Return true if THIS == OP1.  */
+
+bool
+wide_int::eq_p_large (const wide_int &op1) const
+{
+  int l0 = len - 1;
+  int l1 = op1.len - 1;
+
+  while (l0 > l1)
+    if (val[l0--] != op1.sign_mask ())
+      return false;
+
+  while (l1 > l0)
+    if (op1.val[l1--] != sign_mask ())
+      return false;
+
+  while (l0 >= 0)
+    if (val[l0--] != op1.val[l1--])
+      return false;
+
+  return true;
+}
+
+/* Return true if THIS < OP1 using signed comparisons.  */
+
+bool
+wide_int::lts_p_large (const wide_int &op1) const
+{
+  int l;
+  HOST_WIDE_INT s0, s1;
+  unsigned HOST_WIDE_INT u0, u1;
+  int blocks_needed = BLOCKS_NEEDED (precision);
+
+  /* Only the top block is compared as signed.  The rest are unsigned
+     comparisons.  */
+  s0 = blocks_needed == len ? val [blocks_needed - 1] : sign_mask ();
+  s1 = blocks_needed == op1.len ? op1.val [blocks_needed - 1] : op1.sign_mask ();
+  if (s0 < s1)
+    return true;
+  if (s0 > s1)
+    return false;
+
+  l = MAX (len, op1.len);
+  /* We have already checked the top block, so skip down.  */
+  l = (l == blocks_needed) ? l - 2 : l - 1;
+
+  while (l >= 0)
+    {
+      u0 = l < len ? val [l] : sign_mask ();
+      u1 = l < op1.len ? op1.val [l] : op1.sign_mask ();
+
+      if (u0 < u1)
+	return true;
+      if (u0 > u1)
+	return false;
+      l--;
+    }
+
+  return false;
+}
+
+/* Returns -1 if THIS < OP1, 0 if THIS == OP1 and 1 if THIS > OP1 using
+   signed compares.  */
+
+int
+wide_int::cmps_large (const wide_int &op1) const
+{
+  int l;
+  HOST_WIDE_INT s0, s1;
+  unsigned HOST_WIDE_INT u0, u1;
+  int blocks_needed = BLOCKS_NEEDED (precision);
+
+  /* Only the top block is compared as signed.  The rest are unsigned
+     comparisons.  */
+  s0 = blocks_needed == len ? val [blocks_needed - 1] : sign_mask ();
+  s1 = blocks_needed == op1.len ? op1.val [blocks_needed - 1] : op1.sign_mask ();
+  if (s0 < s1)
+    return -1;
+  if (s0 > s1)
+    return 1;
+
+  l = MAX (len, op1.len);
+  /* We have already checked the top block, so skip down.  */
+  l = (l == blocks_needed) ? l - 2 : l - 1;
+
+  while (l >= 0)
+    {
+      u0 = l < len ? val [l] : sign_mask ();
+      u1 = l < op1.len ? op1.val [l] : op1.sign_mask ();
+
+      if (u0 < u1)
+	return -1;
+      if (u0 > u1)
+	return 1;
+      l--;
+    }
+
+  return 0;
+}
+
+/* Return true if THIS < OP1 using unsigned comparisons.  */
+
+bool
+wide_int::ltu_p_large (const wide_int &op1) const
+{
+  unsigned HOST_WIDE_INT x0;
+  unsigned HOST_WIDE_INT x1;
+  int l0 = len - 1;
+  int l1 = op1.len - 1;
+
+  while (l0 > l1)
+    {
+      x0 = val[l0--];
+      x1 = op1.sign_mask ();
+      if (x0 < x1)
+	return true;
+      if (x0 > x1)
+	return false;
+    }
+
+  while (l1 > l0)
+    {
+      x0 = sign_mask ();
+      x1 = op1.val[l1--];
+      if (x0 < x1)
+	return true;
+      if (x0 > x1)
+	return false;
+    }
+
+  while (l0 >= 0)
+    {
+      x0 = val[l0--];
+      x1 = op1.val[l1--];
+      if (x0 < x1)
+	return true;
+      if (x0 > x1)
+	return false;
+    }
+
+  return false;
+}
+
+/* Returns -1 if THIS < OP1, 0 if THIS == OP1 and 1 if THIS > OP1 using
+   unsigned compares.  */
+
+int
+wide_int::cmpu_large (const wide_int &op1) const
+{
+  unsigned HOST_WIDE_INT x0;
+  unsigned HOST_WIDE_INT x1;
+  int l0 = len - 1;
+  int l1 = op1.len - 1;
+
+  while (l0 > l1)
+    {
+      x0 = val[l0--];
+      x1 = op1.sign_mask ();
+      if (x0 < x1)
+	return -1;
+      else if (x0 > x1)
+	return 1;
+    }
+
+  while (l1 > l0)
+    {
+      x0 = sign_mask ();
+      x1 = op1.val[l1--];
+      if (x0 < x1)
+	return -1;
+      if (x0 > x1)
+	return 1;
+    }
+
+  while (l0 >= 0)
+    {
+      x0 = val[l0--];
+      x1 = op1.val[l1--];
+      if (x0 < x1)
+	return -1;
+      if (x0 > x1)
+	return 1;
+    }
+
+  return 0;
+}
+
+/* Return true if THIS has the sign bit set to 1 and all other bits are
+   zero.  */
+
+bool
+wide_int::only_sign_bit_p (unsigned int prec) const
+{
+  int i;
+  HOST_WIDE_INT x;
+  int small_prec;
+  bool result;
+
+  if (BLOCKS_NEEDED (prec) != len)
+    {
+      result = false;
+      goto ex;
+    }
+
+  for (i=0; i < len - 1; i++)
+    if (val[i] != 0)
+      {
+	result = false;
+	goto ex;
+      }
+
+  x = val[len - 1];
+  small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
+  if (small_prec)
+    x = x << (HOST_BITS_PER_WIDE_INT - small_prec);
+
+  result = x == ((HOST_WIDE_INT)1) << (HOST_BITS_PER_WIDE_INT - 1);
+
+ ex:
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vw ("wide_int:: %d = only_sign_bit_p (%s)\n", result, *this);
+#endif
+
+  return result;
+}
+
+/* Returns true if THIS fits into the range of TYPE.  Signedness of
+   THIS is assumed to be the same as the signedness of TYPE.  */
+
+bool
+wide_int::fits_to_tree_p (const_tree type) const
+{
+  int type_prec = TYPE_PRECISION (type);
+
+  if (TYPE_UNSIGNED (type))
+    return fits_u_p (type_prec);
+  else
+    return fits_s_p (type_prec);
+}
+
+/* Returns true if THIS fits in the signed range of PREC.  */
+
+bool
+wide_int::fits_s_p (unsigned int prec) const
+{
+  if (len < BLOCKS_NEEDED (prec))
+    return true;
+
+  if (precision <= prec)
+    return true;
+
+  return *this == sext (prec);
+}
+
+
+/* Returns true if THIS fits in the unsigned range of PREC.  */
+
+bool
+wide_int::fits_u_p (unsigned int prec) const
+{
+  if (len < BLOCKS_NEEDED (prec))
+    return true;
+
+  if (precision <= prec)
+    return true;
+
+  return *this == zext (prec);
+}
+
+/*
+ * Extension.
+ */
+
+/* Sign extend THIS starting at OFFSET.  The bitsize and precision of
+   the result are the same as THIS.  */
+
+wide_int
+wide_int::sext (unsigned int offset) const
+{
+  wide_int result;
+  int off;
+
+  gcc_assert (precision >= offset);
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      result.bitsize = bitsize;
+      result.precision = precision;
+      if (offset < precision)
+	result.val[0] = sext_hwi (val[0], offset);
+      else
+	/* If offset is greater than or equal to precision there is
+	   nothing to do, since the internal rep is already sign
+	   extended.  */
+	result.val[0] = val[0];
+
+      result.len = 1;
+    }
+  else if (precision == offset)
+    result = *this;
+  else
+    {
+      result = decompress (offset, bitsize, precision);
+      
+      /* Now we can do the real sign extension.  */
+      off = offset & (HOST_BITS_PER_WIDE_INT - 1);
+      if (off)
+	{
+	  int block = BLOCK_OF (offset);
+	  result.val[block] = sext_hwi (val[block], off);
+	  result.len = block + 1;
+	}
+      /* We never need an extra element for sign extended values.  */
+    }    
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int:: %s = (%s sext %d)\n", result, *this, offset);
+#endif
+
+  return result;
+}
+
+/* Zero extend THIS starting at OFFSET.  The bitsize and precision of
+   the result are the same as THIS.  */
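+/* For example, with a precision of 16, zext (8) applied to the value
+   0xffff yields 0x00ff, clearing everything at and above bit 8.  */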
+
+wide_int
+wide_int::zext (unsigned int offset) const
+{
+  wide_int result;
+  int off;
+  int block;
+
+  gcc_assert (precision >= offset);
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      result.bitsize = bitsize;
+      result.precision = precision;
+      if (offset < precision)
+	result.val[0] = zext_hwi (val[0], offset);
+      else if (offset == precision)
+	result.val[0] = val[0];
+      /* If offset was greater than the precision we need to zero
+	 extend from the old precision since the internal rep was
+	 equivalent to sign extended.  */
+      else
+	result.val[0] = zext_hwi (val[0], precision);
+	
+      result.len = 1;
+    }
+  else if (precision == offset)
+    result = *this;
+  else
+    {
+      result = decompress (offset, bitsize, precision);
+
+      /* Now we can do the real zero extension.  */
+      off = offset & (HOST_BITS_PER_WIDE_INT - 1);
+      block = BLOCK_OF (offset);
+      if (off)
+	{
+	  result.val[block] = zext_hwi (val[block], off);
+	  result.len = block + 1;
+	}
+      else
+	/* See if we need an extra zero element to satisfy the
+	   compression rule.  */
+	if (val[block - 1] < 0 && offset < precision)
+	  {
+	    result.val[block] = 0;
+	    result.len += 1;
+	  }
+    }
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int:: %s = (%s zext %d)\n", result, *this, offset);
+#endif
+
+  return result;
+}
+
+/*
+ * Masking, inserting, shifting, rotating.
+ */
+
+/* Return a copy of THIS with the bit at BITPOS set to 1.  */
+
+wide_int
+wide_int::set_bit (unsigned int bitpos) const
+{
+  wide_int result;
+  int i, j;
+
+  if (bitpos >= precision)
+    result = copy ();
+  else
+    {
+      result = decompress (bitpos, bitsize, precision);
+      j = bitpos / HOST_BITS_PER_WIDE_INT;
+      i = bitpos & (HOST_BITS_PER_WIDE_INT - 1);
+      result.val[j] |= ((HOST_WIDE_INT)1) << i;
+    }
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int:: %s = (%s set_bit %d)\n", result, *this, bitpos);
+#endif
+
+  return result;
+}
+
+/* Insert a 1 bit into 0 at BITPOS producing a number with BITSIZE
+   and PREC.  */
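+/* For example, set_bit_in_zero (5, 32, 32) produces the value
+   0x20.  */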
+
+wide_int
+wide_int::set_bit_in_zero (unsigned int bitpos, 
+			   unsigned int bitsize, unsigned int prec)
+{
+  wide_int result;
+  int blocks_needed = BLOCKS_NEEDED (bitpos + 1);
+  int i, j;
+
+  result.bitsize = bitsize;
+  result.precision = prec;
+  if (bitpos >= prec)
+    {
+      result.len = 1;
+      result.val[0] = 0;
+    }
+  else
+    {
+      result.len = blocks_needed;
+      for (i = 0; i < blocks_needed; i++)
+	result.val[i] = 0;
+      
+      j = bitpos / HOST_BITS_PER_WIDE_INT;
+      i = bitpos & (HOST_BITS_PER_WIDE_INT - 1);
+      result.val[j] |= ((HOST_WIDE_INT)1) << i;
+    }
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wv ("wide_int:: %s = set_bit_in_zero (%d)\n", result, bitpos);
+#endif
+
+  return result;
+}
+
+/* Insert WIDTH bits from OP0 into THIS starting at START.  */
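+/* For example, inserting the 8 bit value 0xab into 0x1234 at START 4
+   with WIDTH 8 yields 0x1ab4.  */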
+
+wide_int
+wide_int::insert (const wide_int &op0, unsigned int start, 
+		  unsigned int width) const
+{
+  wide_int result;
+  wide_int mask;
+  wide_int tmp;
+
+  if (start + width >= precision) 
+    width = precision - start;
+
+  mask = shifted_mask (start, width, false, bitsize, precision);
+  tmp = op0.lshift (start, NONE, bitsize, precision);
+  result = tmp & mask;
+
+  tmp = and_not (mask);
+  result = result | tmp;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwwvv ("wide_int:: %s = (%s insert start = %d width = %d)\n", 
+		 result, *this, op0, start, width);
+#endif
+
+  return result;
+}
+
+/* Byte swap THIS.  */
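+/* For example, with a precision of 32, bswap turns 0x12345678 into
+   0x78563412.  */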
+
+wide_int
+wide_int::bswap () const
+{
+  wide_int result;
+  int i, s;
+  int end;
+  int len = BLOCKS_NEEDED (precision);
+  HOST_WIDE_INT mask = sign_mask ();
+
+  /* This is not a well defined operation if the precision is not a
+     multiple of 8.  */
+  gcc_assert ((precision & 0x7) == 0);
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+  result.len = len;
+
+  for (i = 0; i < len; i++)
+    result.val[i] = mask;
+
+  /* Only swap the bytes that are not the padding.  */
+  if ((precision & (HOST_BITS_PER_WIDE_INT - 1))
+      && (this->len == len))
+    end = precision;
+  else
+    end = this->len * HOST_BITS_PER_WIDE_INT;
+
+  for (s = 0; s < end; s += 8)
+    {
+      unsigned int d = precision - s - 8;
+      unsigned HOST_WIDE_INT byte;
+
+      int block = s / HOST_BITS_PER_WIDE_INT;
+      int offset = s & (HOST_BITS_PER_WIDE_INT - 1);
+
+      byte = (val[block] >> offset) & 0xff;
+
+      block = d / HOST_BITS_PER_WIDE_INT;
+      offset = d & (HOST_BITS_PER_WIDE_INT - 1);
+
+      /* Clear the destination byte, then or in the swapped byte.  */
+      result.val[block] &= ~((unsigned HOST_WIDE_INT) 0xff << offset);
+      result.val[block] |= byte << offset;
+    }
+
+  result.canonize ();
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_ww ("wide_int:: %s = bswap (%s)\n", result, *this);
+#endif
+
+  return result;
+}
+
+/* Return a result mask where the lower WIDTH bits are ones and the
+   bits above that up to the precision are zeros.  The result is
+   inverted if NEGATE is true.  The result is made with BITSIZE and
+   PREC. */
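+/* For example, mask (4, false, 32, 32) is 0x0000000f and
+   mask (4, true, 32, 32) is 0xfffffff0.  */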
+
+wide_int
+wide_int::mask (unsigned int width, bool negate, 
+		unsigned int bitsize, unsigned int prec)
+{
+  wide_int result;
+  unsigned int i = 0;
+  int shift;
+
+  gcc_assert (width < 2 * MAX_BITSIZE_MODE_ANY_INT);
+  gcc_assert (prec <= 2 * MAX_BITSIZE_MODE_ANY_INT);
+
+  if (width == 0)
+    {
+      if (negate)
+	result = wide_int::minus_one (bitsize, prec);
+      else
+	result = wide_int::zero (bitsize, prec);
+#ifdef DEBUG_WIDE_INT
+      if (dump_file)
+	debug_wvv ("wide_int:: %s = mask (%d, negate = %d)\n", result, width, negate);
+#endif
+      return result;
+    }
+
+  result.bitsize = bitsize;
+  result.precision = prec;
+
+  while (i < width / HOST_BITS_PER_WIDE_INT)
+    result.val[i++] = negate ? 0 : (HOST_WIDE_INT)-1;
+
+  shift = width & (HOST_BITS_PER_WIDE_INT - 1);
+  if (shift != 0)
+    {
+      HOST_WIDE_INT last = (((HOST_WIDE_INT)1) << shift) - 1;
+      result.val[i++] = negate ? ~last : last;
+    }
+  else if (width < prec)
+    /* If the top written block would otherwise be sign extended, add
+       an explicit block to stop the smearing at WIDTH.  */
+    result.val[i++] = negate ? (HOST_WIDE_INT)-1 : 0;
+  result.len = i;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wvv ("wide_int:: %s = mask (%d, negate = %d)\n", result, width, negate);
+#endif
+
+  return result;
+}
+
+/* Return a mask with WIDTH one bits starting at START; the bits
+   above that up to the precision are zeros.  The result is inverted
+   if NEGATE is true.  */
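+/* For example, shifted_mask (4, 8, false, 32, 32) is 0x00000ff0 and
+   shifted_mask (4, 8, true, 32, 32) is 0xfffff00f.  */
+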
+wide_int
+wide_int::shifted_mask (unsigned int start, unsigned int width, 
+			bool negate,
+			unsigned int bitsize, unsigned int prec)
+{
+  wide_int result;
+  unsigned int i = 0;
+  unsigned int shift;
+  unsigned int end;
+  HOST_WIDE_INT block;
+
+  if (start + width > prec)
+    width = prec - start;
+  end = start + width;
+ 
+  if (width == 0)
+    {
+      if (negate)
+	result = wide_int::minus_one (bitsize, prec);
+      else
+	result = wide_int::zero (bitsize, prec);
+#ifdef DEBUG_WIDE_INT
+      if (dump_file)
+	debug_wvvv 
+	  ("wide_int:: %s = shifted_mask (start = %d width = %d negate = %d)\n", 
+	   result, start, width, negate);
+#endif
+      return result;
+    }
+
+  result.bitsize = bitsize;
+  result.precision = prec;
+
+  while (i < start / HOST_BITS_PER_WIDE_INT)
+    result.val[i++] = negate ? (HOST_WIDE_INT)-1 : 0;
+
+  shift = start & (HOST_BITS_PER_WIDE_INT - 1);
+  if (shift)
+    {
+      block = (((HOST_WIDE_INT)1) << shift) - 1;
+      shift = end & (HOST_BITS_PER_WIDE_INT - 1);
+      if (shift != 0 && end / HOST_BITS_PER_WIDE_INT == i)
+	{
+	  /* case 000111000 */
+	  block = (((HOST_WIDE_INT)1) << shift) - block - 1;
+	  result.val[i++] = negate ? ~block : block;
+	  result.len = i;
+
+#ifdef DEBUG_WIDE_INT
+	  if (dump_file)
+	    debug_wvvv 
+	      ("wide_int:: %s = shifted_mask (start = %d width = %d negate = %d)\n", 
+	       result, start, width, negate);
+#endif
+	  return result;
+	}
+      else
+	/* ...111000 */
+	result.val[i++] = negate ? block : ~block;
+    }
+
+  while (i < end / HOST_BITS_PER_WIDE_INT)
+    /* 1111111 */
+    result.val[i++] = negate ? 0 : (HOST_WIDE_INT)-1;
+
+  shift = end & (HOST_BITS_PER_WIDE_INT - 1);
+  if (shift != 0)
+    {
+      /* 000011111 */
+      block = (((HOST_WIDE_INT)1) << shift) - 1;
+      result.val[i++] = negate ? ~block : block;
+    }
+  else if (end < prec)
+    /* Stop the sign of a full top block from smearing past END.  */
+    result.val[i++] = negate ? (HOST_WIDE_INT)-1 : 0;
+
+  result.len = i;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wvvv 
+      ("wide_int:: %s = shifted_mask (start = %d width = %d negate = %d)\n", 
+       result, start, width, negate);
+#endif
+
+  return result;
+}
+
+
+/*
+ * logical operations.
+ */
+
+/* Return THIS & OP1.  */
+
+wide_int
+wide_int::and_large (const wide_int &op1) const
+{
+  wide_int result;
+  int l0 = len - 1;
+  int l1 = op1.len - 1;
+  bool need_canon = true;
+
+  result.len = MAX (len, op1.len);
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (l0 > l1)
+    {
+      if (op1.sign_mask () == 0)
+	{
+	  l0 = l1;
+	  result.len = l1 + 1;
+	}
+      else
+	{
+	  need_canon = false;
+	  while (l0 > l1)
+	    {
+	      result.val[l0] = val[l0];
+	      l0--;
+	    }
+	}
+    }
+  else if (l1 > l0)
+    {
+      if (sign_mask () == 0)
+	result.len = l0 + 1;
+      else
+	{
+	  need_canon = false;
+	  while (l1 > l0)
+	    {
+	      result.val[l1] = op1.val[l1];
+	      l1--;
+	    }
+	}
+    }
+
+  while (l0 >= 0)
+    {
+      result.val[l0] = val[l0] & op1.val[l0];
+      l0--;
+    }
+
+  if (need_canon)
+    result.canonize ();
+  return result;
+}
+
+/* Return THIS & ~OP1.  */
+
+wide_int
+wide_int::and_not_large (const wide_int &op1) const
+{
+  wide_int result;
+  int l0 = len - 1;
+  int l1 = op1.len - 1;
+  bool need_canon = true;
+
+  result.len = MAX (len, op1.len);
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (l0 > l1)
+    {
+      if (op1.sign_mask () != 0)
+	{
+	  l0 = l1;
+	  result.len = l1 + 1;
+	}
+      else
+	{
+	  need_canon = false;
+	  while (l0 > l1)
+	    {
+	      result.val[l0] = val[l0];
+	      l0--;
+	    }
+	}
+    }
+  else if (l1 > l0)
+    {
+      if (sign_mask () == 0)
+	result.len = l0 + 1;
+      else
+	{
+	  need_canon = false;
+	  while (l1 > l0)
+	    {
+	      result.val[l1] = ~op1.val[l1];
+	      l1--;
+	    }
+	}
+    }
+
+  while (l0 >= 0)
+    {
+      result.val[l0] = val[l0] & ~op1.val[l0];
+      l0--;
+    }
+
+  if (need_canon)
+    result.canonize ();
+
+  return result;
+}
+
+/* Return THIS | OP1.  */
+
+wide_int
+wide_int::or_large (const wide_int &op1) const
+{
+  wide_int result;
+  int l0 = len - 1;
+  int l1 = op1.len - 1;
+  bool need_canon = true;
+
+  result.len = MAX (len, op1.len);
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (l0 > l1)
+    {
+      if (op1.sign_mask () != 0)
+	{
+	  l0 = l1;
+	  result.len = l1 + 1;
+	}
+      else
+	{
+	  need_canon = false;
+	  while (l0 > l1)
+	    {
+	      result.val[l0] = val[l0];
+	      l0--;
+	    }
+	}
+    }
+  else if (l1 > l0)
+    {
+      if (sign_mask () != 0)
+	result.len = l0 + 1;
+      else
+	{
+	  need_canon = false;
+	  while (l1 > l0)
+	    {
+	      result.val[l1] = op1.val[l1];
+	      l1--;
+	    }
+	}
+    }
+
+  while (l0 >= 0)
+    {
+      result.val[l0] = val[l0] | op1.val[l0];
+      l0--;
+    }
+
+  if (need_canon)
+    result.canonize ();
+
+  return result;
+}
+
+/* Return THIS | ~OP1.  */
+
+wide_int
+wide_int::or_not_large (const wide_int &op1) const
+{
+  wide_int result;
+  int l0 = len - 1;
+  int l1 = op1.len - 1;
+  bool need_canon = true;
+
+  result.len = MAX (len, op1.len);
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (l0 > l1)
+    {
+      if (op1.sign_mask () == 0)
+	{
+	  l0 = l1;
+	  result.len = l1 + 1;
+	}
+      else
+	{
+	  need_canon = false;
+	  while (l0 > l1)
+	    {
+	      result.val[l0] = val[l0];
+	      l0--;
+	    }
+	}
+    }
+  else if (l1 > l0)
+    {
+      if (sign_mask () != 0)
+	result.len = l0 + 1;
+      else
+	{
+	  need_canon = false;
+	  while (l1 > l0)
+	    {
+	      result.val[l1] = ~op1.val[l1];
+	      l1--;
+	    }
+	}
+    }
+
+  while (l0 >= 0)
+    {
+      result.val[l0] = val[l0] | ~op1.val[l0];
+      l0--;
+    }
+
+  if (need_canon)
+    result.canonize ();
+  return result;
+}
+
+/* Return the exclusive ior (xor) of THIS and OP1.  */
+
+wide_int
+wide_int::xor_large (const wide_int &op1) const
+{
+  wide_int result;
+  int l0 = len - 1;
+  int l1 = op1.len - 1;
+
+  result.len = MAX (len, op1.len);
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  while (l0 > l1)
+    {
+      result.val[l0] = val[l0] ^ op1.sign_mask ();
+      l0--;
+    }
+
+  while (l1 > l0)
+    {
+      result.val[l1] = sign_mask () ^ op1.val[l1];
+      l1--;
+    }
+
+  while (l0 >= 0)
+    {
+      result.val[l0] = val[l0] ^ op1.val[l0];
+      l0--;
+    }
+
+  result.canonize ();
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_www ("wide_int:: %s = (%s ^ %s)\n", result, *this, op1);
+#endif
+  return result;
+}
+
+/*
+ * math
+ */
+
+/* Absolute value of THIS.  */
+
+wide_int
+wide_int::abs () const
+{
+  if (sign_mask ())
+    return neg ();
+
+  wide_int result = copy ();
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_ww ("wide_int:: %s = abs (%s)\n", result, *this);
+#endif
+  return result;
+}
+
+/* Add THIS and OP1.  No overflow is detected.  */
+
+wide_int
+wide_int::add_large (const wide_int &op1) const
+{
+  wide_int result;
+  unsigned HOST_WIDE_INT o0, o1;
+  unsigned HOST_WIDE_INT x = 0;
+  unsigned HOST_WIDE_INT carry = 0;
+  unsigned HOST_WIDE_INT mask0, mask1;
+  unsigned int i, small_prec;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+  result.len = MAX (len, op1.len);
+  mask0 = sign_mask ();
+  mask1 = op1.sign_mask ();
+  /* Add all of the explicitly defined elements.  */
+  for (i = 0; i < result.len; i++)
+    {
+      o0 = i < len ? (unsigned HOST_WIDE_INT)val[i] : mask0;
+      o1 = i < op1.len ? (unsigned HOST_WIDE_INT)op1.val[i] : mask1;
+      x = o0 + o1 + carry;
+      result.val[i] = x;
+      carry = carry == 0 ? x < o0 : x <= o0;
+    }
+
+  if (result.len * HOST_BITS_PER_WIDE_INT < precision)
+    {
+      result.val[result.len] = mask0 + mask1 + carry;
+      result.len++;
+    }
+
+  small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+  if (small_prec != 0 && BLOCKS_NEEDED (precision) == result.len)
+    {
+      /* Modes with weird precisions.  */
+      i = result.len - 1;
+      result.val[i] = sext_hwi (result.val[i], small_prec);
+    }
+
+  result.canonize ();
+  return result;
+}
+
+/* Add THIS and OP1 with overflow checking.  If the result overflows
+   within the precision, set OVERFLOW.  OVERFLOW is assumed to be
+   sticky so it should be initialized.  SGN controls if signed or
+   unsigned overflow is checked.  */
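+/* For example, in 8 bit arithmetic adding 100 and 100 produces the
+   bit pattern 0xc8; with SGN == SIGNED that overflows (200 is not
+   representable in [-128, 127]) while with SGN == UNSIGNED it does
+   not.  */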
+
+wide_int
+wide_int::add (const wide_int &op1, SignOp sgn, bool *overflow) const
+{
+  wide_int result;
+  unsigned HOST_WIDE_INT o0 = 0;
+  unsigned HOST_WIDE_INT o1 = 0;
+  unsigned HOST_WIDE_INT x = 0;
+  unsigned HOST_WIDE_INT carry = 0;
+  unsigned HOST_WIDE_INT old_carry = 0;
+  unsigned HOST_WIDE_INT mask0, mask1;
+  int i, small_prec;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+  result.len = MAX (len, op1.len);
+  mask0 = sign_mask ();
+  mask1 = op1.sign_mask ();
+
+  /* Add all of the explicitly defined elements.  */
+  for (i = 0; i < len; i++)
+    {
+      o0 = val[i];
+      o1 = op1.val[i];
+      x = o0 + o1 + carry;
+      result.val [i] = x;
+      old_carry = carry;
+      carry = old_carry == 0 ? x < o0 : x <= o0;
+    }
+
+  /* Uncompress the rest.  */
+  for (i = len; i < op1.len; i++)
+    {
+      o0 = mask0;
+      o1 = op1.val[i];
+      x = o0 + o1 + carry;
+      result.val[i] = x;
+      old_carry = carry;
+      carry = old_carry == 0 ? x < o0 : x <= o0;
+    }
+  for (i = op1.len; i < len; i++)
+    {
+      o0 = val[i];
+      o1 = mask1;
+      x = o0 + o1 + carry;
+      result.val[i] = x;
+      old_carry = carry;
+      carry = old_carry == 0 ? x < o0 : x <= o0;
+    }
+
+  if (result.len * HOST_BITS_PER_WIDE_INT < precision)
+    {
+      result.val[result.len] = mask0 + mask1 + carry;
+      result.len++;
+      /* If we were short, we could not have overflowed.  */
+      goto ex;
+    }
+
+  small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+  if (small_prec == 0)
+    {
+      if (sgn == wide_int::SIGNED)
+	{
+	  if (((~(o0 ^ o1)) & (x ^ o0)) >> (HOST_BITS_PER_WIDE_INT - 1))
+	    *overflow = true;
+	}
+      else if (old_carry)
+	{
+	  if ((~o0) <= o1)
+	    *overflow = true;
+	}
+      else
+	{
+	  if ((~o0) < o1)
+	    *overflow = true;
+	}
+    }
+  else
+    {
+      if (sgn == wide_int::UNSIGNED)
+	{
+	  /* The caveat for unsigned is to get rid of the bits above
+	     the precision before doing the addition.  To check the
+	     overflow, clear these bits and then redo the last
+	     addition.  If there are any non zero bits above the prec,
+	     we overflowed. */
+	  o0 = zext_hwi (o0, small_prec);
+	  o1 = zext_hwi (o1, small_prec);
+	  x = o0 + o1 + old_carry;
+	  if (x >> small_prec)
+	    *overflow = true;
+	}
+      else 
+	{
+	  /* Overflow in this case is easy since we can see bits beyond
+	     the precision.  If the value computed is not the sign
+	     extended value, then we have overflow.  */
+	  unsigned HOST_WIDE_INT y = sext_hwi (x, small_prec);
+	  if (x != y)
+	    *overflow = true;
+	}
+    }
+
+ ex:
+  result.canonize ();
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wvww ("wide_int:: %s %d = (%s +O %s)\n", 
+	       result, *overflow, *this, op1);
+#endif
+  return result;
+}
+
+/* Count leading zeros of THIS, returning the result as a wide_int
+   with bitsize BS and precision PREC.  */
+
+wide_int
+wide_int::clz (unsigned int bs, unsigned int prec) const
+{
+  return wide_int::from_shwi (clz (), bs, prec);
+}
+
+/* Count leading zeros of THIS.  */
+
+int
+wide_int::clz () const
+{
+  int i;
+  int start;
+  int count;
+  HOST_WIDE_INT v;
+  int small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+
+  if (zero_p ())
+    {
+      enum machine_mode mode = mode_for_size (precision, MODE_INT, 0);
+      if (mode == BLKmode)
+	mode = mode_for_size (precision, MODE_PARTIAL_INT, 0); 
+
+      /* Even if the value at zero is undefined, we have to come up
+	 with some replacement.  Seems good enough.  */
+      if (mode == BLKmode)
+	count = precision;
+      else if (!CLZ_DEFINED_VALUE_AT_ZERO (mode, count))
+	count = precision;
+
+#ifdef DEBUG_WIDE_INT
+      if (dump_file)
+	debug_vw ("wide_int:: %d = clz (%s)\n", count, *this);
+#endif
+      return count;
+    }
+
+  /* The high order block is special if it is the last block and the
+     precision is not an even multiple of HOST_BITS_PER_WIDE_INT.  We
+     have to clear out any ones above the precision before doing clz
+     on this block.  */
+  if (BLOCKS_NEEDED (precision) == len && small_prec)
+    {
+      v = zext_hwi (val[len - 1], small_prec);
+      count = clz_hwi (v) - (HOST_BITS_PER_WIDE_INT - small_prec);
+      start = len - 2;
+      if (v != 0)
+	{
+#ifdef DEBUG_WIDE_INT
+	  if (dump_file)
+	    debug_vw ("wide_int:: %d = clz (%s)\n", count, *this);
+#endif
+	  return count;
+	}
+    }
+  else
+    {
+      count = (sign_mask () == 0
+	       ? precision - len * HOST_BITS_PER_WIDE_INT : 0);
+      start = len - 1;
+    }
+
+  for (i = start; i >= 0; i--)
+    {
+      v = elt (i);
+      count += clz_hwi (v);
+      if (v != 0)
+	break;
+    }
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vw ("wide_int:: %d = clz (%s)\n", count, *this);
+#endif
+  return count;
+}
+
+wide_int
+wide_int::clrsb (unsigned int bs, unsigned int prec) const
+{
+  return wide_int::from_shwi (clrsb (), bs, prec);
+}
+
+/* Count the number of redundant leading bits of THIS.  Return result
+   as a HOST_WIDE_INT.  There is a wrapper to convert this into a
+   wide_int.  */
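+/* For example, with a precision of 32, both 3 (0...011) and -4
+   (1...100) have 29 redundant sign bits, so clrsb returns 29 for
+   either value.  */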
+
+int
+wide_int::clrsb () const
+{
+  if (neg_p ())
+    return operator ~ ().clz () - 1;
+
+  return clz () - 1;
+}
+
+wide_int
+wide_int::ctz (unsigned int bs, unsigned int prec) const
+{
+  return wide_int::from_shwi (ctz (), bs, prec);
+}
+
+/* Count trailing zeros of THIS.  Return result as a HOST_WIDE_INT.
+   There is a wrapper to convert this into a wide_int.  */
+
+int
+wide_int::ctz () const
+{
+  int i;
+  unsigned int count = 0;
+  HOST_WIDE_INT v;
+  int small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+  int end;
+  bool more_to_do;
+
+  if (zero_p ())
+    {
+      enum machine_mode mode = mode_for_size (precision, MODE_INT, 0);
+      if (mode == BLKmode)
+	mode = mode_for_size (precision, MODE_PARTIAL_INT, 0); 
+
+      /* Even if the value at zero is undefined, we have to come up
+	 with some replacement.  Seems good enough.  */
+      if (mode == BLKmode)
+	count = precision;
+      else if (!CTZ_DEFINED_VALUE_AT_ZERO (mode, count))
+	count = precision;
+
+#ifdef DEBUG_WIDE_INT
+      if (dump_file)
+	debug_vw ("wide_int:: %d = ctz (%s)\n", count, *this);
+#endif
+      return count;
+    }
+
+  /* The high order block is special if it is the last block and the
+     precision is not an even multiple of HOST_BITS_PER_WIDE_INT.  We
+     have to clear out any ones above the precision before doing ctz
+     on this block.  */
+  if (BLOCKS_NEEDED (precision) == len && small_prec)
+    {
+      end = len - 1;
+      more_to_do = true;
+    }
+  else
+    {
+      end = len;
+      more_to_do = false;
+    }
+
+  for (i = 0; i < end; i++)
+    {
+      v = val[i];
+      count += ctz_hwi (v);
+      if (v != 0)
+	{
+#ifdef DEBUG_WIDE_INT
+	  if (dump_file)
+	    debug_vw ("wide_int:: %d = ctz (%s)\n", count, *this);
+#endif
+	  return count;
+	}
+    }
+
+  if (more_to_do)
+    {
+      v = zext_hwi (val[len - 1], small_prec);
+      count += ctz_hwi (v);
+      /* If the top word was all zeros, we have counted some of the
+	 zeros above the interesting part; cut the count back to
+	 prec.  */
+      if (count > precision)
+	count = precision;
+    }
+  else
+    /* Skip over the blocks that are not represented.  They must be
+       all zeros at this point.  */
+    count = precision;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vw ("wide_int:: %d = ctz (%s)\n", count, *this);
+#endif
+  return count;
+}
+
+/* Compute the ffs of THIS: one plus the index of the least
+   significant set bit, or 0 if THIS is zero.  */
+
+wide_int
+wide_int::ffs () const
+{
+  HOST_WIDE_INT count = ctz ();
+  if (count == precision)
+    count = 0;
+  else
+    count += 1;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vw ("wide_int:: %d = ffs (%s)\n", count, *this);
+#endif
+  return wide_int::from_shwi (count, word_mode);
+}
+
+/* Subroutines of the multiplication and division operations.  Unpack
+   the first IN_LEN HOST_WIDE_INTs in INPUT into 2 * IN_LEN
+   HOST_HALF_WIDE_INTs of RESULT.  The rest of RESULT is filled by
+   uncompressing the top bit of INPUT[IN_LEN - 1].  */
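+/* For example, on a host with 64 bit HWIs, unpacking the single
+   element 0xfedcba9876543210 with OUT_LEN 4 produces the half blocks
+   0x76543210, 0xfedcba98, 0xffffffff, 0xffffffff, the last two being
+   the smeared sign bit.  */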
+
+static void
+wi_unpack (unsigned HOST_HALF_WIDE_INT *result, 
+	   const unsigned HOST_WIDE_INT *input,
+	   int in_len, int out_len)
+{
+  int i;
+  int j = 0;
+  HOST_WIDE_INT mask;
+
+  for (i = 0; i < in_len; i++)
+    {
+      result[j++] = input[i];
+      result[j++] = input[i] >> HOST_BITS_PER_HALF_WIDE_INT;
+    }
+  mask = ((HOST_WIDE_INT)input[in_len - 1]) >> (HOST_BITS_PER_WIDE_INT - 1);
+  mask &= HALF_INT_MASK;
+
+  /* Smear the sign bit.  */
+  while (j < out_len)
+    result[j++] = mask;
+}
+
+/* The inverse of wi_unpack.  IN_LEN is the number of input blocks.
+   The number of output blocks will be half this amount, rounded
+   up.  */
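+/* For example, packing the half blocks 0x76543210 and 0xfedcba98
+   (IN_LEN 2) reproduces the single HWI 0xfedcba9876543210.  */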
+
+static void
+wi_pack (unsigned HOST_WIDE_INT *result, 
+	 const unsigned HOST_HALF_WIDE_INT *input, 
+	 int in_len)
+{
+  int i = 0;
+  int j = 0;
+
+  while (i < in_len - 2)
+    {
+      result[j++] = (unsigned HOST_WIDE_INT)input[i] 
+	| ((unsigned HOST_WIDE_INT)input[i + 1] << HOST_BITS_PER_HALF_WIDE_INT);
+      i += 2;
+    }
+
+  /* Handle the case where in_len is odd.   For this we zero extend.  */
+  if (in_len & 1)
+    result[j++] = (unsigned HOST_WIDE_INT)input[i];
+  else
+    result[j++] = (unsigned HOST_WIDE_INT)input[i] 
+      | ((unsigned HOST_WIDE_INT)input[i + 1] << HOST_BITS_PER_HALF_WIDE_INT);
+}
+
+/* Return the exact log2 of THIS, or -1 if THIS is not an exact power
+   of 2.  */
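+/* For example, exact_log2 of 0x40 is 6 while exact_log2 of 0x41
+   is -1.  */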
+
+int
+wide_int::exact_log2 () const
+{
+  int small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+  HOST_WIDE_INT count;
+  HOST_WIDE_INT result;
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      HOST_WIDE_INT v;
+      if (small_prec)
+	v = zext_hwi (val[0], small_prec);
+      else
+	v = val[0];
+      result = ::exact_log2 (v);
+      goto ex;
+    }
+
+  count = ctz ();
+  if (clz () + count + 1 == precision)
+    {
+      result = count;
+      goto ex;
+    }
+
+  result = -1;
+
+ ex:
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vw ("wide_int:: %d = exact_log2 (%s)\n", result, *this);
+#endif
+  return result;
+}
+
+/* Return an integer that is the floor log2 of THIS.  */
+
+int
+wide_int::floor_log2 () const
+{
+  int small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+  HOST_WIDE_INT result;
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      HOST_WIDE_INT v;
+      if (small_prec)
+	v = zext_hwi (val[0], small_prec);
+      else
+	v = val[0];
+      result = ::floor_log2 (v);
+      goto ex;
+    }
+
+  result = precision - 1 - clz ();
+
+ ex:
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vw ("wide_int:: %d = floor_log2 (%s)\n", result, *this);
+#endif
+  return result;
+}
+
+
+/* Multiply Op1 by Op2.  If HIGH is set, only the upper half of the
+   result is returned.  If FULL is set, the entire result is returned
+   in a mode that is twice the width of the inputs.  However, that
+   mode needs to exist if the value is to be usable.  Clients that use
+   FULL need to check for this.
+
+   If HIGH or FULL are not set, throw away the upper half after the
+   check is made to see if it overflows.  Unfortunately there is no
+   better way to check for overflow than to do this.  OVERFLOW is
+   assumed to be sticky so it should be initialized.  SGN controls
+   the signedness and is used to check overflow or if HIGH or FULL is
+   set.  */
+
+wide_int
+wide_int::mul_internal (bool high, bool full, 
+			const wide_int *op1, const wide_int *op2,
+			wide_int::SignOp sgn,  bool *overflow, 
+			bool needs_overflow)
+{
+  wide_int result;
+  unsigned HOST_WIDE_INT o0, o1, k, t;
+  unsigned int i;
+  unsigned int j;
+  unsigned int prec = op1->get_precision ();
+  unsigned int blocks_needed = BLOCKS_NEEDED (prec);
+  unsigned int half_blocks_needed = blocks_needed * 2;
+  /* The sizes here are scaled to support a 2x largest mode by 2x
+     largest mode yielding a 4x largest mode result.  This is what is
+     needed by vrp.  */
+
+  unsigned HOST_HALF_WIDE_INT 
+    u[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
+  unsigned HOST_HALF_WIDE_INT 
+    v[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
+  /* The '2' in 'R' is because we are internally doing a full
+     multiply.  */
+  unsigned HOST_HALF_WIDE_INT 
+    r[2 * 2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
+  HOST_WIDE_INT mask = ((HOST_WIDE_INT)1 << HOST_BITS_PER_HALF_WIDE_INT) - 1;
+
+  result.bitsize = op1->bitsize;
+  result.precision = op1->precision;
+
+  if (high || full || needs_overflow)
+    {
+      /* If we need to check for overflow, we can only do half wide
+	 multiplies quickly because we need to look at the top bits to
+	 check for the overflow.  */
+      if (prec <= HOST_BITS_PER_HALF_WIDE_INT)
+	{
+	  HOST_WIDE_INT r;
+	  result.len = 1;
+	  o0 = op1->elt (0);
+	  o1 = op2->elt (0);
+	  if (sgn == wide_int::UNSIGNED)
+	    {
+	      /* The input elements are stored sign extended, so
+		 reduce them to their unsigned values before
+		 multiplying.  */
+	      o0 = zext_hwi (o0, prec);
+	      o1 = zext_hwi (o1, prec);
+	    }
+	  r = o0 * o1;
+	  if (needs_overflow)
+	    {
+	      /* The product is exact in a full HWI, so it overflowed
+		 iff it does not fit in PREC bits.  */
+	      if (sgn == wide_int::SIGNED)
+		{
+		  if (r != sext_hwi (r, prec))
+		    *overflow = true;
+		}
+	      else
+		{
+		  if (((unsigned HOST_WIDE_INT) r >> prec) != 0)
+		    *overflow = true;
+		}
+	    }
+	  if (full)
+	    {
+	      result.val[0] = sext_hwi (r, prec * 2);
+	      result.bitsize = op1->bitsize * 2;
+	      result.precision = op1->precision * 2;
+	    }
+	  else if (high)
+	    result.val[0] = r >> prec;
+	  else
+	    result.val[0] = sext_hwi (r, prec);
+#ifdef DEBUG_WIDE_INT
+	  if (dump_file)
+	    debug_wvww ("wide_int:: %s %d = (%s *O %s)\n", 
+			result, *overflow, *op1, *op2);
+#endif
+	  return result;
+	}
+    }
+
+  wi_unpack (u, (const unsigned HOST_WIDE_INT*)op1->val, op1->len,
+	     half_blocks_needed);
+  wi_unpack (v, (const unsigned HOST_WIDE_INT*)op2->val, op2->len,
+	     half_blocks_needed);
+
+  /* The 2 is for a full mult.  */
+  memset (r, 0, half_blocks_needed * 2 
+	  * HOST_BITS_PER_HALF_WIDE_INT / BITS_PER_UNIT);
+
+  for (j = 0; j < half_blocks_needed; j++)
+    {
+      k = 0;
+      for (i = 0; i < half_blocks_needed; i++)
+	{
+	  t = ((unsigned HOST_WIDE_INT)u[i] * (unsigned HOST_WIDE_INT)v[j]
+	       + r[i + j] + k);
+	  r[i + j] = t & HALF_INT_MASK;
+	  k = t >> HOST_BITS_PER_HALF_WIDE_INT;
+	}
+      r[j + half_blocks_needed] = k;
+    }
+
+  /* We did unsigned math above.  For signed we must adjust the
+     product (assuming we need to see that).  */
+  if (sgn == wide_int::SIGNED && (full || high || needs_overflow))
+    {
+      unsigned HOST_WIDE_INT b;
+      if ((*op1).neg_p ())
+	{
+	  b = 0;
+	  for (i = 0; i < half_blocks_needed; i++)
+	    {
+	      t = (unsigned HOST_WIDE_INT)r[i + half_blocks_needed]
+		- (unsigned HOST_WIDE_INT)v[i] - b;
+	      r[i + half_blocks_needed] = t & HALF_INT_MASK;
+	      b = t >> (HOST_BITS_PER_WIDE_INT - 1);
+	    }
+	}
+      if ((*op2).neg_p ())
+	{
+	  b = 0;
+	  for (i = 0; i < half_blocks_needed; i++)
+	    {
+	      t = (unsigned HOST_WIDE_INT)r[i + half_blocks_needed]
+		- (unsigned HOST_WIDE_INT)u[i] - b;
+	      r[i + half_blocks_needed] = t & HALF_INT_MASK;
+	      b = t >> (HOST_BITS_PER_WIDE_INT - 1);
+	    }
+	}
+    }
+
+  if (needs_overflow)
+    {
+      HOST_WIDE_INT top;
+
+      /* For unsigned, overflow is true if any of the top bits are set.
+	 For signed, overflow is true if any of the top bits are not equal
+	 to the sign bit.  */
+      if (sgn == wide_int::UNSIGNED)
+	top = 0;
+      else
+	{
+	  top = r[(half_blocks_needed) - 1];
+	  top = ((top << (HOST_BITS_PER_WIDE_INT / 2))
+		 >> (HOST_BITS_PER_WIDE_INT - 1));
+	  top &= mask;
+	}
+      
+      for (i = half_blocks_needed; i < half_blocks_needed * 2; i++)
+	if (((HOST_WIDE_INT)(r[i] & mask)) != top)
+	  *overflow = true; 
+    }
+
+  if (full)
+    {
+      /* compute [2prec] <- [prec] * [prec] */
+      wi_pack ((unsigned HOST_WIDE_INT*)result.val, r, 2 * half_blocks_needed);
+      result.len = blocks_needed * 2;
+      result.bitsize = op1->bitsize * 2;
+      result.precision = op1->precision * 2;
+    }
+  else if (high)
+    {
+      /* compute [prec] <- ([prec] * [prec]) >> [prec] */
+      wi_pack ((unsigned HOST_WIDE_INT*)result.val,
+	       &r[half_blocks_needed], half_blocks_needed);
+      result.len = blocks_needed;
+    }
+  else
+    {
+      /* compute [prec] <- ([prec] * [prec]) && ((1 << [prec]) - 1) */
+      wi_pack ((unsigned HOST_WIDE_INT*)result.val, r, half_blocks_needed);
+      result.len = blocks_needed;
+    }
+      
+  result.canonize ();
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wvww ("wide_int:: %s %d = (%s *O %s)\n", 
+		result, *overflow, *op1, *op2);
+#endif
+  return result;
+}
+
+/* Multiply THIS and OP1.  The signedness is specified with SGN.
+   OVERFLOW is set true if the result overflows.  */
+
+wide_int 
+wide_int::mul (const wide_int &op1, SignOp sgn, bool *overflow) const
+{
+  return mul_internal (false, false, this, &op1, sgn, overflow, true);
+}
+
+/* Multiply THIS and OP1.  The result is twice the precision of the
+   operands.  The signedness is specified with SGN.  */
+
+wide_int
+wide_int::mul_full (const wide_int &op1, SignOp sgn) const
+{
+  bool overflow = false;
+
+  return mul_internal (false, true, this, &op1, sgn, &overflow, false);
+}
+
+/* Multiply THIS and OP1 and return the high part of that result.
+   The result is the same precision and mode as the operands.  The
+   signedness is specified with SGN.  */
+
+wide_int
+wide_int::mul_high (const wide_int &op1, SignOp sgn) const
+{
+  bool overflow = false;
+
+  return mul_internal (true, false, this, &op1, sgn, &overflow, false);
+}
+
+/* Compute the parity of THIS.  */
+
+wide_int
+wide_int::parity (unsigned int bs, unsigned int prec) const
+{
+  int count = popcount ();
+  return wide_int::from_shwi (count & 1, bs, prec);
+}
+
+/* Compute the population count of THIS producing a number with
+   BITSIZE and PREC.  */
+
+wide_int
+wide_int::popcount (unsigned int bs, unsigned int prec) const
+{
+  return wide_int::from_shwi (popcount (), bs, prec);
+}
+
+/* Compute the population count of THIS.  */
+
+int
+wide_int::popcount () const
+{
+  int i;
+  int start;
+  int count;
+  HOST_WIDE_INT v;
+  int small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+
+  /* The high order block is special if it is the last block and the
+     precision is not an even multiple of HOST_BITS_PER_WIDE_INT.  We
+     have to clear out any ones above the precision before counting
+     the population of this block.  */
+  if (BLOCKS_NEEDED (precision) == len && small_prec)
+    {
+      v = zext_hwi (val[len - 1], small_prec);
+      count = popcount_hwi (v);
+      start = len - 2;
+    }
+  else
+    {
+      if (sign_mask ())
+	count = precision - len * HOST_BITS_PER_WIDE_INT;
+      else
+	count = 0;
+      start = len - 1;
+    }
+
+  for (i = start; i >= 0; i--)
+    {
+      v = val[i];
+      count += popcount_hwi (v);
+    }
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vw ("wide_int:: %d = popcount (%s)\n", count, *this);
+#endif
+  return count;
+}
+
+/* Subtract OP1 from THIS.  No overflow is detected.  */
+
+wide_int
+wide_int::sub_large (const wide_int &op1) const
+{
+  wide_int result;
+  unsigned HOST_WIDE_INT o0, o1;
+  unsigned HOST_WIDE_INT x = 0;
+  /* We implement subtraction directly, propagating a borrow from
+     each element up to the next higher one.  */
+  unsigned HOST_WIDE_INT borrow = 0;
+  unsigned HOST_WIDE_INT mask0, mask1;
+  unsigned int i, small_prec;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+  result.len = MAX (len, op1.len);
+  mask0 = sign_mask ();
+  mask1 = op1.sign_mask ();
+
+  /* Subtract all of the explicitly defined elements.  */
+  for (i = 0; i < result.len; i++)
+    {
+      o0 = i < len ? (unsigned HOST_WIDE_INT)val[i] : mask0;
+      o1 = i < op1.len ? (unsigned HOST_WIDE_INT)op1.val[i] : mask1;
+      x = o0 - o1 - borrow;
+      result.val[i] = x;
+      borrow = borrow == 0 ? o0 < o1 : o0 <= o1;
+    }
+
+  if (result.len * HOST_BITS_PER_WIDE_INT < precision)
+    {
+      result.val[result.len] = mask0 - mask1 - borrow;
+      result.len++;
+    }
+
+  small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+  if (small_prec != 0 && BLOCKS_NEEDED (precision) == result.len)
+    {
+      /* Modes with weird precisions.  */
+      i = result.len - 1;
+      result.val[i] = sext_hwi (result.val[i], small_prec);
+    }
+
+  result.canonize ();
+  return result;
+}
+
+/* Subtract OP1 from THIS with overflow checking.  If the result
+   overflows within the precision, set OVERFLOW.  OVERFLOW is assumed
+   to be sticky so it should be initialized.  SGN controls if signed or
+   unsigned overflow is checked.  */
+
+wide_int
+wide_int::sub (const wide_int &op1, SignOp sgn, bool *overflow) const
+{
+  wide_int result;
+  unsigned HOST_WIDE_INT o0 = 0;
+  unsigned HOST_WIDE_INT o1 = 0;
+  unsigned HOST_WIDE_INT x = 0;
+  unsigned HOST_WIDE_INT borrow = 0;
+  unsigned HOST_WIDE_INT old_borrow = 0;
+  unsigned HOST_WIDE_INT mask0, mask1;
+  int i, small_prec;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+  result.len = MAX (len, op1.len);
+  mask0 = sign_mask ();
+  mask1 = op1.sign_mask ();
+
+  /* Subtract all of the explicitly defined elements.  */
+  for (i = 0; i < len; i++)
+    {
+      o0 = val[i];
+      o1 = op1.val[i];
+      x = o0 - o1 - borrow;
+      result.val[i] = x;
+      old_borrow = borrow;
+      borrow = old_borrow == 0 ? o0 < o1 : o0 <= o1;
+    }
+
+  /* Uncompress the rest.  */
+  for (i = len; i < op1.len; i++)
+    {
+      o0 = mask0;
+      o1 = op1.val[i];
+      x = o0 - o1 - borrow;
+      result.val[i] = x;
+      old_borrow = borrow;
+      borrow = old_borrow == 0 ? o0 < o1 : o0 <= o1;
+    }
+
+  for (i = op1.len; i < len; i++)
+    {
+      o0 = val[i];
+      o1 = mask1;
+      x = o0 - o1 - borrow;
+      result.val[i] = x;
+      old_borrow = borrow;
+      borrow = old_borrow == 0 ? o0 < o1 : o0 <= o1;
+    }
+
+  if (result.len * HOST_BITS_PER_WIDE_INT < precision)
+    {
+      result.val[result.len] = mask0 - mask1 - borrow;
+      result.len++;
+      /* If we were short, we could not have overflowed.  */
+      goto ex;
+    }
+
+  small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+  if (small_prec == 0)
+    {
+      if (sgn == wide_int::SIGNED)
+	{
+	  if (((o0 ^ o1) & (x ^ o0)) >> (HOST_BITS_PER_WIDE_INT - 1))
+	    *overflow = true;
+	}
+      else if (old_borrow)
+	{
+	  if ((~o0) <= o1)
+	    *overflow = true;
+	}
+      else
+	{
+	  if ((~o0) < o1)
+	    *overflow = true;
+	}
+    }
+  else
+    {
+      if (sgn == wide_int::UNSIGNED)
+	{
+	  /* The caveat for unsigned is to get rid of the bits above
+	     the precision before doing the subtraction.  To check the
+	     overflow, clear these bits and then redo the last
+	     subtraction.  If there are any non zero bits above the prec,
+	     we overflowed. */
+	  o0 = zext_hwi (o0, small_prec);
+	  o1 = zext_hwi (o1, small_prec);
+	  x = o0 - o1 - old_borrow;
+	  if (x >> small_prec)
+	    *overflow = true;
+	}
+      else 
+	{
+	  /* Overflow in this case is easy since we can see bits beyond
+	     the precision.  If the value computed is not the sign
+	     extended value, then we have overflow.  */
+	  unsigned HOST_WIDE_INT y = sext_hwi (x, small_prec);
+	  if (x != y)
+	    *overflow = true;
+	}
+    }
+
+ ex:
+  result.canonize ();
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wvww ("wide_int:: %s %d = (%s -O %s)\n", 
+		result, *overflow, *this, op1);
+#endif
+  return result;
+}
+
+/*
+ * Division and Mod
+ */
+
+/* Compute B_QUOTIENT and B_REMAINDER from B_DIVIDEND/B_DIVISOR.  The
+   algorithm is a small modification of the algorithm in Hacker's
+   Delight by Warren, which itself is a small modification of Knuth's
+   algorithm.  M is the number of significant elements of B_DIVIDEND,
+   though there needs to be at least one extra element of B_DIVIDEND
+   allocated; N is the number of elements of B_DIVISOR.  */
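+/* For example, on a host with 64 bit HWIs the "digits" here are base
+   2**32, so dividing a 128 bit dividend by a 64 bit divisor is a
+   four digit by two digit division.  */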
+
+void
+wide_int::divmod_internal_2 (unsigned HOST_HALF_WIDE_INT *b_quotient, 
+			     unsigned HOST_HALF_WIDE_INT *b_remainder,
+			     unsigned HOST_HALF_WIDE_INT *b_dividend, 
+			     unsigned HOST_HALF_WIDE_INT *b_divisor, 
+			     int m, int n)
+{
+  /* The "digits" are a HOST_HALF_WIDE_INT which the size of half of a
+     HOST_WIDE_INT and stored in the lower bits of each word.  This
+     algorithm should work properly on both 32 and 64 bit
+     machines.  */
+  unsigned HOST_WIDE_INT b
+    = (unsigned HOST_WIDE_INT)1 << HOST_BITS_PER_HALF_WIDE_INT;
+  unsigned HOST_WIDE_INT qhat;   /* Estimate of quotient digit.  */
+  unsigned HOST_WIDE_INT rhat;   /* A remainder.  */
+  unsigned HOST_WIDE_INT p;      /* Product of two digits.  */
+  HOST_WIDE_INT s, i, j, t, k;
+
+  /* Single digit divisor.  */
+  if (n == 1)
+    {
+      k = 0;
+      for (j = m - 1; j >= 0; j--)
+	{
+	  b_quotient[j] = (k * b + b_dividend[j])/b_divisor[0];
+	  k = ((k * b + b_dividend[j])
+	       - ((unsigned HOST_WIDE_INT)b_quotient[j]
+		  * (unsigned HOST_WIDE_INT)b_divisor[0]));
+	}
+      b_remainder[0] = k;
+      return;
+    }
+
+  s = clz_hwi (b_divisor[n-1]) - HOST_BITS_PER_HALF_WIDE_INT; /* CHECK clz */
+
+  /* Normalize B_DIVIDEND and B_DIVISOR.  Unlike the published
+     algorithm, we can overwrite b_dividend and b_divisor, so we do
+     that.  */
+  for (i = n - 1; i > 0; i--)
+    b_divisor[i] = (b_divisor[i] << s)
+      | (b_divisor[i-1] >> (HOST_BITS_PER_HALF_WIDE_INT - s));
+  b_divisor[0] = b_divisor[0] << s;
+
+  b_dividend[m] = b_dividend[m-1] >> (HOST_BITS_PER_HALF_WIDE_INT - s);
+  for (i = m - 1; i > 0; i--)
+    b_dividend[i] = (b_dividend[i] << s)
+      | (b_dividend[i-1] >> (HOST_BITS_PER_HALF_WIDE_INT - s));
+  b_dividend[0] = b_dividend[0] << s;
+
+  /* Main loop.  */
+  for (j = m - n; j >= 0; j--)
+    {
+      qhat = (b_dividend[j+n] * b + b_dividend[j+n-1]) / b_divisor[n-1];
+      rhat = (b_dividend[j+n] * b + b_dividend[j+n-1]) - qhat * b_divisor[n-1];
+    again:
+      if (qhat >= b || qhat * b_divisor[n-2] > b * rhat + b_dividend[j+n-2])
+	{
+	  qhat -= 1;
+	  rhat += b_divisor[n-1];
+	  if (rhat < b)
+	    goto again;
+	}
+
+      /* Multiply and subtract.  */
+      k = 0;
+      for (i = 0; i < n; i++)
+	{
+	  p = qhat * b_divisor[i];
+	  t = b_dividend[i+j] - k - (p & HALF_INT_MASK);
+	  b_dividend[i + j] = t;
+	  k = ((p >> HOST_BITS_PER_HALF_WIDE_INT)
+	       - (t >> HOST_BITS_PER_HALF_WIDE_INT));
+	}
+      t = b_dividend[j+n] - k;
+      b_dividend[j+n] = t;
+
+      b_quotient[j] = qhat;
+      if (t < 0)
+	{
+	  b_quotient[j] -= 1;
+	  k = 0;
+	  for (i = 0; i < n; i++)
+	    {
+	      t = (HOST_WIDE_INT)b_dividend[i+j] + b_divisor[i] + k;
+	      b_dividend[i+j] = t;
+	      k = t >> HOST_BITS_PER_HALF_WIDE_INT;
+	    }
+	  b_dividend[j+n] += k;
+	}
+    }
+  for (i = 0; i < n; i++)
+    b_remainder[i] = (b_dividend[i] >> s) 
+      | (b_dividend[i+1] << (HOST_BITS_PER_HALF_WIDE_INT - s));
+}
+
+
+/* Do a truncating divide DIVISOR into DIVIDEND.  The result is the
+   same size as the operands.  SGN is either wide_int::SIGNED or
+   wide_int::UNSIGNED.  */
+
+wide_int
+wide_int::divmod_internal (bool compute_quotient, 
+			   const wide_int *dividend, const wide_int *divisor,
+			   wide_int::SignOp sgn, wide_int *remainder,
+			   bool compute_remainder, 
+			   bool *overflow)
+{
+  wide_int quotient, u0, u1;
+  unsigned int prec = dividend->get_precision ();
+  unsigned int bs = dividend->get_bitsize ();
+  int blocks_needed = 2 * BLOCKS_NEEDED (prec);
+  /* The '2' in the next 4 vars are because they are built on half
+     sized wide ints.  */
+  unsigned HOST_HALF_WIDE_INT 
+    b_quotient[MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
+  unsigned HOST_HALF_WIDE_INT
+    b_remainder[MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
+  unsigned HOST_HALF_WIDE_INT
+    b_dividend[(MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT) + 1];
+  unsigned HOST_HALF_WIDE_INT
+    b_divisor[MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
+  int m, n;
+  bool dividend_neg = false;
+  bool divisor_neg = false;
+
+  if ((*divisor).zero_p ())
+    *overflow = true;
+
+  /* The smallest signed number / -1 causes overflow.  */
+  if (sgn == wide_int::SIGNED)
+    {
+      wide_int t = wide_int::set_bit_in_zero (prec - 1, 
+					      bs, 
+					      prec);
+      if (*dividend == t && (*divisor).minus_one_p ())
+	*overflow = true;
+    }
+
+  quotient.bitsize = bs;
+  remainder->bitsize = bs;
+  quotient.precision = prec;
+  remainder->precision = prec;
+
+  /* If overflow is set, just get out.  There will only be grief by
+     continuing.  */
+  if (*overflow)
+    {
+      if (compute_remainder)
+	{
+	  remainder->len = 1;
+	  remainder->val[0] = 0;
+	}
+      return wide_int::zero (bs, prec);
+    }
+
+  /* Do it on the host if you can.  */
+  if (prec <= HOST_BITS_PER_WIDE_INT)
+    {
+      quotient.len = 1;
+      remainder->len = 1;
+      if (sgn == wide_int::SIGNED)
+	{
+	  quotient.val[0] 
+	    = sext_hwi (dividend->elt (0) / divisor->val[0], prec);
+	  remainder->val[0] 
+	    = sext_hwi (dividend->elt (0) % divisor->val[0], prec);
+	}
+      else
+	{
+	  unsigned HOST_WIDE_INT o0 = dividend->elt (0);
+	  unsigned HOST_WIDE_INT o1 = divisor->elt (0);
+
+	  if (prec < HOST_BITS_PER_WIDE_INT)
+	    {
+	      o0 = zext_hwi (o0, prec);
+	      o1 = zext_hwi (o1, prec);
+	    }
+	  quotient.val[0] = sext_hwi (o0 / o1, prec);
+	  remainder->val[0] = sext_hwi (o0 % o1, prec);
+	}
+
+#ifdef DEBUG_WIDE_INT
+      if (dump_file)
+	debug_wwww ("wide_int:: (q = %s) (r = %s) = (%s / %s)\n", 
+		    quotient, *remainder, *dividend, *divisor);
+#endif
+      return quotient;
+    }
+
+  /* Make the divisor and dividend positive and remember what we
+     did.  */
+  if (sgn == wide_int::SIGNED)
+    {
+      if (dividend->sign_mask ())
+	{
+	  u0 = dividend->neg ();
+	  dividend = &u0;
+	  dividend_neg = true;
+	}
+      if (divisor->sign_mask ())
+	{
+	  u1 = divisor->neg ();
+	  divisor = &u1;
+	  divisor_neg = true;
+	}
+    }
+
+  wi_unpack (b_dividend, (const unsigned HOST_WIDE_INT*)dividend->val,
+	     dividend->len, blocks_needed);
+  wi_unpack (b_divisor, (const unsigned HOST_WIDE_INT*)divisor->val, 
+	     divisor->len, blocks_needed);
+
+  if (dividend->sign_mask ())
+    m = blocks_needed;
+  else
+    m = 2 * dividend->get_len ();
+
+  if (divisor->sign_mask ())
+    n = blocks_needed;
+  else
+    n = 2 * divisor->get_len ();
+
+  /* It is known that the top input block to the divisor is non zero,
+     but when this block is split into two half blocks, it may be that
+     the top half block is zero.  Skip over this half block.  */
+  if (b_divisor[n - 1] == 0)
+    n--;
+
+  divmod_internal_2 (b_quotient, b_remainder, b_dividend, b_divisor, m, n);
+
+  if (compute_quotient)
+    {
+      wi_pack ((unsigned HOST_WIDE_INT*)quotient.val, b_quotient, m);
+      quotient.len = m / 2;
+      quotient.canonize ();
+      /* The quotient is neg if exactly one of the divisor or dividend is
+	 neg.  */
+      if (dividend_neg != divisor_neg)
+	quotient = quotient.neg ();
+    }
+  else
+    quotient = wide_int::zero (word_mode);
+
+  if (compute_remainder)
+    {
+      wi_pack ((unsigned HOST_WIDE_INT*)remainder->val, b_remainder, n);
+      if (n & 1)
+	n++;
+      remainder->len = n / 2;
+      (*remainder).canonize ();
+      /* The remainder is always the same sign as the dividend.  */
+      if (dividend_neg)
+	*remainder = (*remainder).neg ();
+    }
+  else
+    *remainder = wide_int::zero (word_mode);
+
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwww ("wide_int:: (q = %s) (r = %s) = (%s / %s)\n", 
+		quotient, *remainder, *dividend, *divisor);
+#endif
+  return quotient;
+}
+
+
+/* Divide DIVISOR into THIS.  The result is the same size as the
+   operands.  The sign is specified in SGN.  The output is
+   truncated.  */
+
+wide_int
+wide_int::div_trunc (const wide_int &divisor, SignOp sgn) const
+{
+  wide_int remainder;
+  bool overflow = false;
+
+  return divmod_internal (true, this, &divisor, sgn, 
+			  &remainder, false, &overflow);
+}
+
+/* Divide DIVISOR into THIS.  The result is the same size as the
+   operands.  The sign is specified in SGN.  The output is truncated.
+   Overflow is set to true if the result overflows, otherwise it is
+   not set.  */
+wide_int
+wide_int::div_trunc (const wide_int &divisor, SignOp sgn, bool *overflow) const
+{
+  wide_int remainder;
+  
+  return divmod_internal (true, this, &divisor, sgn, 
+			  &remainder, false, overflow);
+}
+
+/* Divide DIVISOR into THIS producing both the quotient and remainder.
+   The result is the same size as the operands.  The sign is specified
+   in SGN.  The output is truncated.  */
+
+wide_int
+wide_int::divmod_trunc (const wide_int &divisor, wide_int *remainder, SignOp sgn) const
+{
+  bool overflow = false;
+
+  return divmod_internal (true, this, &divisor, sgn, 
+			  remainder, true, &overflow);
+}
+
+/* Divide DIVISOR into THIS producing the remainder.  The result is
+   the same size as the operands.  The sign is specified in SGN.  The
+   output is truncated.  */
+
+wide_int
+wide_int::mod_trunc (const wide_int &divisor, SignOp sgn) const
+{
+  bool overflow = false;
+  wide_int remainder;
+
+  divmod_internal (false, this, &divisor, sgn, 
+		   &remainder, true, &overflow);
+  return remainder;
+}
+
+/* Divide DIVISOR into THIS producing the remainder.  The result is
+   the same size as the operands.  The sign is specified in SGN.  The
+   output is truncated.  Overflow is set to true if the result
+   overflows, otherwise it is not set.  */
+
+wide_int
+wide_int::mod_trunc (const wide_int &divisor, SignOp sgn, bool *overflow) const
+{
+  wide_int remainder;
+
+  divmod_internal (false, this, &divisor, sgn, 
+		   &remainder, true, overflow);
+  return remainder;
+}
+
+/* Divide DIVISOR into THIS.  The result is the same size as the
+   operands.  The sign is specified in SGN.  The output is floor
+   truncated.  Overflow is set to true if the result overflows,
+   otherwise it is not set.  */
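+/* For example, with SGN == SIGNED, dividing -7 by 2 yields -4 here,
+   whereas div_trunc would yield -3.  */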
+
+wide_int
+wide_int::div_floor (const wide_int &divisor, SignOp sgn, bool *overflow) const
+{
+  wide_int remainder;
+  wide_int quotient;
+
+  quotient = divmod_internal (true, this, &divisor, sgn, 
+			      &remainder, true, overflow);
+  if (sgn == SIGNED && quotient.neg_p () && !remainder.zero_p ())
+    return quotient - wide_int::one (bitsize, precision);
+  return quotient;
+}
+
+
+/* Divide DIVISOR into THIS.  The remainder is also produced in
+   REMAINDER.  The result is the same size as the operands.  The sign
+   is specified in SGN.  The output is floor truncated.  Overflow is
+   set to true if the result overflows, otherwise it is not set.  */
+
+wide_int
+wide_int::divmod_floor (const wide_int &divisor, wide_int *remainder, SignOp sgn) const
+{
+  wide_int quotient;
+  bool overflow = false;
+
+  quotient = divmod_internal (true, this, &divisor, sgn, 
+			      remainder, true, &overflow);
+  if (sgn == SIGNED && quotient.neg_p () && !(*remainder).zero_p ())
+    {
+      *remainder = *remainder + divisor;
+      return quotient - wide_int::one (bitsize, precision);
+    }
+  return quotient;
+}
+
+
+
+/* Divide DIVISOR into THIS producing the remainder.  The result is
+   the same size as the operands.  The sign is specified in SGN.  The
+   output is floor truncated.  Overflow is set to true if the result
+   overflows, otherwise it is not set.  */
+
+wide_int
+wide_int::mod_floor (const wide_int &divisor, SignOp sgn, bool *overflow) const
+{
+  wide_int remainder;
+  wide_int quotient;
+
+  quotient = divmod_internal (true, this, &divisor, sgn, 
+			      &remainder, true, overflow);
+
+  if (sgn == SIGNED && quotient.neg_p () && !remainder.zero_p ())
+    return remainder + divisor;
+  return remainder;
+}
+
+/* Divide DIVISOR into THIS.  The result is the same size as the
+   operands.  The sign is specified in SGN.  The output is ceil
+   truncated.  Overflow is set to true if the result overflows,
+   otherwise it is not set.  */
+
+wide_int
+wide_int::div_ceil (const wide_int &divisor, SignOp sgn, bool *overflow) const
+{
+  wide_int remainder;
+  wide_int quotient;
+
+  quotient = divmod_internal (true, this, &divisor, sgn, 
+			      &remainder, true, overflow);
+
+  if (!remainder.zero_p ())
+    {
+      if (sgn == SIGNED && quotient.neg_p ())
+	return quotient;
+      else
+	return quotient + wide_int::one (bitsize, precision);
+    }
+  return quotient;
+}
+
+/* Divide DIVISOR into THIS producing the remainder.  The result is the
+   same size as the operands.  The sign is specified in SGN.  The
+   output is ceil truncated.  Overflow is set to true if the result
+   overflows, otherwise it is not set.  */
+
+wide_int
+wide_int::mod_ceil (const wide_int &divisor, SignOp sgn, bool *overflow) const
+{
+  wide_int remainder;
+  wide_int quotient;
+
+  quotient = divmod_internal (true, this, &divisor, sgn, 
+			      &remainder, true, overflow);
+
+  if (!remainder.zero_p ())
+    {
+      if (sgn == SIGNED && quotient.neg_p ())
+	return remainder;
+      else
+	return remainder - divisor;
+    }
+  return remainder;
+}
+
+/* Divide DIVISOR into THIS.  The result is the same size as the
+   operands.  The sign is specified in SGN.  The output is round
+   truncated.  Overflow is set to true if the result overflows,
+   otherwise it is not set.  */
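+/* For example, dividing 7 by 4 yields 2 here (1.75 rounds to 2),
+   while dividing 5 by 4 yields 1 (1.25 rounds to 1).  */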
+
+wide_int
+wide_int::div_round (const wide_int &divisor, SignOp sgn, bool *overflow) const
+{
+  wide_int remainder;
+  wide_int quotient;
+
+  quotient = divmod_internal (true, this, &divisor, sgn, 
+			      &remainder, true, overflow);
+  if (!remainder.zero_p ())
+    {
+      if (sgn == SIGNED)
+	{
+	  wide_int p_remainder = remainder.neg_p () ? remainder.neg () : remainder;
+	  wide_int p_divisor = divisor.neg_p () ? divisor.neg () : divisor;
+	  p_divisor = p_divisor.rshiftu (1);
+	  
+	  if (p_remainder.gts_p (p_divisor)) 
+	    {
+	      if (quotient.neg_p ())
+		return quotient - wide_int::one (bitsize, precision);
+	      else 
+		return quotient + wide_int::one (bitsize, precision);
+	    }
+	}
+      else
+	{
+	  wide_int p_divisor = divisor.rshiftu (1);
+	  if (remainder.gtu_p (p_divisor))
+	    return quotient + wide_int::one (bitsize, precision);
+	}
+    }
+  return quotient;
+}
+
+/* Divide DIVISOR into THIS producing the remainder.  The result is
+   the same size as the operands.  The sign is specified in SGN.  The
+   output is round truncated.  Overflow is set to true if the result
+   overflows, otherwise it is not set.  */
+
+wide_int
+wide_int::mod_round (const wide_int &divisor, SignOp sgn, bool *overflow) const
+{
+  wide_int remainder;
+  wide_int quotient;
+
+  quotient = divmod_internal (true, this, &divisor, sgn, 
+			      &remainder, true, overflow);
+
+  if (!remainder.zero_p ())
+    {
+      if (sgn == SIGNED)
+	{
+	  wide_int p_remainder = remainder.neg_p () ? remainder.neg () : remainder;
+	  wide_int p_divisor = divisor.neg_p () ? divisor.neg () : divisor;
+	  p_divisor = p_divisor.rshiftu (1);
+	  
+	  if (p_remainder.gts_p (p_divisor)) 
+	    {
+	      if (quotient.neg_p ())
+		return remainder + divisor;
+	      else 
+		return remainder - divisor;
+	    }
+	}
+      else
+	{
+	  wide_int p_divisor = divisor.rshiftu (1);
+	  if (remainder.gtu_p (p_divisor))
+	    return remainder - divisor;
+	}
+    }
+  return remainder;
+}
+
+/*
+ * Shifting, rotating and extraction.
+ */
+
+/* Extract WIDTH bits from THIS starting at OFFSET.  The result is
+   assumed to fit in a HOST_WIDE_INT.  This function is safe in that
+   it can properly access elements that may not be explicitly
+   represented.  */
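+/* For example, with 64 bit HWIs, extracting 8 bits at OFFSET 60
+   combines the top 4 bits of block 0 with the low 4 bits of
+   block 1.  */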
+
+HOST_WIDE_INT
+wide_int::extract_to_hwi (int offset, int width) const
+{
+  int start_elt, end_elt, shift;
+  HOST_WIDE_INT x;
+
+  /* Get rid of the easy cases first.   */
+  if (offset >= len * HOST_BITS_PER_WIDE_INT)
+    return sign_mask ();
+  if (offset + width <= 0)
+    return 0;
+
+  shift = offset & (HOST_BITS_PER_WIDE_INT - 1);
+  if (offset < 0)
+    {
+      start_elt = -1;
+      end_elt = 0;
+      x = 0;
+    }
+  else
+    {
+      start_elt = offset / HOST_BITS_PER_WIDE_INT;
+      end_elt = (offset + width - 1) / HOST_BITS_PER_WIDE_INT;
+      x = start_elt >= len ? sign_mask () : val[start_elt] >> shift;
+    }
+
+  if (start_elt != end_elt)
+    {
+      HOST_WIDE_INT y = end_elt == len
+	? sign_mask () : val[end_elt];
+
+      /* Clear the sign extension above the low bits of X before
+	 merging in the bits from the next block.  */
+      x = zext_hwi (x, HOST_BITS_PER_WIDE_INT - shift);
+      x |= y << (HOST_BITS_PER_WIDE_INT - shift);
+    }
+
+  if (width != HOST_BITS_PER_WIDE_INT)
+    x &= ((HOST_WIDE_INT)1 << width) - 1;
+
+  return x;
+}
+
+
+/* Left shift THIS by CNT.  Since this is used internally, it has the
+   ability to specify the BITSIZE and PRECISION of the result
+   independently.  This is useful when inserting a small value into a
+   larger one.  */
+
+wide_int
+wide_int::lshift_large (unsigned int cnt, 
+			unsigned int bs, unsigned int res_prec) const
+{
+  wide_int result;
+  unsigned int i;
+
+  result.bitsize = bs;
+  result.precision = res_prec;
+
+  if (cnt >= res_prec)
+    {
+      result.val[0] = 0;
+      result.len = 1;
+      return result;
+    }
+
+  for (i = 0; i < res_prec; i += HOST_BITS_PER_WIDE_INT)
+    result.val[i / HOST_BITS_PER_WIDE_INT]
+      = extract_to_hwi (i - cnt, HOST_BITS_PER_WIDE_INT);
+
+  result.len = BLOCKS_NEEDED (res_prec);
+  result.canonize ();
+
+  return result;
+}
+
+/* Rotate THIS left by CNT within its precision.  */
+
+wide_int
+wide_int::lrotate (unsigned int cnt) const
+{
+  wide_int left, right, result;
+
+  left = lshift (cnt, NONE);
+  right = rshiftu (precision - cnt, NONE);
+  result = left | right;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int:: %s = (%s lrotate %d)\n", result, *this, cnt);
+#endif
+  return result;
+}
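+
+/* For example, rotating the 8-bit precision value 0x81 left by 1
+   yields 0x03: the left-shifted low bits (0x02) and the wrapped
+   around top bit (0x01) are combined with an inclusive or.  (An
+   illustrative sketch of the composition above.)  */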
+
+/* Unsigned right shift THIS by CNT bits.  See the definition of
+   ShiftOp::TRUNC for how the shift amount is handled.  */
+
+wide_int
+wide_int::rshiftu_large (unsigned int cnt) const
+{
+  wide_int result;
+  int stop_block, offset, i;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (cnt >= precision)
+    {
+      result.val[0] = 0;
+      result.len = 1;
+      return result;
+    }
+
+  stop_block = BLOCKS_NEEDED (precision - cnt);
+  for (i = 0; i < stop_block; i++)
+    result.val[i]
+      = extract_to_hwi ((i * HOST_BITS_PER_WIDE_INT) + cnt,
+			HOST_BITS_PER_WIDE_INT);
+
+  result.len = stop_block;
+
+  offset = (precision - cnt) & (HOST_BITS_PER_WIDE_INT - 1);
+  if (offset)
+    result.val[stop_block - 1] = zext_hwi (result.val[stop_block - 1], offset);
+  else
+    /* The top block had a 1 in its top position, so the value would
+       decompress incorrectly unless a zero block is added.  This is
+       safe because we know the shift was greater than 0.  */
+    if (result.val[stop_block - 1] < 0)
+      result.val[result.len++] = 0;
+  return result;
+}
+
+/* Signed right shift THIS by CNT bits.  See the definition of
+   ShiftOp::TRUNC for how the shift amount is handled.  */
+
+wide_int
+wide_int::rshifts_large (unsigned int cnt) const
+{
+  wide_int result;
+  int stop_block, i;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (cnt >= precision)
+    {
+      HOST_WIDE_INT m = sign_mask ();
+      result.val[0] = m;
+      result.len = 1;
+      return result;
+    }
+
+  stop_block = BLOCKS_NEEDED (precision - cnt);
+  for (i = 0; i < stop_block; i++)
+    result.val[i]
+      = extract_to_hwi ((i * HOST_BITS_PER_WIDE_INT) + cnt,
+			HOST_BITS_PER_WIDE_INT);
+
+  result.len = stop_block;
+
+  /* No need to sign extend the last block, since it extract_to_hwi
+     already did that.  */
+  return result;
+}
+
+/* Rotate THIS right by CNT within its precision.  */
+
+wide_int
+wide_int::rrotate (unsigned int cnt) const
+{
+  wide_int left, right, result;
+
+  left = lshift (precision - cnt, NONE);
+  right = rshiftu (cnt, NONE);
+  result = left | right;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int:: %s = (%s rrotate %d)\n", result, *this, cnt);
+#endif
+  return result;
+}
+
+/*
+ * Private utilities.
+ */
+/* Decompress THIS for at least TARGET bits into a result with bitsize
+   BS and precision PREC.  */
+
+wide_int
+wide_int::decompress (unsigned int target, 
+		      unsigned int bs, unsigned int prec) const
+{
+  wide_int result;
+  int blocks_needed = BLOCKS_NEEDED (target);
+  HOST_WIDE_INT mask;
+  int len, i;
+
+  result.bitsize = bs;
+  result.precision = prec;
+  result.len = blocks_needed;
+
+  for (i = 0; i < this->len; i++)
+    result.val[i] = val[i];
+
+  len = this->len;
+
+  if (target > result.precision)
+    return result;
+
+  /* The extension that we are doing here is not sign extension, it is
+     decompression.  */
+  mask = sign_mask ();
+  while (len < blocks_needed)
+    result.val[len++] = mask;
+
+  return result;
+}
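+
+/* For example, a value of -1 stored compressed as len == 1 and
+   val[0] == -1 decompresses, for a 128-bit precision and 64-bit
+   HOST_WIDE_INTs, into len == 2 with both elements equal to the sign
+   mask -1.  (An illustrative sketch of decompress.)  */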
+
+
+/*
+ * Private debug printing routines.
+ */
+
+/* The debugging routines print results of wide operations into the
+   dump files of the respective passes in which they were called.  */
+char *
+wide_int::dump (char* buf) const
+{
+  int i;
+  int l;
+  const char * sep = "";
+
+  l = sprintf (buf, "[%d,%d (", bitsize, precision);
+  for (i = len - 1; i >= 0; i--)
+    {
+      l += sprintf (&buf[l], "%s" HOST_WIDE_INT_PRINT_HEX, sep, val[i]);
+      sep = " ";
+    }
+
+  gcc_assert (len != 0);
+
+  l += sprintf (&buf[l], ")]");
+
+  gcc_assert (l < MAX_SIZE);
+  return buf;
+}
+
+#ifdef DEBUG_WIDE_INT
+void
+wide_int::debug_vw (const char* fmt, int r, const wide_int& o0)
+{
+  char buf0[MAX_SIZE];
+  fprintf (dump_file, fmt, r, o0.dump (buf0));
+}
+
+void
+wide_int::debug_vwh (const char* fmt, int r, const wide_int &o0,
+		     HOST_WIDE_INT o1)
+{
+  char buf0[MAX_SIZE];
+  fprintf (dump_file, fmt, r, o0.dump (buf0), o1);
+}
+
+void
+wide_int::debug_vww (const char* fmt, int r, const wide_int &o0,
+		     const wide_int &o1)
+{
+  char buf0[MAX_SIZE];
+  char buf1[MAX_SIZE];
+  fprintf (dump_file, fmt, r, o0.dump (buf0), o1.dump (buf1));
+}
+
+void
+wide_int::debug_wh (const char* fmt, const wide_int &r,
+		    HOST_WIDE_INT o1)
+{
+  char buf0[MAX_SIZE];
+  fprintf (dump_file, fmt, r.dump (buf0), o1);
+}
+
+void
+wide_int::debug_whh (const char* fmt, const wide_int &r,
+		     HOST_WIDE_INT o1, HOST_WIDE_INT o2)
+{
+  char buf0[MAX_SIZE];
+  fprintf (dump_file, fmt, r.dump (buf0), o1, o2);
+}
+
+void
+wide_int::debug_wv (const char* fmt, const wide_int &r, int v0)
+{
+  char buf0[MAX_SIZE];
+  fprintf (dump_file, fmt, r.dump (buf0), v0);
+}
+
+void
+wide_int::debug_wvv (const char* fmt, const wide_int &r,
+		     int v0, int v1)
+{
+  char buf0[MAX_SIZE];
+  fprintf (dump_file, fmt, r.dump (buf0), v0, v1);
+}
+
+void
+wide_int::debug_wvvv (const char* fmt, const wide_int &r,
+		      int v0, int v1, int v2)
+{
+  char buf0[MAX_SIZE];
+  fprintf (dump_file, fmt, r.dump (buf0), v0, v1, v2);
+}
+
+void
+wide_int::debug_wvww (const char* fmt, const wide_int &r, int v0,
+		      const wide_int &o0, const wide_int &o1)
+{
+  char buf0[MAX_SIZE];
+  char buf1[MAX_SIZE];
+  char buf2[MAX_SIZE];
+  fprintf (dump_file, fmt, r.dump (buf0), v0,
+	   o0.dump (buf1), o1.dump (buf2));
+}
+
+void
+wide_int::debug_ww (const char* fmt, const wide_int &r,
+		    const wide_int &o0)
+{
+  char buf0[MAX_SIZE];
+  char buf1[MAX_SIZE];
+  fprintf (dump_file, fmt, r.dump (buf0), o0.dump (buf1));
+}
+
+void
+wide_int::debug_wwv (const char* fmt, const wide_int &r,
+		     const wide_int &o0, int v0)
+{
+  char buf0[MAX_SIZE];
+  char buf1[MAX_SIZE];
+  fprintf (dump_file, fmt, r.dump (buf0), o0.dump (buf1), v0);
+}
+
+void
+wide_int::debug_wwvvs (const char* fmt, const wide_int &r, 
+		       const wide_int &o0, int v0, int v1,
+		       const char *s)
+{
+  char buf0[MAX_SIZE];
+  char buf1[MAX_SIZE];
+  fprintf (dump_file, fmt, r.dump (buf0), o0.dump (buf1), v0, v1, s);
+}
+
+void
+wide_int::debug_wwwvv (const char* fmt, const wide_int &r,
+		       const wide_int &o0, const wide_int &o1,
+		       int v0, int v1)
+{
+  char buf0[MAX_SIZE];
+  char buf1[MAX_SIZE];
+  char buf2[MAX_SIZE];
+  fprintf (dump_file, fmt, r.dump (buf0),
+	   o0.dump (buf1), o1.dump (buf2), v0, v1);
+}
+
+void
+wide_int::debug_www (const char* fmt, const wide_int &r,
+		     const wide_int &o0, const wide_int &o1)
+{
+  char buf0[MAX_SIZE];
+  char buf1[MAX_SIZE];
+  char buf2[MAX_SIZE];
+  fprintf (dump_file, fmt, r.dump (buf0),
+	   o0.dump (buf1), o1.dump (buf2));
+}
+
+void
+wide_int::debug_wwww (const char* fmt, const wide_int &r,
+		      const wide_int &o0, const wide_int &o1,
+		      const wide_int &o2)
+{
+  char buf0[MAX_SIZE];
+  char buf1[MAX_SIZE];
+  char buf2[MAX_SIZE];
+  char buf3[MAX_SIZE];
+  fprintf (dump_file, fmt, r.dump (buf0),
+	   o0.dump (buf1), o1.dump (buf2), o2.dump (buf3));
+}
+#endif
+
diff --git a/gcc/wide-int.h b/gcc/wide-int.h
new file mode 100644
index 0000000..fdd1ddb
--- /dev/null
+++ b/gcc/wide-int.h
@@ -0,0 +1,2340 @@ 
+/* Operations with very long integers.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef WIDE_INT_H
+#define WIDE_INT_H
+
+/* A wide integer is currently represented as a vector of
+   HOST_WIDE_INTs.  The vector contains enough elements
+   (MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT of them) to hold
+   a value of MAX_BITSIZE_MODE_ANY_INT bits, a bound that is derived
+   for each host/target combination.  The values are stored in the
+   vector with the least significant HOST_BITS_PER_WIDE_INT bits of
+   the value stored in element 0.
+
+   A wide_int contains four fields: the vector (VAL), the bitsize
+   (BITSIZE), the precision (PRECISION) and a length (LEN).  The
+   length is the number of HWIs needed to represent the value.
+
+   Since most integers used in a compiler are small values, it is
+   generally profitable to use a representation of the value that is
+   shorter than the mode's precision.  LEN is used to indicate the
+   number of elements of the vector that are in use.  When LEN *
+   HOST_BITS_PER_WIDE_INT < the precision, the value has been
+   compressed.  The elements of the vector above index LEN - 1 are
+   all implicitly equal to a smear of the highest order bit of
+   element LEN - 1; that is, the value is kept in sign-extended form.
+
+   The representation does not contain any information about
+   signedness of the represented value, so it can be used to represent
+   both signed and unsigned numbers.  For operations where the results
+   depend on signedness (division, comparisons), the signedness must
+   be specified separately.  For such operations, one of the
+   parameters to the operation is either wide_int::SIGNED or
+   wide_int::UNSIGNED.
+
+   All constructors for wide_int take either a bitsize and precision,
+   an enum machine_mode or a tree type.  */
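+
+/* For example, assuming HOST_BITS_PER_WIDE_INT == 64 and a 128-bit
+   integer mode (an illustrative sketch, not normative):
+
+     wide_int a = wide_int::from_shwi (-1, 128, 128);
+     // a is stored with LEN == 1 and VAL[0] == -1; the elements above
+     // LEN - 1 are implicit copies of the sign of VAL[0].
+
+     wide_int b = wide_int::from_uhwi ((unsigned HOST_WIDE_INT) -1,
+                                       128, 128);
+     // b is stored with LEN == 2, VAL[0] == -1 and VAL[1] == 0, since
+     // the infinite expansion of an unsigned number has zeros at the
+     // top.  */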
+
+
+#ifndef GENERATOR_FILE
+#include "tree.h"
+#include "hwint.h"
+#include "options.h"
+#include "tm.h"
+#include "insn-modes.h"
+#include "machmode.h"
+#include "double-int.h"
+#include <gmp.h>
+#include "insn-modes.h"
+#include "dumpfile.h"
+
+#define DEBUG_WIDE_INT
+
+class wide_int {
+  /* Internal representation.  */
+  
+  /* VAL is set to a size that is capable of computing a full
+     multiplication on the largest mode that is represented on the
+     target.  The full multiplication is used by tree-vrp, which
+     currently does a 2x largest mode by 2x largest mode
+     multiplication yielding a 4x largest mode result.  If operations
+     are added that require larger buffers, then VAL needs to be
+     changed.  */
+  HOST_WIDE_INT val[4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT];
+  unsigned short len;
+  unsigned int bitsize;
+  unsigned int precision;
+
+ public:
+  enum ShiftOp {
+    NONE,
+    /* There are two uses for the wide-int shifting functions.  The
+       first use is as an emulation of the target hardware.  The
+       second use is as service routines for other optimizations.  The
+       first case needs to be identified by passing TRUNC as the value
+       of ShiftOp so that the shift amount is properly handled
+       according to the SHIFT_COUNT_TRUNCATED flag.  For the second
+       case, the shift amount is always truncated to the bitsize of
+       the mode of THIS.  */
+    TRUNC
+  };
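+
+  /* An illustrative sketch: with a 32-bit mode, on a target that
+     defines SHIFT_COUNT_TRUNCATED, x.lshift (33, TRUNC) typically
+     shifts by 33 % 32 == 1 to emulate the hardware, whereas a service
+     routine style caller would write x.lshift (33, NONE).  */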
+
+  enum SignOp {
+    /* Many of the math functions produce different results depending
+       on whether they are SIGNED or UNSIGNED.  In general, there are
+       two different functions, whose names are prefixed with an 'S'
+       or a 'U'.  However, for some math functions there is also a
+       routine that does not have the prefix and instead takes a
+       SignOp parameter of SIGNED or UNSIGNED.  */
+    SIGNED,
+    UNSIGNED
+  };
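+
+  /* For example, x.div_trunc (y, wide_int::SIGNED) requests the same
+     signed truncating division as the prefixed form x.sdiv_trunc (y)
+     (an illustrative sketch of the two calling styles).  */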
+
+  /* Conversions.  */
+
+  inline static wide_int from_shwi (HOST_WIDE_INT op0, unsigned int bitsize, 
+				    unsigned int precision);
+  static wide_int from_shwi (HOST_WIDE_INT op0, unsigned int bitsize, 
+			     unsigned int precision, bool *overflow);
+  inline static wide_int from_uhwi (unsigned HOST_WIDE_INT op0, unsigned int bitsize, 
+				    unsigned int precision);
+  static wide_int from_uhwi (unsigned HOST_WIDE_INT op0, unsigned int bitsize, 
+			     unsigned int precision, bool *overflow);
+
+  inline static wide_int from_hwi (HOST_WIDE_INT op0, const_tree type);
+  inline static wide_int from_hwi (HOST_WIDE_INT op0, const_tree type, 
+				   bool *overflow);
+  inline static wide_int from_shwi (HOST_WIDE_INT op0, enum machine_mode mode);
+  inline static wide_int from_shwi (HOST_WIDE_INT op0, enum machine_mode mode, 
+				    bool *overflow);
+  inline static wide_int from_uhwi (unsigned HOST_WIDE_INT op0, 
+				    enum machine_mode mode);
+  inline static wide_int from_uhwi (unsigned HOST_WIDE_INT op0, 
+				    enum machine_mode mode, 
+				    bool *overflow);
+  static wide_int from_array (const HOST_WIDE_INT* op0,
+			      unsigned int len,
+			      unsigned int bitsize, 
+			      unsigned int precision); 
+  inline static wide_int from_array (const HOST_WIDE_INT* op0,
+				     unsigned int len,
+				     enum machine_mode mode);
+  inline static wide_int from_array (const HOST_WIDE_INT* op0,
+				     unsigned int len,
+				     const_tree type);
+
+  static wide_int from_double_int (double_int, 
+				   unsigned int bitsize, 
+				   unsigned int precision);
+  inline static wide_int from_double_int (double_int, enum machine_mode);
+  static wide_int from_tree (const_tree);
+  static wide_int from_tree_as_infinite_precision (const_tree tcst, 
+						   unsigned int bitsize, 
+						   unsigned int precision);
+  static wide_int from_rtx (const_rtx, enum machine_mode);
+
+  inline HOST_WIDE_INT to_shwi () const;
+  inline HOST_WIDE_INT to_shwi (unsigned int prec) const;
+  inline unsigned HOST_WIDE_INT to_uhwi () const;
+  inline unsigned HOST_WIDE_INT to_uhwi (unsigned int prec) const;
+
+  /* Largest and smallest values that are represented in modes or precisions.  */
+
+  static wide_int max_value (unsigned int prec, unsigned int bitsize, 
+			     unsigned int precision, SignOp sgn);
+  inline static wide_int max_value (unsigned int bitsize, 
+				    unsigned int precision, SignOp sgn);
+  inline static wide_int max_value (const_tree type);
+  inline static wide_int max_value (enum machine_mode mode, SignOp sgn);
+  
+  static wide_int min_value (unsigned int prec, unsigned int bitsize, 
+			     unsigned int precision, SignOp sgn);
+  inline static wide_int min_value (unsigned int bitsize, 
+				    unsigned int precision, SignOp sgn);
+  inline static wide_int min_value (const_tree type);
+  inline static wide_int min_value (enum machine_mode mode, SignOp sgn);
+  
+  /* Small constants */
+
+  inline static wide_int minus_one (unsigned int bitsize, unsigned int prec);
+  inline static wide_int minus_one (const_tree type);
+  inline static wide_int minus_one (enum machine_mode mode);
+  inline static wide_int minus_one (const wide_int &op1);
+  inline static wide_int zero (unsigned int bitsize, unsigned int prec);
+  inline static wide_int zero (const_tree type);
+  inline static wide_int zero (enum machine_mode mode);
+  inline static wide_int zero (const wide_int &op1);
+  inline static wide_int one (unsigned int bitsize, unsigned int prec);
+  inline static wide_int one (const_tree type);
+  inline static wide_int one (enum machine_mode mode);
+  inline static wide_int one (const wide_int &op1);
+  inline static wide_int two (unsigned int bitsize, unsigned int prec);
+  inline static wide_int two (const_tree type);
+  inline static wide_int two (enum machine_mode mode);
+  inline static wide_int two (const wide_int &op1);
+
+  /* Accessors.  */
+
+  inline unsigned short get_len () const;
+  inline unsigned int get_bitsize () const;
+  inline unsigned int get_precision () const;
+  inline HOST_WIDE_INT elt (unsigned int i) const;
+
+  /* Printing functions.  */
+
+  void print_dec (char *buf, SignOp sgn) const;
+  void print_dec (FILE *file, SignOp sgn) const;
+  void print_decs (char *buf) const;
+  void print_decs (FILE *file) const;
+  void print_decu (char *buf) const;
+  void print_decu (FILE *file) const;
+  void print_hex (char *buf) const;
+  void print_hex (FILE *file) const;
+
+  /* Comparative functions.  */
+
+  inline bool minus_one_p () const;
+  inline bool zero_p () const;
+  inline bool one_p () const;
+  inline bool neg_p () const;
+
+  inline bool operator == (const wide_int &y) const;
+  inline bool operator != (const wide_int &y) const;
+  inline bool gt_p (HOST_WIDE_INT x, SignOp sgn) const;
+  inline bool gt_p (const wide_int &x, SignOp sgn) const;
+  inline bool gts_p (HOST_WIDE_INT y) const;
+  inline bool gts_p (const wide_int &y) const;
+  inline bool gtu_p (unsigned HOST_WIDE_INT y) const;
+  inline bool gtu_p (const wide_int &y) const;
+
+  inline bool lt_p (const HOST_WIDE_INT x, SignOp sgn) const;
+  inline bool lt_p (const wide_int &x, SignOp sgn) const;
+  inline bool lts_p (HOST_WIDE_INT y) const;
+  inline bool lts_p (const wide_int &y) const;
+  inline bool ltu_p (unsigned HOST_WIDE_INT y) const;
+  inline bool ltu_p (const wide_int &y) const;
+  inline int cmp (const wide_int &y, SignOp sgn) const;
+  inline int cmps (const wide_int &y) const;
+  inline int cmpu (const wide_int &y) const;
+
+  bool only_sign_bit_p (unsigned int prec) const;
+  inline bool only_sign_bit_p () const;
+  inline bool fits_uhwi_p () const;
+  inline bool fits_shwi_p () const;
+  bool fits_to_tree_p (const_tree type) const;
+  bool fits_u_p (unsigned int prec) const;
+  bool fits_s_p (unsigned int prec) const;
+
+  /* Min and max */
+
+  inline wide_int min (const wide_int &op1, SignOp sgn) const;
+  inline wide_int max (const wide_int &op1, SignOp sgn) const;
+  inline wide_int smin (const wide_int &op1) const;
+  inline wide_int smax (const wide_int &op1) const;
+  inline wide_int umin (const wide_int &op1) const;
+  inline wide_int umax (const wide_int &op1) const;
+
+  /* Extension, these do not change the precision or bitsize.  */
+
+  inline wide_int ext (unsigned int offset, SignOp sgn) const;
+  wide_int sext (unsigned int offset) const;
+  wide_int zext (unsigned int offset) const;
+
+  /* Make a fast copy.  */
+
+  wide_int copy () const;
+
+  /* These change the underlying bitsize and precision.  */
+  
+  wide_int force_to_size (unsigned int bitsize, unsigned int precision, 
+			  SignOp sgn) const;
+  inline wide_int force_to_size (enum machine_mode mode, SignOp sgn) const;
+  inline wide_int force_to_size (const_tree type) const;
+  inline wide_int force_to_size (const_tree type, SignOp sgn) const;
+
+  inline wide_int sforce_to_size (enum machine_mode mode) const;
+  inline wide_int sforce_to_size (const_tree type) const;
+  inline wide_int zforce_to_size (enum machine_mode mode) const;
+  inline wide_int zforce_to_size (const_tree type) const;
+
+  /* Masking and insertion.  */
+
+  wide_int set_bit (unsigned int bitpos) const;
+  static wide_int set_bit_in_zero (unsigned int, 
+				   unsigned int bitsize, 
+				   unsigned int prec);
+  inline static wide_int set_bit_in_zero (unsigned int, 
+					  enum machine_mode mode);
+  inline static wide_int set_bit_in_zero (unsigned int, const_tree type);
+  wide_int insert (const wide_int &op0, unsigned int offset,
+		   unsigned int width) const;
+  static wide_int mask (unsigned int start, bool negate, 
+			unsigned int bitsize, unsigned int prec);
+  inline static wide_int mask (unsigned int start, bool negate, 
+			       enum machine_mode mode);
+  inline static wide_int mask (unsigned int start, bool negate,
+			       const_tree type);
+  wide_int bswap () const;
+  static wide_int shifted_mask (unsigned int start, unsigned int width,
+				bool negate,
+				unsigned int bitsize, unsigned int prec);
+  inline static wide_int shifted_mask (unsigned int start, unsigned int width, 
+				       bool negate, enum machine_mode mode);
+  inline static wide_int shifted_mask (unsigned int start, unsigned int width, 
+				       bool negate, const_tree type);
+  inline HOST_WIDE_INT sign_mask () const;
+
+  /* Logicals */
+
+  inline wide_int operator & (const wide_int &y) const;
+  inline wide_int and_not (const wide_int &y) const;
+  inline wide_int operator ~ () const;
+  inline wide_int operator | (const wide_int &y) const;
+  inline wide_int or_not (const wide_int &y) const;
+  inline wide_int operator ^ (const wide_int &y) const;
+
+  /* Arithmetic operation functions, alpha sorted.  */
+
+  wide_int abs () const;
+  inline wide_int operator + (const wide_int &y) const;
+  wide_int add (const wide_int &x, SignOp sgn, bool *overflow) const;
+  wide_int clz (unsigned int bitsize, unsigned int prec) const;
+  int clz () const;
+  wide_int clrsb (unsigned int bitsize, unsigned int prec) const;
+  int clrsb () const;
+  wide_int ctz (unsigned int bitsize, unsigned int prec) const;
+  int ctz () const;
+  int exact_log2 () const;
+  int floor_log2 () const;
+  wide_int ffs () const;
+  inline wide_int operator * (const wide_int &y) const;
+  wide_int mul (const wide_int &x, SignOp sgn, bool *overflow) const;
+  inline wide_int smul (const wide_int &x, bool *overflow) const;
+  inline wide_int umul (const wide_int &x, bool *overflow) const;
+  wide_int mul_full (const wide_int &x, SignOp sgn) const;
+  inline wide_int umul_full (const wide_int &x) const;
+  inline wide_int smul_full (const wide_int &x) const;
+  wide_int mul_high (const wide_int &x, SignOp sgn) const;
+  inline wide_int neg () const;
+  inline wide_int neg (bool *overflow) const;
+  wide_int parity (unsigned int bitsize, unsigned int prec) const;
+  int popcount () const;
+  wide_int popcount (unsigned int bitsize, unsigned int prec) const;
+  inline wide_int operator - (const wide_int &y) const;
+  wide_int sub (const wide_int &x, SignOp sgn, bool *overflow) const;
+
+  /* Division and mod.  These are the ones that are actually used, but
+     there are a lot of them.  */
+
+  wide_int div_trunc (const wide_int &divisor, SignOp sgn) const;
+  wide_int div_trunc (const wide_int &divisor, SignOp sgn, bool *overflow) const;
+  inline wide_int sdiv_trunc (const wide_int &divisor) const;
+  inline wide_int udiv_trunc (const wide_int &divisor) const;
+
+  wide_int div_floor (const wide_int &divisor, SignOp sgn, bool *overflow) const;
+  inline wide_int udiv_floor (const wide_int &divisor) const;
+  inline wide_int sdiv_floor (const wide_int &divisor) const;
+  wide_int div_ceil (const wide_int &divisor, SignOp sgn, bool *overflow) const;
+  wide_int div_round (const wide_int &divisor, SignOp sgn, bool *overflow) const;
+
+  wide_int divmod_trunc (const wide_int &divisor, wide_int *mod, SignOp sgn) const;
+  inline wide_int sdivmod_trunc (const wide_int &divisor, wide_int *mod) const;
+  inline wide_int udivmod_trunc (const wide_int &divisor, wide_int *mod) const;
+
+  wide_int divmod_floor (const wide_int &divisor, wide_int *mod, SignOp sgn) const;
+  inline wide_int sdivmod_floor (const wide_int &divisor, wide_int *mod) const;
+
+  wide_int mod_trunc (const wide_int &divisor, SignOp sgn) const;
+  wide_int mod_trunc (const wide_int &divisor, SignOp sgn, bool *overflow) const;
+  inline wide_int smod_trunc (const wide_int &divisor) const;
+  inline wide_int umod_trunc (const wide_int &divisor) const;
+
+  wide_int mod_floor (const wide_int &divisor, SignOp sgn, bool *overflow) const;
+  inline wide_int umod_floor (const wide_int &divisor) const;
+  wide_int mod_ceil (const wide_int &divisor, SignOp sgn, bool *overflow) const;
+  wide_int mod_round (const wide_int &divisor, SignOp sgn, bool *overflow) const;
+
+  /* Shifting, rotating and extracting.  */
+
+  HOST_WIDE_INT extract_to_hwi (int offset, int width) const;
+
+  inline wide_int lshift (const wide_int &y, ShiftOp z = NONE) const;
+  inline wide_int lshift (unsigned int y, ShiftOp z, unsigned int bitsize, 
+			  unsigned int precision) const;
+  inline wide_int lshift (unsigned int y, ShiftOp z = NONE) const;
+
+  inline wide_int lrotate (const wide_int &y) const;
+  wide_int lrotate (unsigned int y) const;
+
+  inline wide_int rshift (const wide_int &y, SignOp sgn, ShiftOp z = NONE) const;
+  inline wide_int rshiftu (const wide_int &y, ShiftOp z = NONE) const;
+  inline wide_int rshiftu (unsigned int y, ShiftOp z = NONE) const;
+  inline wide_int rshifts (const wide_int &y, ShiftOp z = NONE) const;
+  inline wide_int rshifts (unsigned int y, ShiftOp z = NONE) const;
+
+  inline wide_int rrotate (const wide_int &y) const;
+  wide_int rrotate (unsigned int y) const;
+
+  static const int DUMP_MAX = (2 * (MAX_BITSIZE_MODE_ANY_INT / 4
+			       + MAX_BITSIZE_MODE_ANY_INT 
+				    / HOST_BITS_PER_WIDE_INT + 32));
+  char *dump (char* buf) const;
+ private:
+
+  /* 
+   * Internal versions that do the work if the values do not fit in a
+   * HWI.
+   */ 
+
+  /* Comparisons */
+  bool eq_p_large (const wide_int &op1) const;
+  bool lts_p_large (const wide_int &op1) const;
+  int cmps_large (const wide_int &op1) const;
+  bool ltu_p_large (const wide_int &op1) const;
+  int cmpu_large (const wide_int &op1) const;
+
+  /* Logicals.  */
+  wide_int and_large (const wide_int &op1) const;
+  wide_int and_not_large (const wide_int &y) const;
+  wide_int or_large (const wide_int &y) const;
+  wide_int or_not_large (const wide_int &y) const;
+  wide_int xor_large (const wide_int &y) const;
+
+  /* Arithmetic */
+  wide_int add_large (const wide_int &op1) const;
+  wide_int sub_large (const wide_int &op1) const;
+
+  wide_int lshift_large (unsigned int cnt, 
+			 unsigned int bs, unsigned int res_prec) const;
+  wide_int rshiftu_large (unsigned int cnt) const;
+  wide_int rshifts_large (unsigned int cnt) const;
+
+  static wide_int
+    mul_internal (bool high, bool full, 
+		  const wide_int *op1, const wide_int *op2,
+		  wide_int::SignOp sgn,  bool *overflow, bool needs_overflow);
+  static void
+    divmod_internal_2 (unsigned HOST_HALF_WIDE_INT *b_quotient, 
+		       unsigned HOST_HALF_WIDE_INT *b_remainder,
+		       unsigned HOST_HALF_WIDE_INT *b_dividend, 
+		       unsigned HOST_HALF_WIDE_INT *b_divisor, 
+		       int m, int n);
+  static wide_int
+    divmod_internal (bool compute_quotient, 
+		     const wide_int *dividend, const wide_int *divisor,
+		     wide_int::SignOp sgn, wide_int *remainder,
+		     bool compute_remainder, 
+		     bool *overflow);
+
+
+  /* Private utility routines.  */
+  wide_int decompress (unsigned int target, unsigned int bitsize, 
+		       unsigned int precision) const;
+  void canonize ();
+  static inline int trunc_shift (unsigned int bitsize, int cnt);
+  static inline int trunc_shift (unsigned int bitsize, const wide_int &cnt, ShiftOp z);
+
+#ifdef DEBUG_WIDE_INT
+  /* Debugging routines.  */
+  static void debug_vw  (const char* fmt, int r, const wide_int& o0);
+  static void debug_vwh (const char* fmt, int r, const wide_int &o0,
+			 HOST_WIDE_INT o1);
+  static void debug_vww (const char* fmt, int r, const wide_int &o0,
+			 const wide_int &o1);
+  static void debug_wh (const char* fmt, const wide_int &r,
+			 HOST_WIDE_INT o1);
+  static void debug_whh (const char* fmt, const wide_int &r,
+			 HOST_WIDE_INT o1, HOST_WIDE_INT o2);
+  static void debug_wv (const char* fmt, const wide_int &r, int v0);
+  static void debug_wvv (const char* fmt, const wide_int &r, int v0,
+			 int v1);
+  static void debug_wvvv (const char* fmt, const wide_int &r, int v0,
+			  int v1, int v2);
+  static void debug_wvww (const char* fmt, const wide_int &r, int v0,
+			  const wide_int &o0, const wide_int &o1);
+  static void debug_wwv (const char* fmt, const wide_int &r,
+			 const wide_int &o0, int v0);
+  static void debug_wwvvs (const char* fmt, const wide_int &r, 
+			   const wide_int &o0, 
+			   int v0, int v1, const char *s);
+  static void debug_wwwvv (const char* fmt, const wide_int &r,
+			   const wide_int &o0, const wide_int &o1,
+			   int v0, int v1);
+  static void debug_ww (const char* fmt, const wide_int &r,
+			const wide_int &o0);
+  static void debug_www (const char* fmt, const wide_int &r,
+			 const wide_int &o0, const wide_int &o1);
+  static void debug_wwww (const char* fmt, const wide_int &r,
+			  const wide_int &o0, const wide_int &o1, 
+			  const wide_int &o2);
+#endif
+};
+
+/* Insert a 1 bit into 0 at BITPOS, producing a number with bitsize
+   and precision taken from MODE.  */
+
+wide_int
+wide_int::set_bit_in_zero (unsigned int bitpos, enum machine_mode mode)
+{
+  return wide_int::set_bit_in_zero (bitpos, GET_MODE_BITSIZE (mode),
+				    GET_MODE_PRECISION (mode));
+}
+
+/* Insert a 1 bit into 0 at BITPOS, producing a number with bitsize
+   and precision taken from TYPE.  */
+
+wide_int
+wide_int::set_bit_in_zero (unsigned int bitpos, const_tree type)
+{
+  return wide_int::set_bit_in_zero (bitpos, 
+				    GET_MODE_BITSIZE (TYPE_MODE (type)),
+				    TYPE_PRECISION (type));
+}
+
+/* Return a result mask where the lower WIDTH bits are ones and the
+   bits above that up to the precision are zeros.  The result is
+   inverted if NEGATE is true.   The result is made with bitsize
+   and precision taken from MODE.  */
+
+wide_int
+wide_int::mask (unsigned int width, bool negate, enum machine_mode mode)
+{
+  return wide_int::mask (width, negate, 
+			 GET_MODE_BITSIZE (mode),
+			 GET_MODE_PRECISION (mode));
+}
+
+/* Return a result mask where the lower WIDTH bits are ones and the
+   bits above that up to the precision are zeros.  The result is
+   inverted if NEGATE is true.  The result is made with bitsize
+   and precision taken from TYPE.  */
+
+wide_int
+wide_int::mask (unsigned int width, bool negate, const_tree type)
+{
+  return wide_int::mask (width, negate, 
+			 GET_MODE_BITSIZE (TYPE_MODE (type)),
+			 TYPE_PRECISION (type));
+}
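+
+/* For example (an illustrative sketch of the mask builders),
+   wide_int::mask (8, false, 32, 32) yields the value 0xff in a 32-bit
+   result, and wide_int::mask (8, true, 32, 32) yields its complement
+   0xffffff00.  */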
+
+/* Return a result mask of WIDTH ones starting at START; all other
+   bits up to the precision are zeros.  The result is inverted if
+   NEGATE is true.  The result is made with bitsize and precision
+   taken from MODE.  */
+
+wide_int
+wide_int::shifted_mask (unsigned int start, unsigned int width, 
+			bool negate, enum machine_mode mode)
+{
+  return wide_int::shifted_mask (start, width, negate, 
+				 GET_MODE_BITSIZE (mode),
+				 GET_MODE_PRECISION (mode));
+}
+
+/* Return a result mask of WIDTH ones starting at START; all other
+   bits up to the precision are zeros.  The result is inverted if
+   NEGATE is true.  The result is made with bitsize and precision
+   taken from TYPE.  */
+
+wide_int
+wide_int::shifted_mask (unsigned int start, unsigned int width, 
+			bool negate, const_tree type)
+{
+  return wide_int::shifted_mask (start, width, negate, 
+				 GET_MODE_BITSIZE (TYPE_MODE (type)),
+				 TYPE_PRECISION (type));
+}
+
+/* Produce 0 or -1 that is the smear of the sign bit.  */
+
+HOST_WIDE_INT
+wide_int::sign_mask () const
+{
+  int i = len - 1;
+  if (precision < HOST_BITS_PER_WIDE_INT)
+    return ((val[0] << (HOST_BITS_PER_WIDE_INT - precision))
+	    >> (HOST_BITS_PER_WIDE_INT - 1));
+
+  /* VRP appears to be badly broken and this is a very ugly fix.  */
+  if (i >= 0)
+    return val[i] >> (HOST_BITS_PER_WIDE_INT - 1);
+
+  gcc_unreachable ();
+}
+
+/* Conversions */
+
+/* Convert OP0 into a wide_int with parameters taken from TYPE.  */
+
+wide_int
+wide_int::from_hwi (HOST_WIDE_INT op0, const_tree type)
+{
+  unsigned int bitsize = GET_MODE_BITSIZE (TYPE_MODE (type));
+  unsigned int prec = TYPE_PRECISION (type);
+
+  if (TYPE_UNSIGNED (type))
+    return wide_int::from_uhwi (op0, bitsize, prec);
+  else
+    return wide_int::from_shwi (op0, bitsize, prec);
+}
+
+/* Convert OP0 into a wide_int with parameters taken from TYPE.  If
+   the value does not fit, set OVERFLOW.  */
+
+wide_int
+wide_int::from_hwi (HOST_WIDE_INT op0, const_tree type, 
+		    bool *overflow)
+{
+  unsigned int bitsize = GET_MODE_BITSIZE (TYPE_MODE (type));
+  unsigned int prec = TYPE_PRECISION (type);
+
+  if (TYPE_UNSIGNED (type))
+    return wide_int::from_uhwi (op0, bitsize, prec, overflow);
+  else
+    return wide_int::from_shwi (op0, bitsize, prec, overflow);
+}
+
+/* Convert OP0 into a wide int of BITSIZE and PRECISION.  If the
+   precision is less than HOST_BITS_PER_WIDE_INT, sign extend the
+   value to the full word.  */
+
+wide_int
+wide_int::from_shwi (HOST_WIDE_INT op0, unsigned int bitsize, unsigned int precision)
+{
+  wide_int result;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (precision < HOST_BITS_PER_WIDE_INT)
+    op0 = sext_hwi (op0, precision);
+
+  result.val[0] = op0;
+  result.len = 1;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wh ("wide_int::from_uhwi %s = " HOST_WIDE_INT_PRINT_HEX "\n", 
+	      result, op0);
+#endif
+
+  return result;
+}
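+
+/* For example, wide_int::from_shwi (0xff, 32, 8) stores val[0] == -1:
+   with a precision of 8 the value is kept in sign-extended form, in
+   which 0xff and -1 are the same number (an illustrative sketch).  */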
+
+/* Convert signed OP0 into a wide_int with parameters taken from
+   MODE.  */
+
+wide_int
+wide_int::from_shwi (HOST_WIDE_INT op0, enum machine_mode mode)
+{
+  unsigned int bitsize = GET_MODE_BITSIZE (mode);
+  unsigned int prec = GET_MODE_PRECISION (mode);
+
+  return wide_int::from_shwi (op0, bitsize, prec);
+}
+
+/* Convert signed OP0 into a wide_int with parameters taken from
+   MODE. If the value does not fit, set OVERFLOW. */
+
+wide_int
+wide_int::from_shwi (HOST_WIDE_INT op0, enum machine_mode mode, 
+	   bool *overflow)
+{
+  unsigned int bitsize = GET_MODE_BITSIZE (mode);
+  unsigned int prec = GET_MODE_PRECISION (mode);
+
+  return wide_int::from_shwi (op0, bitsize, prec, overflow);
+}
+
+/* Convert OP0 into a wide int of BITSIZE and PRECISION.  If the
+   precision is less than HOST_BITS_PER_WIDE_INT, zero extend the
+   value of the word.  */
+
+wide_int
+wide_int::from_uhwi (unsigned HOST_WIDE_INT op0,
+		     unsigned int bitsize, unsigned int precision)
+{
+  wide_int result;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (precision < HOST_BITS_PER_WIDE_INT)
+    op0 = zext_hwi (op0, precision);
+
+  result.val[0] = op0;
+
+  /* If the top bit is a 1, we need to add another word of zeros: the
+     compressed form would otherwise sign extend to a negative value,
+     whereas the infinite expansion of any unsigned number must have
+     zeros at the top.  */
+  if ((HOST_WIDE_INT)op0 < 0 && precision > HOST_BITS_PER_WIDE_INT)
+    {
+      result.val[1] = 0;
+      result.len = 2;
+    }
+  else
+    result.len = 1;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wh ("wide_int::from_uhwi %s = " HOST_WIDE_INT_PRINT_HEX "\n", 
+	      result, op0);
+#endif
+
+  return result;
+}
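+
+/* For example, wide_int::from_uhwi ((unsigned HOST_WIDE_INT) -1, 128,
+   128) produces len == 2 with val[0] == -1 and val[1] == 0; without
+   the extra zero block the value would read back as -1 rather than
+   2^64 - 1 (an illustrative sketch assuming 64-bit
+   HOST_WIDE_INTs).  */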
+
+/* Convert unsigned OP0 into a wide_int with parameters taken from
+   MODE.  */
+
+wide_int
+wide_int::from_uhwi (unsigned HOST_WIDE_INT op0, enum machine_mode mode)
+{
+  unsigned int bitsize = GET_MODE_BITSIZE (mode);
+  unsigned int prec = GET_MODE_PRECISION (mode);
+
+  return wide_int::from_uhwi (op0, bitsize, prec);
+}
+
+/* Convert unsigned OP0 into a wide_int with parameters taken from
+   MODE. If the value does not fit, set OVERFLOW. */
+
+wide_int
+wide_int::from_uhwi (unsigned HOST_WIDE_INT op0, enum machine_mode mode, 
+		     bool *overflow)
+{
+  unsigned int bitsize = GET_MODE_BITSIZE (mode);
+  unsigned int prec = GET_MODE_PRECISION (mode);
+
+  return wide_int::from_uhwi (op0, bitsize, prec, overflow);
+}
+
+/* Return THIS as a signed HOST_WIDE_INT.  If THIS does not fit in
+   PREC, the information is lost. */
+
+HOST_WIDE_INT 
+wide_int::to_shwi (unsigned int prec) const
+{
+  HOST_WIDE_INT result;
+
+  if (prec < HOST_BITS_PER_WIDE_INT)
+    result = sext_hwi (val[0], prec);
+  else
+    result = val[0];
+
+  return result;
+}
+
+/* Return THIS as a signed HOST_WIDE_INT.  If THIS is too large for
+   the mode's precision, the information is lost. */
+
+HOST_WIDE_INT 
+wide_int::to_shwi () const
+{
+  return to_shwi (precision);
+}
+
+/* Return THIS as an unsigned HOST_WIDE_INT.  If THIS does not fit in
+   PREC, the information is lost. */
+
+unsigned HOST_WIDE_INT 
+wide_int::to_uhwi (unsigned int prec) const
+{
+  unsigned HOST_WIDE_INT result;
+
+  if (prec < HOST_BITS_PER_WIDE_INT)
+    result = zext_hwi (val[0], prec);
+  else
+    result = val[0];
+
+  return result;
+}
+
+/* Return THIS as an unsigned HOST_WIDE_INT.  If THIS is too large for
+   the mode's precision, the information is lost. */
+
+unsigned HOST_WIDE_INT 
+wide_int::to_uhwi () const
+{
+  return to_uhwi (precision);
+}
+
+
+/* Convert the LEN element array OP0 into a wide_int with bitsize and
+   precision taken from MODE.  */
+
+wide_int
+wide_int::from_array (const HOST_WIDE_INT* op0, unsigned int len, 
+		      enum machine_mode mode)
+{
+  unsigned int bitsize = GET_MODE_BITSIZE (mode);
+  unsigned int prec = GET_MODE_PRECISION (mode);
+
+  return wide_int::from_array (op0, len, bitsize, prec);
+}
+
+/* Convert the LEN element array OP0 into a wide_int with bitsize and
+   precision taken from TYPE.  */
+
+wide_int
+wide_int::from_array (const HOST_WIDE_INT* op0, unsigned int len, 
+		      const_tree type)
+{
+  unsigned int bitsize = GET_MODE_BITSIZE (TYPE_MODE (type));
+  unsigned int prec = TYPE_PRECISION (type);
+
+  return wide_int::from_array (op0, len, bitsize, prec);
+}
+
+/* Convert double_int OP0 into a wide_int with parameters taken from
+   MODE.  */
+
+wide_int
+wide_int::from_double_int (double_int op0, enum machine_mode mode)
+{
+  unsigned int bitsize = GET_MODE_BITSIZE (mode);
+  unsigned int prec = GET_MODE_PRECISION (mode);
+
+  return wide_int::from_double_int (op0, bitsize, prec);
+}
+
+/* Min and Max value helpers.  */
+
+/* Produce the largest number of sign SGN that is representable in
+   PRECISION.  The result is represented in BITSIZE and PRECISION.
+   SGN must be SIGNED or UNSIGNED.  */
+
+wide_int
+wide_int::max_value (unsigned int bitsize, unsigned int precision, 
+		     SignOp sgn)
+{
+  return max_value (precision, bitsize, precision, sgn);
+}
+  
+/* Produce the largest number that is represented in MODE. The
+   bitsize and precision are taken from mode.  SGN must be SIGNED or
+   UNSIGNED.  */
+
+wide_int
+wide_int::max_value (enum machine_mode mode, SignOp sgn)
+{
+  unsigned int prec = GET_MODE_PRECISION (mode);
+  return max_value (prec, GET_MODE_BITSIZE (mode), prec, sgn);
+}
+
+/* Produce the largest number that is represented in TYPE. The
+   bitsize and precision and sign are taken from TYPE.  */
+
+wide_int
+wide_int::max_value (const_tree type)
+{
+  unsigned int prec = TYPE_PRECISION (type);
+  return max_value (prec, GET_MODE_BITSIZE (TYPE_MODE (type)), 
+		    prec, 
+		    TYPE_UNSIGNED (type) ? UNSIGNED : SIGNED);
+}
+
+/* Produce the smallest number of sign SGN that is representable in
+   PRECISION.  The result is represented in BITSIZE and PRECISION.
+   SGN must be SIGNED or UNSIGNED.  */
+
+wide_int
+wide_int::min_value (unsigned int bitsize, unsigned int precision, 
+		     SignOp sgn)
+{
+  return min_value (precision, bitsize, precision, sgn);
+}
+  
+/* Produce the smallest number that is represented in MODE. The
+   bitsize and precision are taken from mode.  SGN must be SIGNED or
+   UNSIGNED.  */
+
+wide_int
+wide_int::min_value (enum machine_mode mode, SignOp sgn)
+{
+  unsigned int prec = GET_MODE_PRECISION (mode);
+  return min_value (prec, GET_MODE_BITSIZE (mode), prec, sgn);
+}
+
+/* Produce the smallest number that is represented in TYPE. The
+   bitsize and precision and sign are taken from TYPE.  */
+
+wide_int
+wide_int::min_value (const_tree type)
+{
+  unsigned int prec = TYPE_PRECISION (type);
+  return min_value (prec, GET_MODE_BITSIZE (TYPE_MODE (type)), 
+		    prec, 
+		    TYPE_UNSIGNED (type) ? UNSIGNED : SIGNED);
+}
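+
+/* For example, for an 8-bit type these return (an illustrative
+   sketch):
+
+     max_value: 0x7f (127) if SIGNED, 0xff (255) if UNSIGNED
+     min_value: -0x80 (-128) if SIGNED, 0 if UNSIGNED  */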
+
+
+/* Small constants.  */
+
+/* Return a wide int of -1 with bitsize BS and precision PREC.  */
+
+wide_int
+wide_int::minus_one (unsigned int bs, unsigned int prec)
+{
+  return wide_int::from_shwi (-1, bs, prec);
+}
+
+/* Return a wide int of -1 with TYPE.  */
+
+wide_int
+wide_int::minus_one (const_tree type)
+{
+  return wide_int::from_shwi (-1, TYPE_MODE (type));
+}
+
+/* Return a wide int of -1 with MODE.  */
+
+wide_int
+wide_int::minus_one (enum machine_mode mode)
+{
+  return wide_int::from_shwi (-1, mode);
+}
+
+/* Return a wide int of -1 that is the same size as op1.  */
+
+wide_int
+wide_int::minus_one (const wide_int &op1)
+{
+  return wide_int::from_shwi (-1, op1.get_bitsize (), op1.get_precision ());
+}
+
+
+/* Return a wide int of 0 with bitsize BS and precision PREC.  */
+
+wide_int
+wide_int::zero (unsigned int bs, unsigned int prec)
+{
+  return wide_int::from_shwi (0, bs, prec);
+}
+
+/* Return a wide int of 0 with TYPE.  */
+
+wide_int
+wide_int::zero (const_tree type)
+{
+  return wide_int::from_shwi (0, TYPE_MODE (type));
+}
+
+/* Return a wide int of 0 with MODE.  */
+
+wide_int
+wide_int::zero (enum machine_mode mode)
+{
+  return wide_int::from_shwi (0, mode);
+}
+
+/* Return a wide int of 0 that is the same size as op1.  */
+
+wide_int
+wide_int::zero (const wide_int &op1)
+{
+  return wide_int::from_shwi (0, op1.get_bitsize (), op1.get_precision ());
+}
+
+
+/* Return a wide int of 1 with bitsize BS and precision PREC.  */
+
+wide_int
+wide_int::one (unsigned int bs, unsigned int prec)
+{
+  return wide_int::from_shwi (1, bs, prec);
+}
+
+/* Return a wide int of 1 with TYPE.  */
+
+wide_int
+wide_int::one (const_tree type)
+{
+  return wide_int::from_shwi (1, TYPE_MODE (type));
+}
+
+/* Return a wide int of 1 with MODE.  */
+
+wide_int
+wide_int::one (enum machine_mode mode)
+{
+  return wide_int::from_shwi (1, mode);
+}
+
+/* Return a wide int of 1 that is the same size as op1.  */
+
+wide_int
+wide_int::one (const wide_int &op1)
+{
+  return wide_int::from_shwi (1, op1.get_bitsize (), op1.get_precision ());
+}
+
+
+/* Return a wide int of 2 with bitsize BS and precision PREC.  */
+
+wide_int
+wide_int::two (unsigned int bs, unsigned int prec)
+{
+  return wide_int::from_shwi (2, bs, prec);
+}
+
+/* Return a wide int of 2 with TYPE.  */
+
+wide_int
+wide_int::two (const_tree type)
+{
+  return wide_int::from_shwi (2, TYPE_MODE (type));
+}
+
+/* Return a wide int of 2 with MODE.  */
+
+wide_int
+wide_int::two (enum machine_mode mode)
+{
+  return wide_int::from_shwi (2, mode);
+}
+
+/* Return a wide int of 2 that is the same size as op1.  */
+
+wide_int
+wide_int::two (const wide_int &op1)
+{
+  return wide_int::from_shwi (2, op1.get_bitsize (), op1.get_precision ());
+}
+
+/* Public accessors for the interior of a wide int.  */
+
+/* Get the number of host wide ints actually represented within the
+   wide int.  */
+
+unsigned short
+wide_int::get_len () const
+{
+  return len;
+}
+
+/* Get bitsize of the value represented within the wide int.  */
+
+unsigned int
+wide_int::get_bitsize () const
+{
+  return bitsize;
+}
+
+/* Get precision of the value represented within the wide int.  */
+
+unsigned int
+wide_int::get_precision () const
+{
+  return precision;
+}
+
+/* Get a particular element of the wide int.  */
+
+HOST_WIDE_INT
+wide_int::elt (unsigned int i) const
+{
+  return i >= len ? sign_mask () : val[i];
+}
+
+/* Return true if THIS is -1.  */
+
+bool
+wide_int::minus_one_p () const
+{
+  return len == 1 && val[0] == (HOST_WIDE_INT)-1;
+}
+
+/* Return true if THIS is 0.  */
+
+bool
+wide_int::zero_p () const
+{
+  return len == 1 && val[0] == 0;
+}
+
+/* Return true if THIS is 1.  */
+
+bool
+wide_int::one_p () const
+{
+  return len == 1 && val[0] == 1;
+}
+
+/* Return true if THIS is negative.  */
+
+bool
+wide_int::neg_p () const
+{
+  return sign_mask () != 0;
+}
+
+/*
+ * Comparisons, note that only equality is an operator.  The other
+ * comparisons cannot be operators since they are inherently signed or
+ * unsigned and C++ has no such operators.
+ */
+
+/* Return true if THIS == OP1.  */
+
+bool
+wide_int::operator == (const wide_int &op1) const
+{
+  bool result;
+
+  gcc_assert (precision == op1.precision);
+
+  if (this == &op1)
+    {
+      result = true;
+      goto ex;
+    }
+
+  if (precision < HOST_BITS_PER_WIDE_INT)
+    {
+      unsigned HOST_WIDE_INT mask = ((HOST_WIDE_INT)1 << precision) - 1;
+      result = (val[0] & mask) == (op1.val[0] & mask);
+      goto ex;
+    }
+
+  if (precision == HOST_BITS_PER_WIDE_INT)
+    {
+      result = val[0] == op1.val[0];
+      goto ex;
+    }
+
+  result = eq_p_large (op1);
+
+ ex:
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vww ("wide_int:: %d = (%s == %s)\n", result, *this, op1);
+#endif
+
+  return result;
+}
+
+/* Return true if THIS is not equal to OP1. */ 
+
+bool
+wide_int::operator != (const wide_int &op1) const
+{
+  return !(*this == op1);
+}  
+
+/* Return true if THIS is less than OP1.  Signedness is indicated by
+   OP.  */
+
+bool
+wide_int::lt_p (const wide_int &op1, SignOp op) const
+{
+  if (op == SIGNED)
+    return lts_p (op1);
+  else
+    return ltu_p (op1);
+}  
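+
+/* For example, for an 8-bit value x whose bit pattern is 0xff (an
+   illustrative sketch): x.lts_p (0) is true, since as a signed number
+   x is -1, while x.ltu_p (0) is false, since as an unsigned number x
+   is 255.  */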
+
+/* Return true if THIS is greater than OP1.  Signedness is indicated
+   by OP.  */
+
+bool
+wide_int::gt_p (HOST_WIDE_INT op1, SignOp op) const
+{
+  if (op == SIGNED)
+    return gts_p (op1);
+  else
+    return gtu_p (op1);
+}  
+
+/* Return true if THIS is greater than OP1.  Signedness is indicated
+   by OP.  */
+
+bool
+wide_int::gt_p (const wide_int &op1, SignOp op) const
+{
+  if (op == SIGNED)
+    return op1.lts_p (*this);
+  else
+    return op1.ltu_p (*this);
+}  
+
+/* Return true if THIS is signed greater than OP1.  */
+
+bool
+wide_int::gts_p (const wide_int &op1) const
+{
+  return op1.lts_p (*this);
+}  
+
+/* Return true if THIS > OP1 using signed comparisons.  */
+
+bool
+wide_int::gts_p (const HOST_WIDE_INT op1) const
+{
+  bool result;
+
+  if (precision <= HOST_BITS_PER_WIDE_INT || len == 1)
+    {
+      /* The values are already logically sign extended.  */
+      result = val[0] > sext_hwi (op1, precision);
+      goto ex;
+    }
+  
+  result = !neg_p ();
+
+ ex:
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vwh ("wide_int:: %d = (%s gts_p 0x"HOST_WIDE_INT_PRINT_HEX")\n", 
+	       result, *this, op1);
+#endif
+
+  return result;
+}
+
+/* Return true if THIS is unsigned greater than OP1.  */
+
+bool
+wide_int::gtu_p (const wide_int &op1) const
+{
+  return op1.ltu_p (*this);
+}  
+
+/* Return true if THIS > OP1 using unsigned comparisons.  */
+
+bool
+wide_int::gtu_p (const unsigned HOST_WIDE_INT op1) const
+{
+  unsigned HOST_WIDE_INT x0;
+  unsigned HOST_WIDE_INT x1;
+  bool result;
+
+  if (precision < HOST_BITS_PER_WIDE_INT || len == 1)
+    {
+      x0 = zext_hwi (val[0], precision);
+      x1 = zext_hwi (op1, precision);
+
+      result = x0 > x1;
+    }
+  else
+    result = true;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vwh ("wide_int:: %d = (%s gtu_p 0x"HOST_WIDE_INT_PRINT_HEX")\n", 
+	       result, *this, op1);
+#endif
+
+  return result;
+}
+
+/* Return true if THIS is less than OP1.  Signedness is indicated by
+   OP.  */
+
+bool
+wide_int::lt_p (HOST_WIDE_INT op1, SignOp op) const
+{
+  if (op == SIGNED)
+    return lts_p (op1);
+  else
+    return ltu_p (op1);
+}  
+
+/* Return true if THIS < OP1 using signed comparisons.  */
+
+bool
+wide_int::lts_p (const HOST_WIDE_INT op1) const
+{
+  bool result;
+
+  if (precision <= HOST_BITS_PER_WIDE_INT || len == 1)
+    {
+      /* The values are already logically sign extended.  */
+      result = val[0] < sext_hwi (op1, precision);
+      goto ex;
+    }
+  
+  result = neg_p ();
+
+ ex:
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vwh ("wide_int:: %d = (%s lts_p 0x"HOST_WIDE_INT_PRINT_HEX")\n", 
+	       result, *this, op1);
+#endif
+
+  return result;
+}
+
+/* Return true if THIS < OP1 using signed comparisons.  */
+
+bool
+wide_int::lts_p (const wide_int &op1) const
+{
+  bool result;
+
+  gcc_assert (precision == op1.precision);
+
+  if (this == &op1)
+    {
+      result = false;
+      goto ex;
+    }
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      /* The values are already logically sign extended.  */
+      result = val[0] < op1.val[0];
+      goto ex;
+    }
+
+  result = lts_p_large (op1);
+
+ ex:
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vww ("wide_int:: %d = (%s lts_p %s)\n", result, *this, op1);
+#endif
+
+  return result;
+}
+
+/* Return true if THIS < OP1 using unsigned comparisons.  */
+
+bool
+wide_int::ltu_p (const unsigned HOST_WIDE_INT op1) const
+{
+  unsigned HOST_WIDE_INT x0;
+  unsigned HOST_WIDE_INT x1;
+  bool result;
+
+  if (precision < HOST_BITS_PER_WIDE_INT || len == 1)
+    {
+      x0 = zext_hwi (val[0], precision);
+      x1 = zext_hwi (op1, precision);
+
+      result = x0 < x1;
+    }
+  else
+    result = false;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vwh ("wide_int:: %d = (%s ltu_p 0x"HOST_WIDE_INT_PRINT_HEX")\n", 
+	       result, *this, op1);
+#endif
+
+  return result;
+}
+
+/* Return true if THIS < OP1 using unsigned comparisons.  */
+
+bool
+wide_int::ltu_p (const wide_int &op1) const
+{
+  unsigned HOST_WIDE_INT x0;
+  unsigned HOST_WIDE_INT x1;
+  bool result;
+
+  gcc_assert (precision == op1.precision);
+
+  if (this == &op1)
+    {
+      result = false;
+      goto ex;
+    }
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      if (precision < HOST_BITS_PER_WIDE_INT)
+	{
+	  x0 = zext_hwi (val[0], precision);
+	  x1 = zext_hwi (op1.val[0], precision);
+	}
+      else
+	{
+	  x0 = val[0];
+	  x1 = op1.val[0];
+	}
+
+      result = x0 < x1;
+      goto ex;
+    }
+
+  result = ltu_p_large (op1);
+
+ex:
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vww ("wide_int:: %d = (%s ltu_p %s)\n", result, *this, op1);
+#endif
+
+  return result;
+}
+
+/* Return -1, 0 or 1 depending on how THIS compares with OP1.
+   Signedness is indicated by OP.  */
+
+int
+wide_int::cmp (const wide_int &op1, SignOp op) const
+{
+  if (op == SIGNED)
+    return cmps (op1);
+  else
+    return cmpu (op1);
+}  
+
+
+/* Returns -1 if THIS < OP1, 0 if THIS == OP1 and 1 if THIS > OP1
+   using signed compares.  */
+
+int
+wide_int::cmps (const wide_int &op1) const
+{
+  int result;
+
+  gcc_assert (precision == op1.precision);
+
+  if (this == &op1)
+    {
+      result = 0;
+      goto ex;
+    }
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      /* The values are already logically sign extended.  */
+      if (val[0] < op1.val[0])
+	{
+	  result = -1;
+	  goto ex;
+	}
+      if (val[0] > op1.val[0])
+	{
+	  result = 1;
+	  goto ex;
+	}
+    }
+
+  result = cmps_large (op1);
+
+ ex:
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vww ("wide_int:: %d = (%s cmps %s)\n", result, *this, op1);
+#endif
+
+  return result;
+}
+
+/* Returns -1 if THIS < OP1, 0 if THIS == OP1 and 1 if THIS > OP1
+   using unsigned compares.  */
+
+int
+wide_int::cmpu (const wide_int &op1) const
+{
+  unsigned HOST_WIDE_INT x0;
+  unsigned HOST_WIDE_INT x1;
+  int result;
+
+  gcc_assert (precision == op1.precision);
+
+  if (this == &op1)
+    {
+      result = 0;
+      goto ex;
+    }
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      if (precision < HOST_BITS_PER_WIDE_INT)
+	{
+	  x0 = zext_hwi (val[0], precision);
+	  x1 = zext_hwi (op1.val[0], precision);
+	}
+      else
+	{
+	  x0 = val[0];
+	  x1 = op1.val[0];
+	}
+
+      if (x0 < x1)
+	result = -1;
+      else if (x0 == x1)
+	result = 0;
+      else
+	result = 1;
+      goto ex;
+    }
+
+  result = cmpu_large (op1);
+
+ ex:
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vww ("wide_int:: %d = (%s cmpu %s)\n", result, *this, op1);
+#endif
+
+  return result;
+}
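+
+/* For example, comparing the 8-bit bit patterns 0x80 and 0x7f (an
+   illustrative sketch): cmps returns -1 since signed -128 < 127,
+   while cmpu returns 1 since unsigned 128 > 127.  */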
+
+
+/* Return the signed or unsigned min of THIS and OP1. */
+
+wide_int
+wide_int::min (const wide_int &op1, SignOp sgn) const
+{
+  if (sgn == SIGNED)
+    return lts_p (op1) ? (*this) : op1;
+  else
+    return ltu_p (op1) ? (*this) : op1;
+}  
+
+/* Return the signed or unsigned max of THIS and OP1. */
+
+wide_int
+wide_int::max (const wide_int &op1, SignOp sgn) const
+{
+  if (sgn == SIGNED)
+    return gts_p (op1) ? (*this) : op1;
+  else
+    return gtu_p (op1) ? (*this) : op1;
+}  
+
+/* Return the signed min of THIS and OP1. */
+
+wide_int
+wide_int::smin (const wide_int &op1) const
+{
+  return lts_p (op1) ? (*this) : op1;
+}  
+
+/* Return the signed max of THIS and OP1. */
+
+wide_int
+wide_int::smax (const wide_int &op1) const
+{
+  return gts_p (op1) ? (*this) : op1;
+}  
+
+/* Return the unsigned min of THIS and OP1. */
+
+wide_int
+wide_int::umin (const wide_int &op1) const
+{
+  return ltu_p (op1) ? (*this) : op1;
+}  
+
+/* Return the unsigned max of THIS and OP1. */
+
+wide_int
+wide_int::umax (const wide_int &op1) const
+{
+  return gtu_p (op1) ? (*this) : op1;
+}  
+
+/* Return true if THIS has the sign bit set to 1 and all other bits are
+   zero.  */
+
+bool
+wide_int::only_sign_bit_p () const
+{
+  return only_sign_bit_p (precision);
+}
+
+/* Return true if THIS fits in a HOST_WIDE_INT with no loss of
+   precision.  */
+
+bool
+wide_int::fits_shwi_p () const
+{
+  return len == 1;
+}
+
+/* Return true if THIS fits in an unsigned HOST_WIDE_INT with no loss
+   of precision.  */
+
+bool
+wide_int::fits_uhwi_p () const
+{
+  return len == 1 
+    || (len == 2 && val[1] == 0);
+}
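+
+/* Because the representation is kept compressed, these tests are
+   cheap: a 128-bit wide_int holding -1 has len == 1, so fits_shwi_p
+   () is true even though its precision is larger than a
+   HOST_WIDE_INT (an illustrative sketch).  */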
+
+/* Return THIS extended to PREC.  The signedness of the extension is
+   specified by Z.  */
+
+wide_int 
+wide_int::ext (unsigned int prec, SignOp z) const
+{
+  if (z == UNSIGNED)
+    return zext (prec);
+  else
+    return sext (prec);
+}
+
+/* Return THIS forced to the bitsize and precision of MODE.  If this
+   requires extension, the sign of the extension is given by SGN.  */
+
+wide_int 
+wide_int::force_to_size (enum machine_mode mode, SignOp sgn) const
+{
+  unsigned int bitsize = GET_MODE_BITSIZE (mode);
+  unsigned int prec = GET_MODE_PRECISION (mode);
+
+  return force_to_size (bitsize, prec, sgn);
+}
+
+/* Return THIS forced to the bitsize, precision and signedness of
+   TYPE.  If this requires extension, the sign is taken from TYPE.  */
+
+wide_int 
+wide_int::force_to_size (const_tree type) const
+{
+  unsigned int bitsize = GET_MODE_BITSIZE (TYPE_MODE (type));
+  unsigned int prec = TYPE_PRECISION (type);
+  SignOp sgn = TYPE_UNSIGNED (type) ? UNSIGNED : SIGNED;
+
+  return force_to_size (bitsize, prec, sgn);
+}
+
+/* Return THIS forced to the bitsize and precision of TYPE.  If this
+   requires extension, the sign of the extension is given by SGN.  */
+
+wide_int 
+wide_int::force_to_size (const_tree type, SignOp sgn) const
+{
+  unsigned int bitsize = GET_MODE_BITSIZE (TYPE_MODE (type));
+  unsigned int prec = TYPE_PRECISION (type);
+
+  return force_to_size (bitsize, prec, sgn);
+}
+
+/* Return THIS forced to the bitsize and precision of MODE.  If this
+   requires extension, it is signed.  */
+
+wide_int 
+wide_int::sforce_to_size (enum machine_mode mode) const
+{
+  unsigned int bitsize = GET_MODE_BITSIZE (mode);
+  unsigned int prec = GET_MODE_PRECISION (mode);
+
+  return force_to_size (bitsize, prec, SIGNED);
+}
+
+/* Return THIS forced to the bitsize and precision of TYPE.  If this
+   requires extension, it is signed.  */
+
+wide_int 
+wide_int::sforce_to_size (const_tree type) const
+{
+  unsigned int bitsize = GET_MODE_BITSIZE (TYPE_MODE (type));
+  unsigned int prec = TYPE_PRECISION (type);
+
+  return force_to_size (bitsize, prec, SIGNED);
+}
+
+/* Return THIS forced to the bitsize and precision of MODE.  If this
+   requires extension, it is unsigned.  */
+
+wide_int 
+wide_int::zforce_to_size (enum machine_mode mode) const
+{
+  unsigned int bitsize = GET_MODE_BITSIZE (mode);
+  unsigned int prec = GET_MODE_PRECISION (mode);
+
+  return force_to_size (bitsize, prec, UNSIGNED);
+}
+
+/* Return THIS forced to the bitsize and precision of TYPE.  If this
+   requires extension, it is unsigned.  */
+
+wide_int 
+wide_int::zforce_to_size (const_tree type) const
+{
+  unsigned int bitsize = GET_MODE_BITSIZE (TYPE_MODE (type));
+  unsigned int prec = TYPE_PRECISION (type);
+
+  return force_to_size (bitsize, prec, UNSIGNED);
+}
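+
+/* For example (an illustrative sketch of the force_to_size family),
+   forcing the 32-bit value 0x1ff to an 8-bit size truncates it to the
+   8-bit value -1, while forcing an 8-bit -1 to a 32-bit size either
+   sign extends to -1 (SIGNED) or zero extends to 0xff (UNSIGNED).  */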
+
+/*
+ * Logicals.
+ */
+
+/* Return THIS & OP1.  */
+
+wide_int
+wide_int::operator & (const wide_int &op1) const
+{
+  wide_int result;
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      result.len = 1;
+      result.bitsize = bitsize;
+      result.precision = precision;
+      result.val[0] = val[0] & op1.val[0];
+    }
+  else
+    result = and_large (op1);
+  
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_www ("wide_int:: %s = (%s & %s)\n", result, *this, op1);
+#endif
+  return result;
+}
+
+
+/* Return THIS & ~OP1.  */
+
+wide_int
+wide_int::and_not (const wide_int &op1) const
+{
+  wide_int result;
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      result.len = 1;
+      result.bitsize = bitsize;
+      result.precision = precision;
+      result.val[0] = val[0] & ~op1.val[0];
+    }
+  else
+    result = and_not_large (op1);
+  
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_www ("wide_int:: %s = (%s &~ %s)\n", result, *this, op1);
+#endif
+  return result;
+}
+
+
+/* Return the logical negation (bitwise complement) of THIS.  */
+
+wide_int
+wide_int::operator ~ () const
+{
+  wide_int result;
+  int l0 = len - 1;
+
+  result.len = len;
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  while (l0 >= 0)
+    {
+      result.val[l0] = ~val[l0];
+      l0--;
+    }
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_ww ("wide_int:: %s = (~ %s)\n", result, *this);
+#endif
+  return result;
+}
+
+/* Return THIS | OP1.  */
+
+wide_int
+wide_int::operator | (const wide_int &op1) const
+{
+  wide_int result;
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      result.len = 1;
+      result.bitsize = bitsize;
+      result.precision = precision;
+      result.val[0] = val[0] | op1.val[0];
+    }
+  else
+    result = or_large (op1);
+  
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_www ("wide_int:: %s = (%s | %s)\n", result, *this, op1);
+#endif
+  return result;
+}
+
+
+/* Return THIS | ~OP1.  */
+
+wide_int
+wide_int::or_not (const wide_int &op1) const
+{
+  wide_int result;
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      result.len = 1;
+      result.bitsize = bitsize;
+      result.precision = precision;
+      result.val[0] = val[0] | ~op1.val[0];
+    }
+  else
+    result = or_not_large (op1);
+  
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_www ("wide_int:: %s = (%s |~ %s)\n", result, *this, op1);
+#endif
+  return result;
+}
+
+
+/* Return THIS ^ OP1.  */
+
+wide_int
+wide_int::operator ^ (const wide_int &op1) const
+{
+  wide_int result;
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      result.len = 1;
+      result.bitsize = bitsize;
+      result.precision = precision;
+      result.val[0] = val[0] ^ op1.val[0];
+    }
+  else
+    result = xor_large (op1);
+  
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_www ("wide_int:: %s = (%s ^ %s)\n", result, *this, op1);
+#endif
+  return result;
+}
+
+/*
+ * Integer arithmetic
+ */
+
+/* Return THIS + OP1.  */
+
+wide_int
+wide_int::operator + (const wide_int &op1) const
+{
+  wide_int result;
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      result.len = 1;
+      result.bitsize = bitsize;
+      result.precision = precision;
+      result.val[0] = val[0] + op1.val[0];
+      if (precision < HOST_BITS_PER_WIDE_INT)
+	result.val[0] = sext_hwi (result.val[0], precision);
+    }
+  else
+    result = add_large (op1);
+  
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_www ("wide_int:: %s = (%s + %s)\n", result, *this, op1);
+#endif
+  return result;
+}
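The sext_hwi call keeps the single-HWI representation canonical, so
addition wraps at the precision.  A sketch (sizes illustrative):

  wide_int a = wide_int::from_shwi (127, 8, 8);
  wide_int b = wide_int::from_shwi (1, 8, 8);
  /* 127 + 1 wraps in 8 bits; val[0] is sign extended to -128.  */
  wide_int s = a + b;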
+
+/* Multiply THIS and OP1.  The result is the same precision as the
+   operands, so there is no reason for signed or unsigned
+   versions.  */
+
+wide_int
+wide_int::operator * (const wide_int &op1) const
+{
+  bool overflow = false;
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      wide_int result;
+      result.len = 1;
+      result.bitsize = bitsize;
+      result.precision = precision;
+      result.val[0] = val[0] * op1.val[0];
+      if (precision < HOST_BITS_PER_WIDE_INT)
+	result.val[0] = sext_hwi (result.val[0], precision);
+#ifdef DEBUG_WIDE_INT
+      if (dump_file)
+	debug_www ("wide_int:: %s = (%s * %s)\n", result, *this, op1);
+#endif
+      return result;
+    }
+  else
+    return mul_internal (false, false, this, &op1, UNSIGNED, &overflow, false);
+}
+
+/* Negate THIS.  */
+
+wide_int
+wide_int::neg () const
+{
+  wide_int z = wide_int::from_shwi (0, bitsize, precision);
+  return z - *this;
+}
+
+/* Negate THIS.  Set *OVERFLOW if the value cannot be negated, i.e.
+   if only the sign bit is set.  */
+
+wide_int
+wide_int::neg (bool *overflow) const
+{
+  wide_int z = wide_int::from_shwi (0, bitsize, precision);
+  if (only_sign_bit_p ())
+    *overflow = true;
+
+  return z - *this;
+}
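The only value that cannot be negated is the one with just the sign
bit set, since it is its own negation modulo the precision.  A sketch
(sizes illustrative):

  bool ovf = false;
  /* -128 in 8 bits is 0x80: only the sign bit is set.  */
  wide_int m = wide_int::from_shwi (-128, 8, 8);
  /* ovf becomes true; the result wraps back to -128.  */
  wide_int r = m.neg (&ovf);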
+
+/* Return THIS - OP1.  */
+
+wide_int
+wide_int::operator - (const wide_int &op1) const
+{
+  wide_int result;
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      result.len = 1;
+      result.bitsize = bitsize;
+      result.precision = precision;
+      result.val[0] = val[0] - op1.val[0];
+      if (precision < HOST_BITS_PER_WIDE_INT)
+	result.val[0] = sext_hwi (result.val[0], precision);
+    }
+  else
+    result = sub_large (op1);
+  
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_www ("wide_int:: %s = (%s - %s)\n", result, *this, op1);
+#endif
+  return result;
+}
+
+
+/* Signed multiply THIS and X.  The result is the same precision as
+   the operands.  *OVERFLOW is set if the result overflows.  */
+
+wide_int
+wide_int::smul (const wide_int &x, bool *overflow) const
+{
+  return mul (x, SIGNED, overflow);
+}
+
+/* Unsigned multiply THIS and X.  The result is the same precision
+   as the operands.  *OVERFLOW is set if the result overflows.  */
+
+wide_int
+wide_int::umul (const wide_int &x, bool *overflow) const
+{
+  return mul (x, UNSIGNED, overflow);
+}
+
+/* Signed multiply THIS and X.  The result is twice the precision of
+   the operands.  */
+
+wide_int
+wide_int::smul_full (const wide_int &x) const
+{
+  return mul_full (x, SIGNED);
+}
+
+/* Unsigned multiply THIS and X.  The result is twice the precision
+   of the operands.  */
+
+wide_int
+wide_int::umul_full (const wide_int &x) const
+{
+  return mul_full (x, UNSIGNED);
+}
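The signed and unsigned multiply variants differ only in how overflow
is detected; the same product can overflow one and not the other.  A
sketch (sizes illustrative):

  bool ovf;
  wide_int a = wide_int::from_shwi (100, 8, 8);
  wide_int b = wide_int::from_shwi (2, 8, 8);
  a.smul (b, &ovf);   /* 200 > 127, so ovf is set.  */
  a.umul (b, &ovf);   /* 200 < 256, so ovf is clear.  */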
+
+/* Signed divide with truncation of result.  */
+
+wide_int
+wide_int::sdiv_trunc (const wide_int &divisor) const
+{
+  return div_trunc (divisor, SIGNED);
+}
+
+/* Unsigned divide with truncation of result.  */
+
+wide_int
+wide_int::udiv_trunc (const wide_int &divisor) const
+{
+  return div_trunc (divisor, UNSIGNED);
+}
+
+/* Unsigned divide with floor truncation of result.  */
+
+wide_int
+wide_int::udiv_floor (const wide_int &divisor) const
+{
+  bool overflow;
+
+  return div_floor (divisor, UNSIGNED, &overflow);
+}
+
+/* Signed divide with floor truncation of result.  */
+
+wide_int
+wide_int::sdiv_floor (const wide_int &divisor) const
+{
+  bool overflow;
+
+  return div_floor (divisor, SIGNED, &overflow);
+}
+
+/* Signed divide/mod with truncation of result.  */
+
+wide_int
+wide_int::sdivmod_trunc (const wide_int &divisor, wide_int *mod) const
+{
+  return divmod_trunc (divisor, mod, SIGNED);
+}
+
+/* Unsigned divide/mod with truncation of result.  */
+
+wide_int
+wide_int::udivmod_trunc (const wide_int &divisor, wide_int *mod) const
+{
+  return divmod_trunc (divisor, mod, UNSIGNED);
+}
+
+/* Signed divide/mod with floor truncation of result.  */
+
+wide_int
+wide_int::sdivmod_floor (const wide_int &divisor, wide_int *mod) const
+{
+  return divmod_floor (divisor, mod, SIGNED);
+}
+
+/* Signed mod with truncation of result.  */
+
+wide_int
+wide_int::smod_trunc (const wide_int &divisor) const
+{
+  return mod_trunc (divisor, SIGNED);
+}
+
+/* Unsigned mod with truncation of result.  */
+
+wide_int
+wide_int::umod_trunc (const wide_int &divisor) const
+{
+  return mod_trunc (divisor, UNSIGNED);
+}
+
+/* Unsigned mod with floor truncation of result.  */
+
+wide_int
+wide_int::umod_floor (const wide_int &divisor) const
+{
+  bool overflow;
+
+  return mod_floor (divisor, UNSIGNED, &overflow);
+}
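The trunc and floor variants only differ for negative signed results:
truncation rounds toward zero, floor toward negative infinity.  A
sketch (sizes illustrative):

  wide_int a = wide_int::from_shwi (-7, 32, 32);
  wide_int b = wide_int::from_shwi (2, 32, 32);
  a.sdiv_trunc (b);   /* -3, rounded toward zero.  */
  a.sdiv_floor (b);   /* -4, rounded toward -infinity.  */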
+
+/* If SHIFT_COUNT_TRUNCATED is defined, truncate CNT.
+
+   At first look, the shift truncation code does not look right.
+   Shifts (and rotates) are done according to the precision of the
+   mode, but the shift count is truncated according to the bitsize
+   of the mode.  This is how real hardware works.
+
+   On an ideal machine, like Knuth's MIX machine, a shift count is a
+   word long and all of the bits of that word are examined to compute
+   the shift amount.  But on real hardware, especially on machines
+   with fast (single cycle) shifts, that takes too long.  On such
+   machines, the time taken by a shift dictates the cycle time of the
+   machine, so corners are cut to keep this fast.  Comparing an
+   entire 64-bit word would take something like 6 gate delays before
+   the shifting could even start.
+
+   So real hardware only looks at a small part of the shift amount.
+   On IBM machines, this tends to be 1 bit more than is necessary to
+   encode the shift amount.  The rest of the world looks at only the
+   minimum number of bits.  This means that only 3 gate delays are
+   necessary to set up the shifter.
+
+   On the other hand, right shifts and rotates must be performed
+   according to the precision or the operation does not make any
+   sense.  */
+
+inline int
+wide_int::trunc_shift (unsigned int bitsize, int cnt)
+{
+#ifdef SHIFT_COUNT_TRUNCATED
+  cnt = cnt & (bitsize - 1);
+#endif
+  return cnt;
+}
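Concretely, when SHIFT_COUNT_TRUNCATED is defined and the bitsize is
32, the count is masked with 31, so (values illustrative, as called
from within wide-int.c):

  trunc_shift (32, 33);   /* 33 & 31 == 1.  */
  trunc_shift (32, 32);   /* 32 & 31 == 0.  */
  trunc_shift (32, 31);   /* 31, unchanged.  */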
+
+/* This function is called in two contexts.  If Z == TRUNC, this
+   function provides a count that matches the semantics of the target
+   machine depending on the value of SHIFT_COUNT_TRUNCATED.  Note
+   that if SHIFT_COUNT_TRUNCATED is not defined, this function may
+   produce -1 if the shift amount is greater than or equal to the
+   bitsize of the mode.  -1 is a surrogate for a very large amount.
+
+   If Z == NONE, this function always truncates the shift value to
+   the bitsize because this shifting operation is internal to GCC.  */
+
+inline int
+wide_int::trunc_shift (unsigned int bitsize, const wide_int &cnt, ShiftOp z)
+{
+  if (z == TRUNC)
+    {
+#ifdef SHIFT_COUNT_TRUNCATED
+      return cnt.val[0] & (bitsize - 1);
+#else
+      if (cnt.ltu (bitsize))
+	return cnt.val[0] & (bitsize - 1);
+      else 
+	return -1;
+#endif
+    }
+  else
+    return cnt.val[0] & (bitsize - 1);
+}
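The two contexts only diverge when SHIFT_COUNT_TRUNCATED is not
defined and the count is out of range.  A sketch (values illustrative,
as called from within wide-int.c):

  wide_int c = wide_int::from_shwi (40, 32, 32);
  /* TRUNC without SHIFT_COUNT_TRUNCATED: 40 >= 32, so -1.  */
  trunc_shift (32, c, wide_int::TRUNC);
  /* NONE always masks: 40 & 31 == 8.  */
  trunc_shift (32, c, wide_int::NONE);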
+
+/* Left shift by an integer amount Y.  See the definition of TRUNC
+   for how to set Z.  */
+
+wide_int
+wide_int::lshift (unsigned int y, ShiftOp z) const
+{
+  return lshift (y, z, bitsize, precision);
+}
+
+/* Left shift by a wide_int shift amount Y.  See the definition of
+   TRUNC for how to set Z.  */
+
+wide_int
+wide_int::lshift (const wide_int &y, ShiftOp z) const
+{
+  if (z == TRUNC)
+    {
+      HOST_WIDE_INT shift = trunc_shift (bitsize, y, TRUNC);
+      if (shift == -1)
+	return wide_int::zero (bitsize, precision);
+      return lshift (shift, NONE, bitsize, precision);
+    }
+  else
+    return lshift (trunc_shift (bitsize, y, NONE), NONE, bitsize, precision);
+}
+
+/* Left shift THIS by CNT.  See the definition of TRUNC for how to
+   set OP.  Since this is used internally, the bitsize (BS) and
+   result precision (RES_PREC) can be specified independently.  This
+   is useful when inserting a small value into a larger one.  */
+
+wide_int
+wide_int::lshift (unsigned int cnt, ShiftOp op, 
+		  unsigned int bs, unsigned int res_prec) const
+{
+  wide_int result;
+
+  if (op == TRUNC)
+    cnt = trunc_shift (bs, cnt);
+
+  /* Handle the simple case quickly.   */
+  if (res_prec <= HOST_BITS_PER_WIDE_INT)
+    {
+      result.bitsize = bs;
+      result.precision = res_prec;
+      result.len = 1;
+      result.val[0] = val[0] << cnt;
+      /* Keep the representation canonical, as the other operations do.  */
+      if (res_prec < HOST_BITS_PER_WIDE_INT)
+	result.val[0] = sext_hwi (result.val[0], res_prec);
+    }
+  else
+    result = lshift_large (cnt, bs, res_prec);
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int:: %s = (%s << %d)\n", result, *this, cnt);
+#endif
+
+  return result;
+}
+
+/* Rotate THIS left by Y within its precision.  */
+
+wide_int
+wide_int::lrotate (const wide_int &y) const
+{
+  return lrotate (y.val[0]);
+}
+
+
+/* Unsigned right shift by a wide_int amount Y.  See the definition
+   of TRUNC for how to set Z.  */
+
+wide_int
+wide_int::rshiftu (const wide_int &y, ShiftOp z) const
+{
+  if (z == TRUNC)
+    {
+      HOST_WIDE_INT shift = trunc_shift (bitsize, y, TRUNC);
+      if (shift == -1)
+	return wide_int::zero (bitsize, precision);
+      return rshiftu (shift, NONE);
+    }
+  else
+    return rshiftu (trunc_shift (bitsize, y, NONE), NONE);
+}
+
+/* Right shift THIS by Y.  SGN indicates whether the shift is signed
+   or unsigned.  Z indicates the truncation option.  */
+
+wide_int
+wide_int::rshift (const wide_int &y, SignOp sgn, ShiftOp z) const
+{
+  if (sgn == UNSIGNED)
+    return rshiftu (y, z);
+  else
+    return rshifts (y, z);
+}
+
+/* Unsigned right shift THIS by CNT.  See the definition of TRUNC
+   for how to set TRUNC_OP.  */
+
+wide_int
+wide_int::rshiftu (unsigned int cnt, ShiftOp trunc_op) const
+{
+  wide_int result;
+
+  if (trunc_op == TRUNC)
+    cnt = trunc_shift (bitsize, cnt);
+
+  if (cnt == 0)
+    result = copy ();
+  
+  else if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      /* Handle the simple case quickly.   */
+      unsigned HOST_WIDE_INT x = val[0];
+
+      result.bitsize = bitsize;
+      result.precision = precision;
+      result.len = 1;
+
+      if (precision < HOST_BITS_PER_WIDE_INT)
+	x = zext_hwi (x, precision);
+
+      result.val[0] = x >> cnt;
+    }
+  else 
+    result = rshiftu_large (cnt);
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int:: %s = (%s >>U %d)\n", result, *this, cnt);
+#endif
+  return result;
+}
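Because the fast path zero extends before shifting, the sign bits of
the canonical representation do not leak into the result.  A sketch
(sizes illustrative):

  /* -1 in 8 bits is 0xff.  */
  wide_int v = wide_int::from_shwi (-1, 8, 8);
  /* Zero extend, then shift: 0xff >> 4 == 0x0f, i.e. 15.  */
  v.rshiftu (4, wide_int::NONE);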
+
+/* Signed right shift by a wide_int amount Y.  See the definition of
+   TRUNC for how to set Z.  */
+
+wide_int
+wide_int::rshifts (const wide_int &y, ShiftOp z) const
+{
+  if (z == TRUNC)
+    {
+      HOST_WIDE_INT shift = trunc_shift (bitsize, y, TRUNC);
+      if (shift == -1)
+	{
+	  /* The shift amount was greater than or equal to the bitsize
+	     and this machine does not truncate the value, so the
+	     result is a smeared sign bit.  */
+	  if (neg_p ())
+	    return wide_int::minus_one (bitsize, precision);
+	  else
+	    return wide_int::zero (bitsize, precision);
+	}
+      return rshifts (shift, NONE);
+    }
+  else
+    return rshifts (trunc_shift (bitsize, y, NONE), NONE);
+}
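So with TRUNC on a target that does not define SHIFT_COUNT_TRUNCATED,
an out-of-range count smears the sign bit instead of shifting.  A
sketch (values illustrative):

  wide_int v = wide_int::from_shwi (-16, 32, 32);
  wide_int c = wide_int::from_shwi (40, 32, 32);
  /* Without SHIFT_COUNT_TRUNCATED: all ones (-1).
     With it: 40 & 31 == 8, so an ordinary shift by 8.  */
  v.rshifts (c, wide_int::TRUNC);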
+
+/* Signed right shift THIS by CNT.  See the definition of TRUNC for
+   how to set TRUNC_OP.  */
+
+wide_int
+wide_int::rshifts (unsigned int cnt, ShiftOp trunc_op) const
+{
+  wide_int result;
+
+  if (trunc_op == TRUNC)
+    cnt = trunc_shift (bitsize, cnt);
+
+  if (cnt == 0)
+    result = copy ();
+  else if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      /* Handle the simple case quickly.   */
+      HOST_WIDE_INT x = val[0];
+
+      result.bitsize = bitsize;
+      result.precision = precision;
+      result.len = 1;
+      result.val[0] = x >> cnt;
+    }
+  else
+    result = rshifts_large (cnt);
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int:: %s = (%s >>S %d)\n", result, *this, cnt);
+#endif
+  return result;
+}
+
+/* Rotate THIS right by Y within its precision.  */
+
+wide_int
+wide_int::rrotate (const wide_int &y) const
+{
+  return rrotate (y.val[0]);
+}
+
+/* Tree related routines.  */
+
+extern tree wide_int_to_tree (tree type, const wide_int &cst);
+extern tree wide_int_to_infinite_tree (tree type, const wide_int &cst, 
+				       unsigned int prec);
+extern tree force_fit_type_wide (tree, const wide_int &, int, bool);
+
+/* Conversion to and from GMP integer representations.  */
+
+void mpz_set_wide_int (mpz_t, wide_int, bool);
+wide_int mpz_get_wide_int (const_tree, mpz_t, bool);
+#endif /* GENERATOR_FILE */
+
+#endif /* WIDE_INT_H */