diff mbox

patch to fix constant math - 4th patch - the wide-int class.

Message ID 50743E2B.6000104@naturalbridge.com
State New
Headers show

Commit Message

Kenneth Zadeck Oct. 9, 2012, 3:09 p.m. UTC
This patch implements the wide-int class.  This is a more general
version of the double-int class and is meant to be the eventual
replacement for that class.  The use of this class removes all host
dependencies from the target compiler's integer math.

I have made all of the changes I agreed to in the earlier emails.  In
particular, this class internally maintains a bitsize and precision
but not a mode.  The class is now neutral about modes and tree types.
The functions that take modes or tree types are just convenience
functions that translate the parameters into bitsize and precision,
and wherever there is a call that takes a mode, there is a
corresponding call that takes a tree type.

All of the little changes that richi suggested have also been made.

The buffer size is now twice the size needed by the largest integer
mode.  This gives tree-vrp enough room to do full multiplies on any
type that the target supports.

Tested on x86-64.

This patch depends on the first three patches.  I am still waiting
for final approval of the hwint.h patch.

Ok to commit?

kenny
2012-10-09  Kenneth Zadeck <zadeck@naturalbridge.com>

	* wide-int.c: New file containing implementation of wide_int class.
	* wide-int.h: New file containing public spec for wide_int class.

Comments

Richard Biener Oct. 23, 2012, 2:12 p.m. UTC | #1
On Tue, Oct 9, 2012 at 5:09 PM, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
> This patch implements the wide-int class.  This is a more general version
> of the double-int class and is meant to be the eventual replacement for
> that class.  The use of this class removes all host dependencies from
> the target compiler's integer math.
>
> I have made all of the changes I agreed to in the earlier emails.  In
> particular, this class internally maintains a bitsize and precision but
> not a mode.  The class is now neutral about modes and tree types.  The
> functions that take modes or tree types are just convenience functions
> that translate the parameters into bitsize and precision, and wherever
> there is a call that takes a mode, there is a corresponding call that
> takes a tree type.
>
> All of the little changes that richi suggested have also been made.
>
> The buffer size is now twice the size needed by the largest integer mode.
> This gives tree-vrp enough room to do full multiplies on any type that
> the target supports.
>
> Tested on x86-64.
>
> This patch depends on the first three patches.  I am still waiting for
> final approval of the hwint.h patch.
>
> Ok to commit?

diff --git a/gcc/wide-int.h b/gcc/wide-int.h
new file mode 100644
index 0000000..efd2c01
--- /dev/null
+++ b/gcc/wide-int.h
...
+#ifndef GENERATOR_FILE

The whole file is guarded with that ... why?  That is bound to be fragile once
use of wide-int spreads?  How do generator programs end up including
this file if they don't need it at all?

+#include "tree.h"
+#include "hwint.h"
+#include "options.h"
+#include "tm.h"
+#include "insn-modes.h"
+#include "machmode.h"
+#include "double-int.h"
+#include <gmp.h>
+#include "insn-modes.h"
+

That's a lot of tree and rtl dependencies.  double-int.h avoids these by
placing conversion routines in different headers or by only resorting to
types in coretypes.h.  Please try to reduce the above to a minimum.

+  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT];

are we sure this rounds properly?  Consider a port with a maximum
mode size of 4 bytes on a 64-bit host.
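
To make the concern concrete (a hedged worked example; the 24-bit
figure is invented for illustration):

  /* 64-bit host: HOST_BITS_PER_WIDE_INT == 64.  A port whose largest
     integer mode were 24 bits would get
         2 * 24 / 64 == 0
     elements, a zero-length array.  Rounding up avoids that:  */
  HOST_WIDE_INT val[(2 * MAX_BITSIZE_MODE_ANY_INT
                     + HOST_BITS_PER_WIDE_INT - 1)
                    / HOST_BITS_PER_WIDE_INT];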

I still would like to have the ability to provide specializations of wide_int
for "small" sizes, thus ideally wide_int would be a template templated
on the number of HWIs in val.  Interface-wise wide_int<2> should be
identical to double_int, thus we should be able to do

typedef wide_int<2> double_int;

in double-int.h and replace its implementation with a specialization of
wide_int.  Due to a number of divergences (double_int is not a subset
of wide_int) that doesn't seem easily possible (one reason is the
ShiftOp and related enums you use).  Of course wide_int is not a
template either.  For the hypothetical embedded target above we'd
end up using wide_int<1>, an even more trivial specialization.

I realize again this wide-int is not what your wide-int is (because you
add a precision member).  Still factoring out the "common"s of
wide-int and double-int into a wide_int_raw <> template should be
possible.
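
A minimal sketch of that factoring, with all names hypothetical, just
to make the shape concrete:

  /* Size-generic storage and arithmetic, no precision semantics.  */
  template <unsigned int N>
  class wide_int_raw
  {
  protected:
    HOST_WIDE_INT val[N];
    unsigned short len;  /* number of significant HWIs in val  */
    /* add, mul, shifts, ... written once for any N  */
  };

  /* double_int would become the two-HWI instantiation, while the
     precision-carrying wide_int would layer its extra member on top
     (MAX_HWIS is a stand-in for the buffer-size computation).  */
  typedef wide_int_raw<2> double_int_storage;
  class wide_int : public wide_int_raw<MAX_HWIS>
  {
    unsigned int precision;
  };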

+class wide_int {
+  /* Internal representation.  */
+
+  /* VAL is set to a size that is capable of computing a full
+     multiplication on the largest mode that is represented on the
+     target.  The full multiplication is use by tree-vrp.  If
+     operations are added that require larger buffers, then VAL needs
+     to be changed.  */
+  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT];
+  unsigned short len;
+  unsigned int bitsize;
+  unsigned int precision;

The len, bitsize and precision members need documentation.  At least
one sounds redundant.

+ public:
+  enum ShiftOp {
+    NONE,
NONE is never a descriptive name ... I suppose this is for arithmetic vs.
logical shifts?
+    /* There are two uses for the wide-int shifting functions.  The
+       first use is as an emulation of the target hardware.  The
+       second use is as service routines for other optimizations.  The
+       first case needs to be identified by passing TRUNC as the value
+       of ShiftOp so that shift amount is properly handled according to the
+       SHIFT_COUNT_TRUNCATED flag.  For the second case, the shift
+       amount is always truncated by the bytesize of the mode of
+       THIS.  */
+    TRUNC

ah, no, it's for SHIFT_COUNT_TRUNCATED.  "mode of THIS"?  Now
it's precision, I suppose.  That said, handling SHIFT_COUNT_TRUNCATED
in wide-int sounds over-engineered; the caller should be responsible
for applying SHIFT_COUNT_TRUNCATED when needed.
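
A sketch of what caller-side handling could look like, assuming a
plain shift method with no ShiftOp parameter (lshift here is just an
illustrative name):

  /* Emulate the target's shifter at the call site, not in wide-int.  */
  unsigned int count = shift_amount;
  if (SHIFT_COUNT_TRUNCATED)
    count %= GET_MODE_BITSIZE (mode);
  wide_int res = op0.lshift (count);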

+  enum SignOp {
+    /* Many of the math functions produce different results depending
+       on whether they are SIGNED or UNSIGNED.  In general, there are
+       two different functions, whose names are prefixed with an 'S'
+       or a 'U'.  However, for some math functions there is also a
+       routine that does not have the prefix and takes a SignOp
+       parameter of SIGNED or UNSIGNED.  */
+    SIGNED,
+    UNSIGNED
+  };

double-int and _all_ of the rest of the middle-end use an 'int uns'
parameter for this.  _Please_ follow that.  Otherwise, if you
communicate between those interfaces, you have to write
uns ? UNSIGNED : SIGNED and signop == UNSIGNED ? 1 : 0 all over the
place.

+  static wide_int from_shwi (HOST_WIDE_INT op0, unsigned int bitsize,
+                            unsigned int precision);
+  static wide_int from_shwi (HOST_WIDE_INT op0, unsigned int bitsize,
+                            unsigned int precision, bool *overflow);

I suppose bool *overflow = NULL would do as well?  What's the
distinction between bitsize and precision (I suppose, see the above
question)?  I suppose precision <= bitsize and bits above precision
are sign/zero extended (and the rest of the val[] array contains undefined
content?)?  But we also have 'len', which then matches bitsize (well
it may be larger).  So IMHO either bitsize or len is redundant.  At least
the tree level nowhere considers partial integer modes special this way
(only the precision is ever taken into account, but we always sign-/zero-extend
to the whole double-int - thus 'len' in wide-int terms).

+  inline static wide_int from_hwi (HOST_WIDE_INT op0, const_tree type);
+  inline static wide_int from_hwi (HOST_WIDE_INT op0, const_tree type,
+                                  bool *overflow);

Are you needing this overload or are you adding it for "completeness"?
Because this interface is wrong(TM), and whoever calls it has at least
cleanup opportunities ... from_tree or from_rtx makes sense.

Also in which cases do you need "overflow"?  a HWI always fits in
a wide-int!  All trees and rtxen do, too.  You seem to merge two
operations here, conversion to wide-int and truncation / extension.
That doesn't look like a clean interface to me.
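
In other words, a sketch of the decomposition I have in mind, built
from the entry points declared in this patch (the composition itself
is only a suggestion):

  /* Conversion alone: a HWI always fits, so no overflow output.  */
  wide_int w = wide_int::from_shwi (x, HOST_BITS_PER_WIDE_INT,
                                    HOST_BITS_PER_WIDE_INT);
  /* Truncation/extension as a separate, visible step.  */
  wide_int t = w.sext (prec);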

+  static wide_int from_double_int (enum machine_mode, double_int);

the choice of passing a mode seems arbitrary (the natural interface
would be nothing - precision is 2 * HWI).  Passing it as first parameter
is even more strange to me ;)

+  static wide_int from_tree (const_tree);
+  static wide_int from_rtx (const_rtx, enum machine_mode);

+  HOST_WIDE_INT to_shwi (unsigned int prec) const;

See above - merges two basic operations.  You should write

 w.sext (prec).to_shwi ()

instead (I _suppose_ it should sign-extend, should it? ;)).  Btw, why
don't we need to always specify bitsize together with precision in all
the places?  (not that I am arguing for it, I'm arguing for the
removal of bitsize)

+  static wide_int max_value (unsigned int bitsize, unsigned int prec,
SignOp sgn);

now that I am seeing this - is there any restriction on how the precision
of a partial integer mode may differ from its bitsize?  Can we have
POImode with 1 bit precision?  I suppose the solution for all this is
that when converting a wide-int to a RTX with a mode then we need to
zero-/sign-extend to the modes bitsize (and wide-int only cares about
precision).  Eventually a set_len can adjust the amount of BITS_PER_UNITs
we fill with meaningful values if needed.  Otherwise len == precision
/ BITS_PER_UNIT (rounded to HWI for obvious practical reasons).
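
In code, the invariant would then be simply (my sketch, not from the
patch):

  /* Significant HWIs follow directly from the precision, rounded up.  */
  len = (precision + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT;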

+  inline static wide_int minus_one (unsigned int bitsize, unsigned int prec);
+  inline static wide_int minus_one (const_tree type);
+  inline static wide_int minus_one (enum machine_mode mode);
+  inline static wide_int zero (unsigned int bitsize, unsigned int prec);
+  inline static wide_int zero (const_tree type);
+  inline static wide_int zero (enum machine_mode mode);
+  inline static wide_int one (unsigned int bitsize, unsigned int prec);
+  inline static wide_int one (const_tree type);
+  inline static wide_int one (enum machine_mode mode);
+  inline static wide_int two (unsigned int bitsize, unsigned int prec);
+  inline static wide_int two (const_tree type);
+  inline static wide_int two (enum machine_mode mode);
+  inline static wide_int ten (unsigned int bitsize, unsigned int prec);
+  inline static wide_int ten (const_tree type);
+  inline static wide_int ten (enum machine_mode mode);

wheeee .... ;)

What's wrong with from_uhwi (10, ...)?  double-int has the above for
compatibility reasons only.  And why should I care about type/mode/size
for something as simple as '1'?

+  inline unsigned short get_len () const;
+  inline unsigned int get_bitsize () const;
+  inline unsigned int get_precision () const;
+  inline unsigned int get_full_len () const;

not sure which air you are pulling full_len from ;)

+  wide_int force_to_size (unsigned int bitsize,
+                         unsigned int precision) const;

or rather 'trunc'?  Seems to be truncation and set_len combined?

I wonder if for the various ways to specify precision/len there is a nice C++
way of moving this detail out of wide-int.  I can think only of one:

struct WIntSpec {
  WIntSpec (unsigned int len, unsigned int precision);
  WIntSpec (const_tree);
  WIntSpec (enum machine_mode);
  unsigned int len;
  unsigned int precision;
};

and then (sorry to pick one of the less useful functions):

  inline static wide_int zero (WIntSpec)

which you should be able to call like

  wide_int::zero (SImode)
  wide_int::zero (integer_type_node)

and (ugly)

  wide_int::zero (WIntSpec (32, 32))

with C++0x wide_int::zero ({32, 32}) should be possible?  Or we keep
the precision overload.  At least providing the WIntSpec abstraction
allows custom ways of specifying required bits to not pollute wide-int
itself too much.  Lawrence?

+  /* Printing functions.  */
+
+  void print_dec (char *buf, SignOp sgn) const;
+  void print_dec (FILE *file, SignOp sgn) const;
+  void print_decs (char *buf) const;
+  void print_decs (FILE *file) const;
+  void print_decu (char *buf) const;
+  void print_decu (FILE *file) const;
+  void print_hex (char *buf) const;
+  void print_hex (FILE *file) const;

consider moving them to standalone functions, out of wide-int.h

+  inline bool minus_one_p () const;
+  inline bool zero_p () const;
+  inline bool one_p () const;
+  inline bool neg_p () const;

what's wrong with w == -1, w == 0, w == 1, etc.?

+  bool only_sign_bit_p (unsigned int prec) const;
+  bool only_sign_bit_p () const;

what's that?  Some of the less obvious functions should be documented
in the header I think.  Smells of combining two things again here.
Either wide-int has an intrinsic precision or it has not ... (like double-int).

+  bool fits_u_p (unsigned int prec) const;
+  bool fits_s_p (unsigned int prec) const;

See above.

+  /* Extension  */
+
+  inline wide_int ext (unsigned int offset, SignOp sgn) const;
+  wide_int sext (unsigned int offset) const;
+  wide_int sext (enum machine_mode mode) const;
+  wide_int zext (unsigned int offset) const;
+  wide_int zext (enum machine_mode mode) const;

'offset'?  I suppose that's 'precision'.  Does that alter the
precision of *this?
I think it should (and thus there should be no set_precision function).
If it doesn't alter precision the functions don't make much sense to me.

+  wide_int set_bit (unsigned int bitpos) const;

this kind of interface is strange.  You call it like w.set_bit (1) but it
doesn't actually set bit 1 in w but it constructs a new wide_int and
returns that.  So I suppose it should be

  wide_int with_bit_set (unsigned int bitpos) const;

or similar.  Or simply have a mutating set_bit.  Or leave it out entirely,
we cannot have many uses of this kind of weird interface.

similar comments for the rest.

.... rest skipped ...

+                                   / HOST_BITS_PER_WIDE_INT + 32));
+  char *dump (char* buf) const;
+ private:
+
+  /* Private utility routines.  */
+  wide_int decompress (unsigned int target, unsigned int bitsize,
+                      unsigned int precision) const;
+  static wide_int add_overflow (const wide_int *op0, const wide_int *op1,
+                               wide_int::SignOp sgn, bool *overflow);
+  static wide_int sub_overflow (const wide_int *op0, const wide_int *op1,
+                               wide_int::SignOp sgn, bool *overflow);
+};


IMHO way too many functions for a well-tested initial implementation.
There are a lot of things that seem to be operation compositions.  Is
your concern efficiency here?  That would be bad, as that would mean
wide_ints are too heavyweight.

Can you use gcov to see which functions have (how much) coverage?

Thanks,
Richard.



> kenny
>
>
Kenneth Zadeck Oct. 23, 2012, 4:12 p.m. UTC | #2
On 10/23/2012 10:12 AM, Richard Biener wrote:
> On Tue, Oct 9, 2012 at 5:09 PM, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
>> This patch implements the wide-int class.  This is a more general version
>> of the double-int class and is meant to be the eventual replacement for
>> that class.  The use of this class removes all host dependencies from
>> the target compiler's integer math.
>>
>> I have made all of the changes I agreed to in the earlier emails.  In
>> particular, this class internally maintains a bitsize and precision but
>> not a mode.  The class is now neutral about modes and tree types.  The
>> functions that take modes or tree types are just convenience functions
>> that translate the parameters into bitsize and precision, and wherever
>> there is a call that takes a mode, there is a corresponding call that
>> takes a tree type.
>>
>> All of the little changes that richi suggested have also been made.
>>
>> The buffer size is now twice the size needed by the largest integer mode.
>> This gives tree-vrp enough room to do full multiplies on any type that
>> the target supports.
>>
>> Tested on x86-64.
>>
>> This patch depends on the first three patches.  I am still waiting for
>> final approval of the hwint.h patch.
>>
>> Ok to commit?
> diff --git a/gcc/wide-int.h b/gcc/wide-int.h
> new file mode 100644
> index 0000000..efd2c01
> --- /dev/null
> +++ b/gcc/wide-int.h
> ...
> +#ifndef GENERATOR_FILE

> The whole file is guarded with that ... why?  That is bound to be fragile once
> use of wide-int spreads?  How do generator programs end up including
> this file if they don't need it at all?
This is so that wide-int can be included at the level of the
generators.  There is some stuff that needs to see this type during
the build phase that cannot see the types that are included in
wide-int.h.
> +#include "tree.h"
> +#include "hwint.h"
> +#include "options.h"
> +#include "tm.h"
> +#include "insn-modes.h"
> +#include "machmode.h"
> +#include "double-int.h"
> +#include <gmp.h>
> +#include "insn-modes.h"
> +
>
> That's a lot of tree and rtl dependencies.  double-int.h avoids these by
> placing conversion routines in different headers or by only resorting to
> types in coretypes.h.  Please try to reduce the above to a minimum.
>
> +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT];
>
> are we sure this rounds properly?  Consider a port with a maximum
> mode size of 4 bytes on a 64-bit host.
I do not believe that this can happen.  The core compiler includes
all modes up to TImode, so by default we already go up to 128 bits.
> I still would like to have the ability to provide specializations of wide_int
> for "small" sizes, thus ideally wide_int would be a template templated
> on the number of HWIs in val.  Interface-wise wide_int<2> should be
> identical to double_int, thus we should be able to do
>
> typedef wide_int<2> double_int;
If you want to go down this path after the patches get in, go for
it.  I see no use at all for this.  This was not meant to be a
plug-in replacement for double-int.  The goal of this patch is to get
the compiler to do the constant math the way that the target does it.
Any such instantiation is by definition placing some predefined limit
that some target may not want.

> in double-int.h and replace its implementation with a specialization of
> wide_int.  Due to a number of divergences (double_int is not a subset
> of wide_int) that doesn't seem easily possible (one reason is the
> ShiftOp and related enums you use).  Of course wide_int is not a
> template either.  For the hypotetical embedded target above we'd
> end up using wide_int<1>, a even more trivial specialization.
>
> I realize again this wide-int is not what your wide-int is (because you
> add a precision member).  Still factoring out the "common"s of
> wide-int and double-int into a wide_int_raw <> template should be
> possible.
>
> +class wide_int {
> +  /* Internal representation.  */
> +
> +  /* VAL is set to a size that is capable of computing a full
> +     multiplication on the largest mode that is represented on the
> +     target.  The full multiplication is use by tree-vrp.  If
> +     operations are added that require larger buffers, then VAL needs
> +     to be changed.  */
> +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT];
> +  unsigned short len;
> +  unsigned int bitsize;
> +  unsigned int precision;
>
> The len, bitsize and precision members need documentation.  At least
> one sounds redundant.
>
> + public:
> +  enum ShiftOp {
> +    NONE,
> NONE is never a descriptive name ... I suppose this is for arithmetic vs.
> logical shifts?
Suggest something.
> +    /* There are two uses for the wide-int shifting functions.  The
> +       first use is as an emulation of the target hardware.  The
> +       second use is as service routines for other optimizations.  The
> +       first case needs to be identified by passing TRUNC as the value
> +       of ShiftOp so that shift amount is properly handled according to the
> +       SHIFT_COUNT_TRUNCATED flag.  For the second case, the shift
> +       amount is always truncated by the bytesize of the mode of
> +       THIS.  */
> +    TRUNC
>
> ah, no, it's for SHIFT_COUNT_TRUNCATED.  "mode of THIS"?  Now
> it's precision I suppose.  That said, handling SHIFT_COUNT_TRUNCATED
> in wide-int sounds over-engineered; the caller should be responsible
> for applying SHIFT_COUNT_TRUNCATED when needed.
I am fighting all of the modes out.  I will update this patch with
more cleanups.
> +  enum SignOp {
> +    /* Many of the math functions produce different results depending
> +       on whether they are SIGNED or UNSIGNED.  In general, there are
> +       two different functions, whose names are prefixed with an 'S'
> +       or a 'U'.  However, for some math functions there is also a
> +       routine that does not have the prefix and takes a SignOp
> +       parameter of SIGNED or UNSIGNED.  */
> +    SIGNED,
> +    UNSIGNED
> +  };
>
> double-int and _all_ of the rest of the middle-end use an 'int uns'
> parameter for this.  _Please_ follow that.  Otherwise, if you
> communicate between those interfaces, you have to write
> uns ? UNSIGNED : SIGNED and signop == UNSIGNED ? 1 : 0 all over the
> place.
I really do not want to.  What I discovered is that some places in
the compiler do, some places do not, and some places take the reverse
convention.  MNEMONIC is better than NUMERIC.
>
> +  static wide_int from_shwi (HOST_WIDE_INT op0, unsigned int bitsize,
> +                            unsigned int precision);
> +  static wide_int from_shwi (HOST_WIDE_INT op0, unsigned int bitsize,
> +                            unsigned int precision, bool *overflow);
>
> I suppose , bool *overflow = NULL would do as well?  What's the
> distinction between bitsize and precision (I suppose, see the above
> question)?  I suppose precision <= bitsize and bits above precision
> are sign/zero extended (and the rest of the val[] array contains undefined
> content?)?  But we also have 'len', which then matches bitsize (well
> it may be larger).  So IMHO either bitsize or len is redundant.  At least
> the tree level nowhere considers partial integer modes special this way
> (only the precision is ever taken into account, but we always sign-/zero-extend
> to the whole double-int - thus 'len' in wide-int terms).
Some operations, mostly shifting, need both the bitsize and the
precision.  In the early days of the compiler, people pretty much
ignored the precision and most of the compiler math was done using
the bitsize.  This made it very painful for the people who supported
ports that had odd-sized modes.  Bernd has been cleaning this up at
the rtl level and the first 5 patches move that forward.  But you
really do need both.


> +  inline static wide_int from_hwi (HOST_WIDE_INT op0, const_tree type);
> +  inline static wide_int from_hwi (HOST_WIDE_INT op0, const_tree type,
> +                                  bool *overflow);
>
> Are you needing this overload or are you adding it for "completeness"?
> Because this interface is wrong(TM), and whoever calls it has at least
> cleanup opportunities ... from_tree or from_rtx makes sense.
The functions are actually quite different.  In general, overflow
checking at least doubles the cost of the implementation, and
sometimes it is much greater.  Having them be separate cleans up the
implementation.


>
> Also in which cases do you need "overflow"?  a HWI always fits in
> a wide-int!  All trees and rtxen do, too.  You seem to merge two
> operations here, conversion to wide-int and truncation / extension.
> That doesn't look like a clean interface to me.
This is the big difference between double_int and wide_int: I do not
care if the value fits in the underlying representation, I care if it
fits in the precision of the type.  If the type is a char and the
value is 100000, overflow is set.  In general, setting overflow in
the wide-int interface means something very different from setting it
in the double-int interface.  A large number of places that check for
overflow with double-int do not need to check it with wide-int.
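
A hedged example of that semantics, using the from_shwi overload
declared above (the 8/8 bitsize/precision pair stands in for a char
type):

  bool overflow = false;
  /* 100000 trivially fits in a HOST_WIDE_INT, but not in 8 bits of
     precision, so this sets overflow.  */
  wide_int w = wide_int::from_shwi (100000, 8, 8, &overflow);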

> +  static wide_int from_double_int (enum machine_mode, double_int);
>
> the choice of passing a mode seems arbitrary (the natural interface
> would be nothing - precision is 2 * HWI).  Passing it as first parameter
> is even more strange to me ;)
The first part of the question is answered above.  The second part
of the question was fixed in my private tree a few days ago and will
get pushed out.
> +  static wide_int from_tree (const_tree);
> +  static wide_int from_rtx (const_rtx, enum machine_mode);
>
> +  HOST_WIDE_INT to_shwi (unsigned int prec) const;
>
> See above - merges two basic operations.  You should write
>
>   w.sext (prec).to_shwi ()
>
> instead (I _suppose_ it should sign-extend, should it? ;)).  Btw, why
> don't we need to always specify bitsize together with precision in all
> the places?  (not that I am arguing for it, I'm arguing for the
> removal of bitsize)
Because the bitsize and precision are part of the representation of
the value.  You only have to specify them on the way into wide-int or
if you need to change them (this is rare, but it does happen).
> +  static wide_int max_value (unsigned int bitsize, unsigned int prec,
> SignOp sgn);
>
> now that I am seeing this - is there any restriction on how the precision
> of a partial integer mode may differ from its bitsize?  Can we have
> POImode with 1 bit precision?  I suppose the solution for all this is
> that when converting a wide-int to a RTX with a mode then we need to
> zero-/sign-extend to the modes bitsize (and wide-int only cares about
> precision).  Eventually a set_len can adjust the amount of BITS_PER_UNITs
> we fill with meaningful values if needed.  Otherwise len == precision
> / BITS_PER_UNIT (rounded to HWI for obvious practical reasons).
The precision must be less than or equal to the bitsize; that is the
only restriction.  I do not know if you can have POImode with 1 bit
of precision.  In every case that I have seen, the partial int is a
partial of the next largest power-of-2 mode, but there is nothing in
wide-int that cares about this.

>
> +  inline static wide_int minus_one (unsigned int bitsize, unsigned int prec);
> +  inline static wide_int minus_one (const_tree type);
> +  inline static wide_int minus_one (enum machine_mode mode);
> +  inline static wide_int zero (unsigned int bitsize, unsigned int prec);
> +  inline static wide_int zero (const_tree type);
> +  inline static wide_int zero (enum machine_mode mode);
> +  inline static wide_int one (unsigned int bitsize, unsigned int prec);
> +  inline static wide_int one (const_tree type);
> +  inline static wide_int one (enum machine_mode mode);
> +  inline static wide_int two (unsigned int bitsize, unsigned int prec);
> +  inline static wide_int two (const_tree type);
> +  inline static wide_int two (enum machine_mode mode);
> +  inline static wide_int ten (unsigned int bitsize, unsigned int prec);
> +  inline static wide_int ten (const_tree type);
> +  inline static wide_int ten (enum machine_mode mode);
>
> wheeee .... ;)
Yes, and they are all used.
> What's wrong with from_uhwi (10, ...)?  double-int has the above for
> compatibility reasons only.  And why should I care about type/mode/size
> for something as simple as '1'?
The 10 is an interesting case.  At least in my patches it is not
used, but I had put it in because I started from double-int, which
has it.  However, I believe in fat APIs: if something gets used a
lot, then it should be part of the API.  All of them except for ten
are used a lot.

I will point out that in my original patch these were just macros
that expanded into from_uhwi, but richi wanted them to be real
functions.



> +  inline unsigned short get_len () const;
> +  inline unsigned int get_bitsize () const;
> +  inline unsigned int get_precision () const;
> +  inline unsigned int get_full_len () const;
>
> not sure which air you are pulling full_len from ;)
When you need it, you need it.  The DWARF writer really needs it,
because it wants to see all of the words of a multiword value, not
just the ones that "need" to be represented, so that it is easy to
read.

I have a big comment on when not to use this in my tree.
> +  wide_int force_to_size (unsigned int bitsize,
> +                         unsigned int precision) const;
>
> or rather 'trunc'?  Seems to be truncation and set_len combined?
Why do you think it is only shortening it?
>
> I wonder if for the various ways to specify precision/len there is a nice C++
> way of moving this detail out of wide-int.  I can think only of one:
>
> struct WIntSpec {
>    WIntSpec (unsigned int len, unsigned int precision);
>    WIntSpec (const_tree);
>    WIntSpec (enum machine_mode);
>    unsigned int len;
>    unsigned int precision;
> };
>
> and then (sorry to pick one of the less useful functions):
>
>    inline static wide_int zero (WIntSpec)
It depends on what you have available in your hands when you need to
call a function.  At the rtl level we almost never have tree types,
but we have modes.  At the tree level, you almost never have modes.
In general, the convenience functions take anything and just extract
the prec and bitsize for you.  But there are several places that need
to specify the prec and bitsize, and so these are now the base
primitives.
> which you should be able to call like
>
>    wide_int::zero (SImode)
>    wide_int::zero (integer_type_node)
>
> and (ugly)
>
>    wide_int::zero (WIntSpec (32, 32))
>
> with C++0x wide_int::zero ({32, 32}) should be possible?  Or we keep
> the precision overload.  At least providing the WIntSpec abstraction
> allows custom ways of specifying required bits to not pollute wide-int
> itself too much.  Lawrence?
>
> +  /* Printing functions.  */
> +
> +  void print_dec (char *buf, SignOp sgn) const;
> +  void print_dec (FILE *file, SignOp sgn) const;
> +  void print_decs (char *buf) const;
> +  void print_decs (FILE *file) const;
> +  void print_decu (char *buf) const;
> +  void print_decu (FILE *file) const;
> +  void print_hex (char *buf) const;
> +  void print_hex (FILE *file) const;
>
> consider moving them to standalone functions, out of wide-int.h
I do not see much reason to do this.  They use the internals of
wide-int, and moving them somewhere else just exposes the internals
for no real reason.
>
> +  inline bool minus_one_p () const;
> +  inline bool zero_p () const;
> +  inline bool one_p () const;
> +  inline bool neg_p () const;
>
> what's wrong with w == -1, w == 0, w == 1, etc.?
I would love to do this and you seem to be somewhat knowledgeable
about C++.  But I cannot for the life of me figure out how to do it.

Say I have a TImode number, which must be represented in 4 ints on a
32-bit host (the same issue happens on 64-bit hosts, but the examples
are simpler on 32-bit hosts), and I compare it to -1.  The value that
I am going to see as the argument of the function is going to have
the value 0xffffffff.  But the value that I have internally is 128
bits.  Do I zero- or sign-extend it?  In particular, if someone wants
to compare a number to 0xdeadbeef, I have no idea what to do.  I
tried defining two different functions, one that took a signed and
one that took an unsigned number, but then I wanted a cast in front
of all the positive numbers.

If there is a way to do this, then I will do it, but it is going to
have to work properly for things larger than a HOST_WIDE_INT.

I know that double-int does some of this and it does not carry around
a notion of signedness either.  Is this just code that has not been
fully tested, or is there a trick in C++ that I am missing?

> +  bool only_sign_bit_p (unsigned int prec) const;
> +  bool only_sign_bit_p () const;
>
> what's that?  Some of the less obvious functions should be documented
> in the header I think.  Smells of combining two things again here.
> Either wide-int has an intrinsic precision or it has not ... (like double-int).

Again, I have put in things that are useful; it is all driven by
what the clients need.  This is done all over the back end.
>
> +  bool fits_u_p (unsigned int prec) const;
> +  bool fits_s_p (unsigned int prec) const;
>
> See above.
>
> +  /* Extension  */
> +
> +  inline wide_int ext (unsigned int offset, SignOp sgn) const;
> +  wide_int sext (unsigned int offset) const;
> +  wide_int sext (enum machine_mode mode) const;
> +  wide_int zext (unsigned int offset) const;
> +  wide_int zext (enum machine_mode mode) const;
>
> 'offset'?  I suppose that's 'precision'.  Does that alter the
> precision of *this?
> I think it should (and thus there should be no set_precision function).
> If it doesn't alter precision the functions don't make much sense to me.
The ones that alter the precision take a precision and bitsize; the
ones that just do the extension from some place, and end up with a
bit pattern that looks a certain way, take the offset.

> +  wide_int set_bit (unsigned int bitpos) const;
>
> this kind of interface is strange.  You call it like w.set_bit (1) but it
> doesn't actually set bit 1 in w but it constructs a new wide_int and
> returns that.  So I suppose it should be
>
>    wide_int with_bit_set (unsigned int bitpos) const;
The interface is pure.  If you want me to change the name, that is fine.
>
> or similar.  Or simply have a mutating set_bit.  Or leave it out entirely,
> we cannot have many uses of this kind of weird interface.
>
> similar comments for the rest.
>
> .... rest skipped ...
>
> +                                   / HOST_BITS_PER_WIDE_INT + 32));
> +  char *dump (char* buf) const;
> + private:
> +
> +  /* Private utility routines.  */
> +  wide_int decompress (unsigned int target, unsigned int bitsize,
> +                      unsigned int precision) const;
> +  static wide_int add_overflow (const wide_int *op0, const wide_int *op1,
> +                               wide_int::SignOp sgn, bool *overflow);
> +  static wide_int sub_overflow (const wide_int *op0, const wide_int *op1,
> +                               wide_int::SignOp sgn, bool *overflow);
> +};
>
>
> IMHO way too many functions for a well-tested initial implementation.
> There are a lot of things that seem to be operation compositions.  Is
> your concern efficiency here?  That would be bad, as that would mean
> wide_ints are too heavyweight.
There are two sides of ease of use: you can force people to write
out everything using a few primitives, or you can give them a rich
interface.  I come from the rich-interface school.

If I was just going out and selling a new interface, something clean
and small would be easier to sell.  However, that is not what I am
doing.  I have converted substantially the entire back end to this,
and in the next few days I will submit patches that do the tree
level.  So I am a big user, and a rich API makes that conversion much
easier.

Remember that these patches are not syntactic changes like the
conversion of double-int to use C++ interfaces.  I am actually
converting most of the code that only does transformations if the
value fits in some fixed number of HWIs to work at the target's
precision.  My motivation is that GCC does not actually work
correctly with larger types.  I will do what it takes to get these
patches into a form acceptable to the API police, but my motivation
is that the compiler is now neither correct nor robust with 128-bit
types and above.

kenny

> Can you use gcov to see which functions have (how much) coverage?
>
> Thanks,
> Richard.
>
>
>
>> kenny
>>
>>
Lawrence Crowl Oct. 23, 2012, 6:08 p.m. UTC | #3
On 10/23/12, Richard Biener <richard.guenther@gmail.com> wrote:
> I wonder if for the various ways to specify precision/len there
> is a nice C++ way of moving this detail out of wide-int.  I can
> think only of one:
>
> struct WIntSpec {
>   WIntSpec (unsigned int len, unsigned int precision);
>   WIntSpec (const_tree);
>   WIntSpec (enum machine_mode);
>   unsigned int len;
>   unsigned int precision;
> };
>
> and then (sorry to pick one of the less useful functions):
>
>   inline static wide_int zero (WIntSpec)
>
> which you should be able to call like
>
>   wide_int::zero (SImode)
>   wide_int::zero (integer_type_node)
>
> and (ugly)
>
>   wide_int::zero (WIntSpec (32, 32))
>
> with C++0x wide_int::zero ({32, 32}) should be possible?  Or we
> keep the precision overload.  At least providing the WIntSpec
> abstraction allows custom ways of specifying required bits to
> not pollute wide-int itself too much.  Lawrence?

Yes, in C++11, wide_int::zero ({32, 32}) is possible using an
implicit conversion to WIntSpec from an initializer_list.  However,
at present we are limited to C++03 to enable older compilers as
boot compilers.
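
A sketch of the conversion that would enable that, hypothetical and
C++11-only for exactly the reason above:

  #include <initializer_list>

  struct WIntSpec
  {
    WIntSpec (std::initializer_list<unsigned int> l)
      : len (l.begin ()[0]), precision (l.begin ()[1]) {}
    unsigned int len;
    unsigned int precision;
  };

  /* Given inline static wide_int zero (WIntSpec), this permits
     wide_int::zero ({32, 32}).  */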
Lawrence Crowl Oct. 23, 2012, 6:38 p.m. UTC | #4
On 10/23/12, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
> On 10/23/2012 10:12 AM, Richard Biener wrote:
> > +  inline bool minus_one_p () const;
> > +  inline bool zero_p () const;
> > +  inline bool one_p () const;
> > +  inline bool neg_p () const;
> >
> > what's wrong with w == -1, w == 0, w == 1, etc.?
>
> I would love to do this and you seem to be somewhat knowledgeable
> about C++.  But I cannot for the life of me figure out how to do it.

Starting from the simple case, you write an operator ==.

as global operator:  bool operator == (wide_int w, int i);
as member operator:  bool wide_int::operator == (int i);

In the simple case,

bool operator == (wide_int w, int i)
{
  switch (i)
    {
      case -1: return w.minus_one_p ();
      case  0: return w.zero_p ();
      case  1: return w.one_p ();
      default: unexpected....
    }
}

> Say I have a TImode number, which must be represented in 4 ints
> on a 32-bit host (the same issue happens on 64-bit hosts, but
> the examples are simpler on 32-bit hosts), and I compare it to -1.
> The value that I am going to see as the argument of the function
> is going to have the value 0xffffffff.  But the value that I have
> internally is 128 bits.  Do I zero- or sign-extend it?

What would you have done with w.minus_one_p ()?

> In particular, if someone wants to compare a number to 0xdeadbeef, I
> have no idea what to do.  I tried defining two different functions,
> one that took a signed and one that took an unsigned number, but
> then I wanted a cast in front of all the positive numbers.

This is where it does get tricky.  For signed arguments, you should sign
extend.  For unsigned arguments, you should not.  At present, we need
multiple overloads to avoid type ambiguities.

bool operator == (wide_int w, long long int i);
bool operator == (wide_int w, unsigned long long int i);
inline bool operator == (wide_int w, long int i)
  { return w == (long long int) i; }
inline bool operator == (wide_int w, unsigned long int i)
  { return w == (unsigned long long int) i; }
inline bool operator == (wide_int w, int i)
  { return w == (long long int) i; }
inline bool operator == (wide_int w, unsigned int i)
  { return w == (unsigned long long int) i; }

(There is a proposal before the C++ committee to fix this problem.)

Even so, there is room for potential bugs when wide_int does not
carry around whether or not it is signed.  The problem is that
regardless of what the programmer thinks of the sign of the wide int,
the comparison will use the sign of the int.
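
A hedged illustration of that pitfall with the overloads above,
assuming a wide_int whose precision exceeds 32 bits:

  wide_int w = ...;  /* 64-bit precision, only the low 32 bits set  */
  bool a = (w == -1);          /* signed: -1 sign-extends to 64 one
                                  bits, so this is false  */
  bool b = (w == 0xffffffffu); /* unsigned: zero-extends, so true  */
  /* The same bit pattern in the source gives different answers
     depending only on the C++ type of the comparand.  */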

> If there is a way to do this, then I will do it, but it is going
> to have to work properly for things larger than a HOST_WIDE_INT.

The long-term solution, IMHO, is to carry the sign information
around in either the type or the class data.  (I prefer type, but
with a mechanism to carry it as data when needed.)  Such comparisons
would then require consistency in signedness between the wide int
and the plain int.

> I know that double-int does some of this and it does not carry
> around a notion of signedness either.  Is this just code that has
> not been fully tested, or is there a trick in C++ that I am missing?

The double int class only provides == and !=, and only with other
double ints.  Otherwise, it has the same value query functions that
you do above.  In the case of double int, the goal was to simplify
use of the existing semantics.  If you are changing the semantics,
consider incorporating sign explicitly.
Kenneth Zadeck Oct. 23, 2012, 6:52 p.m. UTC | #5
On 10/23/2012 02:38 PM, Lawrence Crowl wrote:
> On 10/23/12, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
>> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>> +  inline bool minus_one_p () const;
>>> +  inline bool zero_p () const;
>>> +  inline bool one_p () const;
>>> +  inline bool neg_p () const;
>>>
>>> what's wrong with w == -1, w == 0, w == 1, etc.?
>> I would love to do this and you seem to be somewhat knowledgeable
>> about C++.  But I cannot for the life of me figure out how to do it.
> Starting from the simple case, you write an operator ==.
>
> as global operator:  bool operator == (wide_int w, int i);
> as member operator:  bool wide_int::operator == (int i);
>
> In the simple case,
>
> bool operator == (wide_int w, int i)
> {
>    switch (i)
>      {
>        case -1: return w.minus_one_p ();
>        case  0: return w.zero_p ();
>        case  1: return w.one_p ();
>        default: unexpected....
>      }
> }
>
No, this seems wrong.  You do not want to write code that can only
fail at runtime unless there is a damn good reason to do that.
>> Say I have a TImode number, which must be represented in 4 ints
>> on a 32-bit host (the same issue happens on 64-bit hosts, but
>> the examples are simpler on 32-bit hosts), and I compare it to -1.
>> The value that I am going to see as the argument of the function
>> is going to have the value 0xffffffff.  But the value that I have
>> internally is 128 bits.  Do I zero- or sign-extend it?
> What would you have done with w.minus_one_p ()?
the code "knows" that -1 is a negative number and it knows the precision 
of w.    That is enough information.   So it logically builds a -1 that 
has enough bits to do the conversion.
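
In other words, something like this sketch (using the minus_one
overload declared earlier; wide_int-to-wide_int == is assumed):

  /* Inside minus_one_p: materialize -1 at THIS's own size, so the
     comparison is between operands of equal width.  */
  return *this == wide_int::minus_one (bitsize, precision);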


>> In particular, if someone wants to compare a number to 0xdeadbeef, I
>> have no idea what to do.  I tried defining two different functions,
>> one that took a signed and one that took an unsigned number, but
>> then I wanted a cast in front of all the positive numbers.
> This is where it does get tricky.  For signed arguments, you should sign
> extend.  For unsigned arguments, you should not.  At present, we need
> multiple overloads to avoid type ambiguities.
>
> bool operator == (wide_int w, long long int i);
> bool operator == (wide_int w, unsigned long long int i);
> inline bool operator == (wide_int w, long int i)
>    { return w == (long long int) i; }
> inline bool operator == (wide_int w, unsigned long int i)
>    { return w == (unsigned long long int) i; }
> inline bool operator == (wide_int w, int i)
>    { return w == (long long int) i; }
> inline bool operator == (wide_int w, unsigned int i)
>    { return w == (unsigned long long int) i; }
>
> (There is a proposal before the C++ committee to fix this problem.)
>
> Even so, there is room for potential bugs when wide_int does not
> carry around whether or not it is signed.  The problem is that
> regardless of what the programmer thinks of the sign of the wide int,
> the comparison will use the sign of the int.
When they do, we can revisit this.  But I looked at this and decided
the potential bugs were not worth the effort.
>
>> If there is a way to do this, then I will do it, but it is going
>> to have to work properly for things larger than a HOST_WIDE_INT.
> The long-term solution, IMHO, is to carry the sign information
> around in either the type or the class data.  (I prefer type, but
> with a mechanism to carry it as data when needed.)  Such comparisons
> would then require consistency in signedness between the wide int
> and the plain int.
Carrying the sign information is a non-starter.  The rtl level does
not have it, and the middle end violates it more often than not.  My
view was to design this having looked at all of the usage.  I had
basically converted the whole compiler before I released the ABI.  I
am still getting out the errors and breaking it up into
reviewable-sized patches, but I knew very well who my clients were
before I wrote the ABI.
>
>> I know that double-int does some of this and it does not carry
>> around a notion of signedness either.  Is this just code that has
>> not been fully tested, or is there a trick in C++ that I am missing?
> The double int class only provides == and !=, and only with other
> double ints.  Otherwise, it has the same value query functions that
> you do above.  In the case of double int, the goal was to simplify
> use of the existing semantics.  If you are changing the semantics,
> consider incorporating sign explicitly.
>
I have, and it does not work.
Lawrence Crowl Oct. 23, 2012, 8:25 p.m. UTC | #6
On 10/23/12, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
> On 10/23/2012 02:38 PM, Lawrence Crowl wrote:
>> On 10/23/12, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
>>> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>>> +  inline bool minus_one_p () const;
>>>> +  inline bool zero_p () const;
>>>> +  inline bool one_p () const;
>>>> +  inline bool neg_p () const;
>>>>
>>>> what's wrong with w == -1, w == 0, w == 1, etc.?
>>> I would love to do this and you seem to be somewhat knowledgeable
>>> about C++.  But I cannot for the life of me figure out how to do it.
>> Starting from the simple case, you write an operator ==.
>>
>> as global operator:  bool operator == (wide_int w, int i);
>> as member operator:  bool wide_int::operator == (int i);
>>
>> In the simple case,
>>
>> bool operator == (wide_int w, int i)
>> {
>>    switch (i)
>>      {
>>        case -1: return w.minus_one_p ();
>>        case  0: return w.zero_p ();
>>        case  1: return w.one_p ();
>>        default: unexpected....
>>      }
>> }
>
> No, this seems wrong.  You do not want to write code that can only
> fail at runtime unless there is a damn good reason to do that.

Well, that's because it's the oversimplified case.  :-)

>>> Say I have a TImode number, which must be represented in 4 ints
>>> on a 32-bit host (the same issue happens on 64-bit hosts, but
>>> the examples are simpler on 32-bit hosts), and I compare it to -1.
>>> The value that I am going to see as the argument of the function
>>> is going to have the value 0xffffffff.  But the value that I have
>>> internally is 128 bits.  Do I zero- or sign-extend it?
>>
>> What would you have done with w.minus_one_p ()?
>
> the code "knows" that -1 is a negative number and it knows the
> precision of w.  That is enough information.  So it logically
> builds a -1 that has enough bits to do the conversion.

And the code could also know that '-n' is a negative number and do
the identical conversion.  It would certainly be more difficult to
write and to get all the edge cases right.

>>> In particular, if someone wants to compare a number to 0xdeadbeef, I
>>> have no idea what to do.  I tried defining two different functions,
>>> one that took a signed and one that took an unsigned number, but
>>> then I wanted a cast in front of all the positive numbers.
>> This is where it does get tricky.  For signed arguments, you should sign
>> extend.  For unsigned arguments, you should not.  At present, we need
>> multiple overloads to avoid type ambiguities.
>>
>> bool operator == (wide_int w, long long int i);
>> bool operator == (wide_int w, unsigned long long int i);
>> inline bool operator == (wide_int w, long int i)
>>    { return w == (long long int) i; }
>> inline bool operator == (wide_int w, unsigned long int i)
>>    { return w == (unsigned long long int) i; }
>> inline bool operator == (wide_int w, int i)
>>    { return w == (long long int) i; }
>> inline bool operator == (wide_int w, unsigned int i)
>>    { return w == (unsigned long long int) i; }
>>
>> (There is a proposal before the C++ committee to fix this problem.)
>>
>> Even so, there is room for potential bugs when wide_int does not
>> carry around whether or not it is signed.  The problem is that
>> regardless of what the programmer thinks of the sign of the wide int,
>> the comparison will use the sign of the int.
>
> When they do, we can revisit this.  But I looked at this and decided
> the potential bugs were not worth the effort.

I won't disagree.  I was answering what I thought were questions on
what was possible.

>>> If there is a way to do this, then I will do it, but it is going
>>> to have to work properly for things larger than a HOST_WIDE_INT.
>> The long-term solution, IMHO, is to carry the sign information
>> around in either the type or the class data.  (I prefer type, but
>> with a mechanism to carry it as data when needed.)  Such comparisons
>> would then require consistency in signedness between the wide int
>> and the plain int.
>
> Carrying the sign information is a non-starter.  The rtl level does
> not have it, and the middle end violates it more often than not.  My
> view was to design this having looked at all of the usage.  I had
> basically converted the whole compiler before I released the ABI.  I
> am still getting out the errors and breaking it up into
> reviewable-sized patches, but I knew very well who my clients were
> before I wrote the ABI.

Okay.

>>> I know that double-int does some of this and it does not carry
>>> around a notion of signedness either.  Is this just code that has
>>> not been fully tested, or is there a trick in C++ that I am missing?
>> The double int class only provides == and !=, and only with other
>> double ints.  Otherwise, it has the same value query functions that
>> you do above.  In the case of double int, the goal was to simplify
>> use of the existing semantics.  If you are changing the semantics,
>> consider incorporating sign explicitly.
>
> I have, and it does not work.

Unfortunate.
Kenneth Zadeck Oct. 23, 2012, 9:29 p.m. UTC | #7
On 10/23/2012 04:25 PM, Lawrence Crowl wrote:
> On 10/23/12, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
>> On 10/23/2012 02:38 PM, Lawrence Crowl wrote:
>>> On 10/23/12, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
>>>> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>>>> +  inline bool minus_one_p () const;
>>>>> +  inline bool zero_p () const;
>>>>> +  inline bool one_p () const;
>>>>> +  inline bool neg_p () const;
>>>>>
>>>>> what's wrong with w == -1, w == 0, w == 1, etc.?
>>>> I would love to do this and you seem to be somewhat knowledgeable
>>>> of c++.  But i cannot for the life of me figure out how to do it.
>>> Starting from the simple case, you write an operator ==.
>>>
>>> as global operator:  bool operator == (wide_int w, int i);
>>> as member operator:  bool wide_int::operator == (int i);
>>>
>>> In the simple case,
>>>
>>> bool operator == (wide_int w, int i)
>>> {
>>>     switch (i)
>>>       {
>>>         case -1: return w.minus_one_p ();
>>>         case  0: return w.zero_p ();
>>>         case  1: return w.one_p ();
>>>         default: unexpected....
>>>       }
>>> }
>> No, this seems wrong.  You do not want to write code that can only
>> fail at runtime unless there is a damn good reason to do that.
> Well, that's because it's the oversimplified case.  :-)
>
>>>> Say I have a TImode number, which must be represented in 4 ints
>>>> on a 32-bit host (the same issue happens on 64-bit hosts, but
>>>> the examples are simpler on 32-bit hosts), and I compare it to -1.
>>>> The value that I am going to see as the argument of the function
>>>> is going to have the value 0xffffffff.  But the value that I have
>>>> internally is 128 bits.  Do I zero- or sign-extend it?
>>> What would you have done with w.minus_one_p ()?
>> the code "knows" that -1 is a negative number and it knows the
>> precision of w.  That is enough information.  So it logically
>> builds a -1 that has enough bits to do the conversion.
> And the code could also know that '-n' is a negative number and do
> the identical conversion.  It would certainly be more difficult to
> write and to get all the edge cases right.
I am not a C++ hacker.  If someone wants to go there later, we can
investigate this, but it seems like a can of worms right now.
>
>>>> In particular, if someone wants to compare a number to 0xdeadbeef, I
>>>> have no idea what to do.  I tried defining two different functions,
>>>> one that took a signed and one that took an unsigned number, but
>>>> then I wanted a cast in front of all the positive numbers.
>>> This is where it does get tricky.  For signed arguments, you should sign
>>> extend.  For unsigned arguments, you should not.  At present, we need
>>> multiple overloads to avoid type ambiguities.
>>>
>>> bool operator == (wide_int w, long long int i);
>>> bool operator == (wide_int w, unsigned long long int i);
>>> inline bool operator == (wide_int w, long int i)
>>>     { return w == (long long int) i; }
>>> inline bool operator == (wide_int w, unsigned long int i)
>>>     { return w == (unsigned long long int) i; }
>>> inline bool operator == (wide_int w, int i)
>>>     { return w == (long long int) i; }
>>> inline bool operator == (wide_int w, unsigned int i)
>>>     { return w == (unsigned long long int) i; }
>>>
>>> (There is a proposal before the C++ committee to fix this problem.)
>>>
>>> Even so, there is room for potential bugs when wide_int does not
>>> carry around whether or not it is signed.  The problem is that
>>> regardless of what the programmer thinks of the sign of the wide int,
>>> the comparison will use the sign of the int.
>> When they do, we can revisit this.  But I looked at this and decided
>> the potential bugs were not worth the effort.
> I won't disagree.  I was answering what I thought were questions on
> what was possible.
>
>>>> If there is a way to do this, then I will do it, but it is going
>>>> to have to work properly for things larger than a HOST_WIDE_INT.
>>> The long-term solution, IMHO, is to carry the sign information
>>> around in either the type or the class data.  (I prefer type, but
>>> with a mechanism to carry it as data when needed.)  Such comparisons
>>> would then require consistency in signedness between the wide int
>>> and the plain int.
>> Carrying the sign information is a non-starter.  The rtl level does
>> not have it, and the middle end violates it more often than not.  My
>> view was to design this having looked at all of the usage.  I had
>> basically converted the whole compiler before I released the ABI.  I
>> am still getting out the errors and breaking it up into
>> reviewable-sized patches, but I knew very well who my clients were
>> before I wrote the ABI.
> Okay.
>
>>>> I know that double-int does some of this and it does not carry
>>>> around a notion of signedness either.  Is this just code that has
>>>> not been fully tested, or is there a trick in C++ that I am missing?
>>> The double int class only provides == and !=, and only with other
>>> double ints.  Otherwise, it has the same value query functions that
>>> you do above.  In the case of double int, the goal was to simplify
>>> use of the existing semantics.  If you are changing the semantics,
>>> consider incorporating sign explicitly.
>> I have, and it does not work.
> Unfortunate.
>
There is certainly a desire here not to let the ugliness of the back
end drag down the tree level.  But the truth is that with respect to
signedness, the tree level is very dirty.  If the double-int code had
taken a type from the start, things might have been different.  But
the truth is that sign is different from size: some of the operations
do not care about sign, and some of the transformations that we want
to do need to be done in a particular way that is not dependent on
the sign of the type.  Richi has beaten me up about this, but I
actually believe that most of the time when the sign of the type does
not match the sign of the operation, the code is actually correct.

There were heroic things done at the rtl level to find the mode (and
therefore the bitsize and precision), since this is not stored in
constant integers.  But doing that allows the precision and bitsize
to be reliably stored in the wide-ints.  Both the tree and rtl levels
would need to change significantly to put the sign in.
Richard Biener Oct. 24, 2012, 9:43 a.m. UTC | #8
On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
<zadeck@naturalbridge.com> wrote:
>
> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>
>> On Tue, Oct 9, 2012 at 5:09 PM, Kenneth Zadeck <zadeck@naturalbridge.com>
>> wrote:
>>>
>>> This patch implements the wide-int class.  This is a more general
>>> version of the double-int class and is meant to be the eventual
>>> replacement for that class.  The use of this class removes all host
>>> dependencies from the target compiler's integer math.
>>>
>>> I have made all of the changes i agreed to in the earlier emails. In
>>> particular, this class internally maintains a bitsize and precision but
>>> not
>>> a mode.     The class now is neutral about modes and tree-types.    the
>>> functions that take modes or tree-types are just convenience functions
>>> that
>>> translate the parameters into bitsize and precision and where ever there
>>> is
>>> a call that takes a mode, there is a corresponding call that takes a
>>> tree-type.
>>>
>>> All of the little changes that richi suggested have also been made.
>>>
>>> The buffer sizes is now twice the size needed by the largest integer
>>> mode.
>>> This gives enough room for tree-vrp to do full multiplies on any type
>>> that
>>> the target supports.
>>>
>>> Tested on x86-64.
>>>
>>> This patch depends on the first three patches.   I am still waiting on
>>> final
>>> approval on the hwint.h patch.
>>>
>>> Ok to commit?
>>
>> diff --git a/gcc/wide-int.h b/gcc/wide-int.h
>> new file mode 100644
>> index 0000000..efd2c01
>> --- /dev/null
>> +++ b/gcc/wide-int.h
>> ...
>> +#ifndef GENERATOR_FILE
>
>
>> The whole file is guarded with that ... why?  That is bound to be fragile
>> once
>> use of wide-int spreads?  How do generator programs end up including
>> this file if they don't need it at all?
>
> This is so that wide-int can be included at the level of the generators.
> There is some stuff that needs to see this type during the build phase
> but that cannot see the types that are included in wide-int.h.
>
>> +#include "tree.h"
>> +#include "hwint.h"
>> +#include "options.h"
>> +#include "tm.h"
>> +#include "insn-modes.h"
>> +#include "machmode.h"
>> +#include "double-int.h"
>> +#include <gmp.h>
>> +#include "insn-modes.h"
>> +
>>
>> That's a lot of tree and rtl dependencies.  double-int.h avoids these by
>> placing conversion routines in different headers or by only resorting to
>> types in coretypes.h.  Please try to reduce the above to a minimum.
>>
>> +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
>> HOST_BITS_PER_WIDE_INT];
>>
>> are we sure this rounds properly?  Consider a port with max byte mode
>> size 4 on a 64bit host.
>
> I do not believe that this can happen.   The core compiler includes all
> modes up to TI mode, so by default we already go up to 128 bits.

And mode bitsizes are always power-of-two?  I suppose so.

>> I still would like to have the ability to provide specializations of
>> wide_int
>> for "small" sizes, thus ideally wide_int would be a template templated
>> on the number of HWIs in val.  Interface-wise wide_int<2> should be
>> identical to double_int, thus we should be able to do
>>
>> typedef wide_int<2> double_int;
>
> If you want to go down this path after the patches get in, go for it.    I
> see no use at all for this.
> This was not meant to be a plug-in replacement for double-int. The goal of
> this patch is to get the compiler to do the constant math the way that the
> target does it.   Any such instantiation is by definition placing some
> predefined limit that some target may not want.

Well, what I don't really like is that we now have two implementations
of functions that perform integer math on two-HWI sized integers.  What
I also don't like too much is that we have two different interfaces to operate
on them!  Can't you see how I come to not liking this?  Especially the
latter ...

>> in double-int.h and replace its implementation with a specialization of
>> wide_int.  Due to a number of divergences (double_int is not a subset
>> of wide_int) that doesn't seem easily possible (one reason is the
>> ShiftOp and related enums you use).  Of course wide_int is not a
>> template either.  For the hypothetical embedded target above we'd
>> end up using wide_int<1>, a even more trivial specialization.
>>
>> I realize again this wide-int is not what your wide-int is (because you
>> add a precision member).  Still factoring out the "common"s of
>> wide-int and double-int into a wide_int_raw <> template should be
>> possible.
>>
>> +class wide_int {
>> +  /* Internal representation.  */
>> +
>> +  /* VAL is set to a size that is capable of computing a full
>> +     multiplication on the largest mode that is represented on the
>> +     target.  The full multiplication is use by tree-vrp.  If
>> +     operations are added that require larger buffers, then VAL needs
>> +     to be changed.  */
>> +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
>> HOST_BITS_PER_WIDE_INT];
>> +  unsigned short len;
>> +  unsigned int bitsize;
>> +  unsigned int precision;
>>
>> The len, bitsize and precision members need documentation.  At least
>> one sounds redundant.
>>
>> + public:
>> +  enum ShiftOp {
>> +    NONE,
>> NONE is never a descriptive name ... I suppose this is for arithmetic vs.
>> logical shifts?
>
> suggest something

Make it an overload instead of passing an extra argument.  Or as I say
make callers apply shift-count truncation separately.

>> +    /* There are two uses for the wide-int shifting functions.  The
>> +       first use is as an emulation of the target hardware.  The
>> +       second use is as service routines for other optimizations.  The
>> +       first case needs to be identified by passing TRUNC as the value
>> +       of ShiftOp so that shift amount is properly handled according to
>> the
>> +       SHIFT_COUNT_TRUNCATED flag.  For the second case, the shift
>> +       amount is always truncated by the bytesize of the mode of
>> +       THIS.  */
>> +    TRUNC
>>
>> ah, no, it's for SHIFT_COUNT_TRUNCATED.  "mode of THIS"?  Now
>> it's precision I suppose.  That said, handling SHIFT_COUNT_TRUNCATED
>> in wide-int sounds over-engineered, the caller should be responsible
>> of applying SHIFT_COUNT_TRUNCATED when needed.
>
> I am still fighting to get all of the modes out.   i will update this patch
> with more cleanups

Thanks.
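
For illustration, a minimal standalone sketch of the caller-side split
being suggested, assuming a power-of-two precision; the uint64_t value
stands in for a wide int and the bool parameter for
SHIFT_COUNT_TRUNCATED (both are assumptions, not the patch's interface):

#include <cstdint>
#include <cstdio>

static uint64_t
shift_left (uint64_t val, unsigned int cnt, unsigned int prec,
            bool count_truncated)
{
  if (count_truncated)
    cnt &= prec - 1;            /* emulate the target hardware */
  if (cnt >= prec)
    return 0;                   /* service-routine behavior */
  uint64_t mask
    = prec == 64 ? ~(uint64_t) 0 : ((uint64_t) 1 << prec) - 1;
  return (val << cnt) & mask;
}

int main ()
{
  /* Shifting a 32-bit value by 33: like a shift by 1 if the target
     truncates shift counts, zero otherwise.  */
  printf ("%llx\n", (unsigned long long) shift_left (1, 33, 32, true));
  printf ("%llx\n", (unsigned long long) shift_left (1, 33, 32, false));
  return 0;
}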

>> +  enum SignOp {
>> +    /* Many of the math functions produce different results depending
>> +       on if they are SIGNED or UNSIGNED.  In general, there are two
>> +       different functions, whose names are prefixed with an 'S" and
>> +       or an 'U'.  However, for some math functions there is also a
>> +       routine that does not have the prefix and takes an SignOp
>> +       parameter of SIGNED or UNSIGNED.  */
>> +    SIGNED,
>> +    UNSIGNED
>> +  };
>>
>> double-int and _all_ of the rest of the middle-end uses a 'int uns'
>> parameter
>> for this.  _Please_ follow that.  Otherwise if you communicate between
>> those interfaces you have to to uns ? UNSIGNED : SIGNED and
>> signop == UNSIGNED ? 1 : 0 all over the place.
>
> I really do not want to.   What i discovered is that some places in the
> compiler do, some places do not, and some places take the reverse
> convention.   MNEMONIC is better than NUMERIC.

I don't think that your scheme is an improvement given that GCC has a
mix of both.  I also fail to remember a place where it is not unsigned == 1,
signed == 0 (which you btw reverse ... SIGNED == 0, UNSIGNED == 1,
so even ugly casting doesn't allow moderating between the two schemes).

>>
>> +  static wide_int from_shwi (HOST_WIDE_INT op0, unsigned int bitsize,
>> +                            unsigned int precision);
>> +  static wide_int from_shwi (HOST_WIDE_INT op0, unsigned int bitsize,
>> +                            unsigned int precision, bool *overflow);
>>
>> I suppose , bool *overflow = NULL would do as well?  What's the
>> distinction between bitsize and precision (I suppose, see the above
>> question)?  I suppose precision <= bitsize and bits above precision
>> are sign/zero extended (and the rest of the val[] array contains undefined
>> content?)?  But we also have 'len', which then matches bitsize (well
>> it may be larger).  So IMHO either bitsize or len is redundant.  At least
>> the tree level nowhere considers partial integer modes special this way
>> (only the precision is ever taken into account, but we always
>> sign-/zero-extend
>> to the whole double-int - thus 'len' in wide-int terms).
>
> Some operations, mostly shifting, need both the bitsize and precision.
> In the early days of the compiler, people pretty much ignored the precision
> and most of the compiler math was done using the bitsize.   This made it
> very painful for the people who supported ports that had odd-sized modes.
> Bernd has been cleaning this up at the rtl level and the first 5 patches
> move that forward.   But you really do need both.

I fail to understand.  How do you require anything else than precision for
shifting?  Do you say that shifting a PSImode 24-bit precision integer right
with a logical shift will pull in sign-bits that happen to be 'extended' into
the upper 8 bits?  Richard, can you clarify this please?  I still see
len == bitsize and thus a redundancy.

>> +  inline static wide_int from_hwi (HOST_WIDE_INT op0, const_tree type);
>> +  inline static wide_int from_hwi (HOST_WIDE_INT op0, const_tree type,
>> +                                  bool *overflow);
>>
>> Are you needing this overload or are you adding it for "completeness"?
>> Because this interface is wrong(TM), and whoever calls it has at least
>> cleanup opportunities ... from_tree or from_rtx makes sense.
>
> the functions are actually quite different.    in general overflow checking
> at least doubles the cost of implementation and sometimes it is much
> greater.   Having them be separate cleans up the implementation.

Still in the above you take the actual value from op0, a HOST_WIDE_INT
(with what precision / bitsize?), the wide_int will have precision / bitsize
from the tree type argument (will op0 be truncated / extended here?  How
is *overflow computed?).

Btw, if you have an overload like this please make 'overflow' a reference,
it doesn't make sense to pass NULL here (just use the other overload).

>
>
>>
>> Also in which cases do you need "overflow"?  a HWI always fits in
>> a wide-int!  All trees and rtxen do, too.  You seem to merge two
>> operations here, conversion to wide-int and truncation / extension.
>> That doesn't look like a clean interface to me.
>
> This is the big difference between double_int and wide_int:    I do not care
> if it fits in the underlying representation, i care if it fits in the
> precision of the type.   If the type is for a char and the value is 100000,
> overflow is set.   In general setting overflow in the wide-int interface
> means something very different from the double-int interface.    A large
> number of places that check for overflow with double-int do not need to
> check it for wide-int.
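
A minimal single-HWI sketch of the precision-based test being described;
fits_signed_prec is a hypothetical helper, not part of the posted patch,
and it assumes a 64-bit HWI with arithmetic right shifts (as GCC's own
sext_hwi does):

#include <cstdint>
#include <cstdio>

typedef int64_t HOST_WIDE_INT;

static bool
fits_signed_prec (HOST_WIDE_INT v, unsigned int prec)
{
  if (prec >= 64)
    return true;
  /* "Overflow" means the value does not survive a round trip through
     sign-extension at PREC bits.  */
  HOST_WIDE_INT ext
    = (HOST_WIDE_INT) ((uint64_t) v << (64 - prec)) >> (64 - prec);
  return ext == v;
}

int main ()
{
  printf ("%d\n", (int) fits_signed_prec (100000, 8));  /* 0: overflow for a char */
  printf ("%d\n", (int) fits_signed_prec (-100, 8));    /* 1: fits */
  return 0;
}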

Then in the above interface you really want to ask hwi_fits_type_p instead
of globbing this into the construction of a wide-int.  (This whole interface
is really a mess - sorry)

If overflow setting for wide-int means something very different from double-int
then it shouldn't be called overflow or it should at least be documented!
So - what does it actually mean?  Double-ints implicitly have precision
of 2*HWI so overflow is set whenever the operation overflows that precision.
It doesn't make sense to ask if a HWI fits in a double-int for example.

>> +  static wide_int from_double_int (enum machine_mode, double_int);
>>
>> the choice of passing a mode seems arbitrary (the natural interface
>> would be nothing - precision is 2 * HWI).  Passing it as first parameter
>> is even more strange to me ;)
>
> the first part of the question is answered above.   the second part of the
> question was fixed on my private tree a few days ago and will get pushed
> out.

Thanks.

>> +  static wide_int from_tree (const_tree);
>> +  static wide_int from_rtx (const_rtx, enum machine_mode);
>>
>> +  HOST_WIDE_INT to_shwi (unsigned int prec) const;
>>
>> See above - merges two basic operations.  You should write
>>
>>   w.sext (prec).to_shwi ()
>>
>> instead (I _suppose_ it should sign-extend, should it? ;)).  Btw, why
>> don't we need to always specify bitsize together with precision in all
>> the places?  (not that I am arguing for it, I'm arguing for the
>> removal of bitsize)
>
> because the bitsize and precision are part of the representation of the
> value.    You only have to specify them on the way into wide-int or if you
> need to change them (this is rare but it does happen).

My objection to to_shwi still stands.  to_shwi should be

 HOST_WIDE_INT to_shwi () const;

and it could gcc_assert () that the wide-int fits in a shwi (you should be
able to see a pattern in my complaints ...)

>> +  static wide_int max_value (unsigned int bitsize, unsigned int prec,
>> SignOp sgn);
>>
>> now that I am seeing this - is there any restriction on how the precision
>> of a partial integer mode may differ from its bitsize?  Can we have
>> POImode with 1 bit precision?  I suppose the solution for all this is
>> that when converting a wide-int to a RTX with a mode then we need to
>> zero-/sign-extend to the modes bitsize (and wide-int only cares about
>> precision).  Eventually a set_len can adjust the amount of BITS_PER_UNITs
>> we fill with meaningful values if needed.  Otherwise len == precision
>> / BITS_PER_UNIT (rounded to HWI for obvious practical reasons).
>
> the precision must be less than or equal to the bitsize.   That is the only
> restriction.
> I do not know if you can have a 1-bit POImode.    In every case that i have
> seen, the partial int is a partial of the next largest power-of-2 mode.
> But there is nothing in wide-int that cares about this.
>
>
>>
>> +  inline static wide_int minus_one (unsigned int bitsize, unsigned int
>> prec);
>> +  inline static wide_int minus_one (const_tree type);
>> +  inline static wide_int minus_one (enum machine_mode mode);
>> +  inline static wide_int zero (unsigned int bitsize, unsigned int prec);
>> +  inline static wide_int zero (const_tree type);
>> +  inline static wide_int zero (enum machine_mode mode);
>> +  inline static wide_int one (unsigned int bitsize, unsigned int prec);
>> +  inline static wide_int one (const_tree type);
>> +  inline static wide_int one (enum machine_mode mode);
>> +  inline static wide_int two (unsigned int bitsize, unsigned int prec);
>> +  inline static wide_int two (const_tree type);
>> +  inline static wide_int two (enum machine_mode mode);
>> +  inline static wide_int ten (unsigned int bitsize, unsigned int prec);
>> +  inline static wide_int ten (const_tree type);
>> +  inline static wide_int ten (enum machine_mode mode);
>>
>> wheeee .... ;)
>
> yes, and they are all used.
>
>> What's wrong with from_uhwi (10, ...)?  double-int has the above for
>> compatibility reasons only.  And why should I care about type/mode/size
>> for something as simple as '1'?
>
> the 10 is an interesting case.   at least in my patches it is not used, but
> i had put it in because i started from double-int and it has it.   However,
> i believe in fat apis: if something gets used a lot, then it should be part
> of the api.    all of them except for 10 are used a lot.
>
> I will point out that in my original patch, these were just macros that
> expanded into from_uhwi, but richi wanted them to be real functions.

Yeah, but now with all the fancy overloads (but see my other suggestion below).

>
>
>
>> +  inline unsigned short get_len () const;
>> +  inline unsigned int get_bitsize () const;
>> +  inline unsigned int get_precision () const;
>> +  inline unsigned int get_full_len () const;
>>
>> not sure which air you are pulling full_len from ;)
>
> when you need it, you need it.   the dwarf writer really needs it, because
> it wants to see all of the words of a multiword value, not just the ones
> that "need" to be represented so that it is easy to read.
>
> I have a big comment on when not to use this in my tree.


+  /* VRP appears to be badly broken and this is a very ugly fix.  */
+  if (i >= 0)
+    return val[i] >> (HOST_BITS_PER_WIDE_INT - 1);

Err??  You mean you are getting array-bound warnings or what?


>> +  wide_int force_to_size (unsigned int bitsize,
>> +                         unsigned int precision) const;
>>
>> or rather 'trunc'?  Seems to be truncation and set_len combined?
>
> why do you think it is only shortening it?

Because it does not say whether you sign- or zero-extend:

+/* Copy THIS replacing the mode with MODE.  */

Fix comment.

+wide_int
+wide_int::force_to_size (unsigned int bs, unsigned int prec) const
+{
+  wide_int result;
+  int small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+  int blocks_needed = BLOCKS_NEEDED (prec);
+  int i;
+
+  result.bitsize = bs;
+  result.precision = prec;
+  result.len = blocks_needed < len ? blocks_needed : len;
+  for (i = 0; i < result.len; i++)
+    result.val[i] = val[i];
+
+  if (small_prec & (blocks_needed == len))

what?  small_prec seems to tell you whether precision is a
multiple of a HWI, now you only look at bit 1 (so it's equal
to precision & (blocks_needed == len)).

+    result.val[blocks_needed-1]
+      = sext_hwi (result.val[blocks_needed-1], small_prec);

But I really think you want sth else...  maybe &&?  You also unconditionally
sign-extend, so the function name should somehow reflect this.

+  return result;
+}
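
For concreteness, a standalone rework with the '&' replaced by '&&' and
the sign-extension called out in the name; mini_wide, sforce_to_size and
the size limits are stand-ins invented here, not the patch's code:

#include <cstdint>

typedef int64_t HWI;
#define HWI_BITS 64
#define BLOCKS_NEEDED(PREC) (((PREC) + HWI_BITS - 1) / HWI_BITS)

struct mini_wide { HWI val[4]; int len; unsigned int precision; };

static HWI
sext_hwi (HWI v, unsigned int prec)   /* assumes arithmetic >> */
{
  return (HWI) ((uint64_t) v << (HWI_BITS - prec)) >> (HWI_BITS - prec);
}

static mini_wide
sforce_to_size (const mini_wide &x, unsigned int prec)
{
  mini_wide r;
  int blocks_needed = BLOCKS_NEEDED (prec);   /* prec <= 256 assumed */
  int small_prec = prec & (HWI_BITS - 1);

  r.precision = prec;
  r.len = blocks_needed < x.len ? blocks_needed : x.len;
  for (int i = 0; i < r.len; i++)
    r.val[i] = x.val[i];

  if (small_prec && blocks_needed == x.len)   /* '&&', not '&' */
    r.val[blocks_needed - 1]
      = sext_hwi (r.val[blocks_needed - 1], small_prec);
  return r;
}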


>
>>
>> I wonder if for the various ways to specify precision/len there is a nice
>> C++
>> way of moving this detail out of wide-int.  I can think only of one:
>>
>> struct WIntSpec {
>>    WIntSpec (unsigned int len, unsigned int precision);
>>    WIntSpec (const_tree);
>>    WIntSpec (enum machine_mode);
>>    unsigned int len;
>>    unsigned int precision;
>> };
>>
>> and then (sorry to pick one of the less useful functions):
>>
>>    inline static wide_int zero (WIntSpec)
>
> It depends on what you have available in your hands when you need to call a
> function.   At the rtl level we almost never have tree types, but we have
> modes.    At the tree level, you almost never have modes.    In general the
> convenience functions take anything and just extract the prec and bitsize
> for you.    But there are several places that need to specify the prec and
> bitsize and so these are now the base primitives.

You always have prec and bitsize.  GET_MODE_PRECISION and
GET_MODE_BITSIZE, TYPE_PRECISION and TYPE_SIZE tell you this.
The other overloads are for convenience only.  You didn't respond to
the idea of making the interface less cluttered by using some abstraction.

>> which you should be able to call like
>>
>>    wide_int::zero (SImode)
>>    wide_int::zero (integer_type_node)
>>
>> and (ugly)
>>
>>    wide_int::zero (WIntSpec (32, 32))
>>
>> with C++0x wide_int::zero ({32, 32}) should be possible?  Or we keep
>> the precision overload.  At least providing the WIntSpec abstraction
>> allows custom ways of specifying required bits to not pollute wide-int
>> itself too much.  Lawrence?
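
A compilable fleshing-out of that idea; mode_info and wide_int_stub are
invented stand-ins for GCC's machine modes and for wide_int:

struct mode_info { unsigned int bitsize; unsigned int precision; };

struct WIntSpec
{
  unsigned int bitsize;
  unsigned int precision;
  WIntSpec (unsigned int bs, unsigned int prec)
    : bitsize (bs), precision (prec) {}
  WIntSpec (const mode_info &m)
    : bitsize (m.bitsize), precision (m.precision) {}
};

struct wide_int_stub
{
  unsigned int bitsize;
  unsigned int precision;
  long long val;

  /* One zero () taking a WIntSpec replaces three overloads.  */
  static wide_int_stub zero (WIntSpec spec)
  {
    wide_int_stub w = { spec.bitsize, spec.precision, 0 };
    return w;
  }
};

int main ()
{
  mode_info si_mode = { 32, 32 };   /* stand-in for SImode */
  wide_int_stub a = wide_int_stub::zero (si_mode);            /* implicit */
  wide_int_stub b = wide_int_stub::zero (WIntSpec (32, 32));  /* explicit */
  return a.precision == b.precision ? 0 : 1;
}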
>>
>> +  /* Printing functions.  */
>> +
>> +  void print_dec (char *buf, SignOp sgn) const;
>> +  void print_dec (FILE *file, SignOp sgn) const;
>> +  void print_decs (char *buf) const;
>> +  void print_decs (FILE *file) const;
>> +  void print_decu (char *buf) const;
>> +  void print_decu (FILE *file) const;
>> +  void print_hex (char *buf) const;
>> +  void print_hex (FILE *file) const;
>>
>> consider moving them to standalone functions, out of wide-int.h
>
> I do not see much reason to do this.    They use the internals of wide-int
> and moving them somewhere else is just exposing the internals for no real
> reason.

All internals are exposed already.

>>
>> +  inline bool minus_one_p () const;
>> +  inline bool zero_p () const;
>> +  inline bool one_p () const;
>> +  inline bool neg_p () const;
>>
>> what's wrong with w == -1, w == 0, w == 1, etc.?
>
> I would love to do this and you seem to be somewhat knowledgeable of c++.
> But i cannot for the life of me figure out how to do it.
>
> say i have a TImode number, which must be represented in 4 ints on a 32 bit
> host (the same issue happens on 64 bit hosts, but the examples are simpler
> on 32 bit hosts) and i compare it to -1.    The value that i am going to see
> as the argument of the function is going to have the value 0xffffffff.
> but the value that i have internally is 128 bits.   do i zero-extend or
> sign-extend it?   in particular if someone wants to compare a number to
> 0xdeadbeef  i have no idea what to do.   I tried defining two different
> functions, one that took a signed and one that took an unsigned number but
> then i needed a cast in front of all the positive numbers.
>
> If there is a way to do this, then i will do it, but it is going to have to
> work properly for things larger than a HOST_WIDE_INT.
>
> I know that double-int does some of this and it does not carry around a
> notion of signedness either.   is this just code that has not been fully
> tested or is there a trick in c++ that i am missing?

Not sure, but at least minus_one_p, zero_p and one_p can be implemented
with a signed HWI overload of operator==

But I suppose leaving things as-is is fine as well.
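
A minimal sketch of that suggestion: a single signed-HWI operator== is
enough for the -1/0/1 predicates, because a small signed constant has
exactly one representation -- sign-extended across all blocks.
mini_wide is a stand-in, not the patch's class:

#include <cstdint>

typedef int64_t HWI;

struct mini_wide
{
  HWI val[4];
  int len;                        /* significant blocks, 1 <= len <= 4 */

  bool operator== (HWI c) const
  {
    if (val[0] != c)
      return false;
    HWI ext = c < 0 ? (HWI) -1 : 0;   /* what sign-extension demands */
    for (int i = 1; i < len; i++)
      if (val[i] != ext)
        return false;
    return true;
  }
};

int main ()
{
  mini_wide w = { { -1, -1, -1, -1 }, 4 };
  /* All-ones at any width compares equal to -1; zero_p and one_p work
     the same way with 0 and 1.  */
  return w == (HWI) -1 ? 0 : 1;
}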

>> +  bool only_sign_bit_p (unsigned int prec) const;
>> +  bool only_sign_bit_p () const;
>>
>> what's that?  Some of the less obvious functions should be documented
>> in the header I think.  Smells of combining two things again here.
>> Either wide-int has an intrinsic precision or it has not ... (like
>> double-int).
>
>
> Again, i have put in things that are useful, it is all driven from what the
> clients need.    This is done all over the back end.

What's the current function that is called?

>>
>> +  bool fits_u_p (unsigned int prec) const;
>> +  bool fits_s_p (unsigned int prec) const;
>>
>> See above.
>>
>> +  /* Extension  */
>> +
>> +  inline wide_int ext (unsigned int offset, SignOp sgn) const;
>> +  wide_int sext (unsigned int offset) const;
>> +  wide_int sext (enum machine_mode mode) const;
>> +  wide_int zext (unsigned int offset) const;
>> +  wide_int zext (enum machine_mode mode) const;
>>
>> 'offset'?  I suppose that's 'precision'.  Does that alter the
>> precision of *this?
>> I think it should (and thus there should be no set_precision function).
>> If it doesn't alter precision the functions don't make much sense to me.
>
> The ones that alter the precision take a precision and bitsize; the ones
> that just do the extension from some position and end up with a bit pattern
> that looks a certain way take the offset.

Different semantics for a function overload is ... ugly.

>
>> +  wide_int set_bit (unsigned int bitpos) const;
>>
>> this kind of interface is strange.  You call it like w.set_bit (1) but it
>> doesn't actually set bit 1 in w but it constructs a new wide_int and
>> returns that.  So I suppose it should be
>>
>>    wide_int with_bit_set (unsigned int bitpos) const;
>
> the interface is pure.    if you want me to change the name, that is fine.
>
>>
>> or similar.  Or simply have a mutating set_bit.  Or leave it out entierly,
>> we cannot have many uses of this kind of weird interface.
>>
>> similar comments for the rest.
>>
>> .... rest skipped ...
>>
>> +                                   / HOST_BITS_PER_WIDE_INT + 32));
>> +  char *dump (char* buf) const;
>> + private:
>> +
>> +  /* Private utility routines.  */
>> +  wide_int decompress (unsigned int target, unsigned int bitsize,
>> +                      unsigned int precision) const;
>> +  static wide_int add_overflow (const wide_int *op0, const wide_int *op1,
>> +                               wide_int::SignOp sgn, bool *overflow);
>> +  static wide_int sub_overflow (const wide_int *op0, const wide_int *op1,
>> +                               wide_int::SignOp sgn, bool *overflow);
>> +};
>>
>>
>> IMHO way too many functions for a well tested initial implementation.
>> There are a lot of things that seem operation compositions.  Is your
>> concern efficiency here?  That would be bad as that means wide_ints
>> are too heavy weight.
>
> There are two sides to ease of use: you can force people to write out
> everything using a few primitives or you can give them a rich interface.   I
> come from the rich interface school.
>
> If i was just going out and selling a new interface, something clean and
> small would be easier to sell.   However, that is not what i am doing.    I
> have converted substantially the entire back end to this and in the next few
> days i will submit patches that do the tree level.   So i am a big user and
> a rich api makes that conversion much easier.

Maybe for you, but not for people that come looking for a way to do
something with that interface.  They see a galore of functions with
similar names and weird arguments.  They try to figure out which to pick
and get confused.  They end up resorting to the _implementation_ :(

The interface is, frankly, a mess.  And I don't agree that it is a pre-existing
interface.  The wide-int class should have a clean and small interface.

You can write the rich interface as standalone functions, the clean and
small interface should provide all building blocks you need.

> Remember that these patches are not syntactic changes like the conversion of
> double-int to use c++ interfaces.    I am actually converting most of the
> code that only does transformations if the value fits in some fixed number
> of HWIs to work at the target's precision.   My motivation is that GCC does
> not actually work correctly with larger types.   I will do what it takes to
> get these patches in an acceptable form for the api police, but my motivation
> is that the compiler is now neither correct nor robust with 128 bit types
> and above.

Well, you may gain that things work with 128-bit and larger types, but
at the same time GCC will be a more confusing place to work in.

One general implementation detail comment:

+class wide_int {
+  /* Internal representation.  */
+
+  /* VAL is set to a size that is capable of computing a full
+     multiplication on the largest mode that is represented on the
+     target.  The full multiplication is use by tree-vrp.  If
+     operations are added that require larger buffers, then VAL needs
+     to be changed.  */
+  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT];
+  unsigned short len;
+  unsigned int bitsize;
+  unsigned int precision;

put VAL last so one can allocate less storage, thus it should be a trailing
array.  Make all of wide-int have non-mutating 'len' (that's required to
make variable storage size for VAL viable).  Thus,

    const unsigned int len;
    unsigned int bitsize;
    unsigned int precision;
    HOST_WIDE_INT val[1 /* len */];

that is, how much of the wide-int implementation is mutating?  (I'd
even say bitsize and precision should be 'const').

Then of course I'd factor out the bitsize / precision notion into a
wrapper to be able to do the double-int sharing I am after...

Richard.

> kenny
>
>
>> Can you use gcov to see which functions have (how much) coverage?
>>
>> Thanks,
>> Richard.
>>
>>
>>
>>> kenny
>>>
>>>
>
Mike Stump Oct. 24, 2012, 5:23 p.m. UTC | #9
On Oct 24, 2012, at 2:43 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
> <zadeck@naturalbridge.com> wrote:
>> 
>> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>> 
>>> +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
>>> HOST_BITS_PER_WIDE_INT];
>>> 
>>> are we sure this rounds properly?  Consider a port with max byte mode
>>> size 4 on a 64bit host.
>> 
>> I do not believe that this can happen.   The core compiler includes all
>> modes up to TI mode, so by default we already go up to 128 bits.
> 
> And mode bitsizes are always power-of-two?  I suppose so.

Actually, no, they are not.  Partial int modes can have bit sizes that are not power of two, and, if there isn't an int mode that is bigger, we'd want to round up the partial int bit size.  Something like (2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT should do it.
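
A quick standalone check of that expression, assuming a 64-bit HWI and a
hypothetical port whose widest integer mode is a 24-bit partial int:

#include <cstdio>

int main ()
{
  const int HOST_BITS_PER_WIDE_INT = 64;
  const int MAX_BITSIZE_MODE_ANY_INT = 24;   /* hypothetical */
  int truncated = 2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT;
  int rounded = (2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)
                / HOST_BITS_PER_WIDE_INT;
  printf ("%d %d\n", truncated, rounded);    /* 0 1: 48 bits still need a HWI */
  return 0;
}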

>>> I still would like to have the ability to provide specializations of
>>> wide_int
>>> for "small" sizes, thus ideally wide_int would be a template templated
>>> on the number of HWIs in val.  Interface-wise wide_int<2> should be
>>> identical to double_int, thus we should be able to do
>>> 
>>> typedef wide_int<2> double_int;
>> 
>> If you want to go down this path after the patches get in, go for it.    I
>> see no use at all for this.
>> This was not meant to be a plug in replacement for double int. This goal of
>> this patch is to get the compiler to do the constant math the way that the
>> target does it.   Any such instantiation is by definition placing some
>> predefined limit that some target may not want.
> 
> Well, what I don't really like is that we now have two implementations
> of functions that perform integer math on two-HWI sized integers.  What
> I also don't like too much is that we have two different interfaces to operate
> on them!  Can't you see how I come to not liking this?  Especially the
> latter …

double_int is logically dead.  Refactoring wide-int and double-int is a waste of time, as the time is better spent removing double-int from the compiler.  All the necessary semantics and code of double-int _has_ been refactored into wide-int already.  Changing wide-int in any way to vend anything to double-int is wrong, as once double-int is removed, then all the api changes to make double-int share from wide-int are wasted and must then be removed.  The path forward is the complete removal of double-int; it is wrong, has been wrong and always will be wrong, nothing can change that.
Richard Biener Oct. 25, 2012, 10:42 a.m. UTC | #10
On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump <mikestump@comcast.net> wrote:
> On Oct 24, 2012, at 2:43 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
>> <zadeck@naturalbridge.com> wrote:
>>>
>>> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>>>
>>>> +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
>>>> HOST_BITS_PER_WIDE_INT];
>>>>
>>>> are we sure this rounds properly?  Consider a port with max byte mode
>>>> size 4 on a 64bit host.
>>>
>>> I do not believe that this can happen.   The core compiler includes all
>>> modes up to TI mode, so by default we already go up to 128 bits.
>>
>> And mode bitsizes are always power-of-two?  I suppose so.
>
> Actually, no, they are not.  Partial int modes can have bit sizes that are not power of two, and, if there isn't an int mode that is bigger, we'd want to round up the partial int bit size.  Something like (2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT should do it.
>
>>>> I still would like to have the ability to provide specializations of
>>>> wide_int
>>>> for "small" sizes, thus ideally wide_int would be a template templated
>>>> on the number of HWIs in val.  Interface-wise wide_int<2> should be
>>>> identical to double_int, thus we should be able to do
>>>>
>>>> typedef wide_int<2> double_int;
>>>
>>> If you want to go down this path after the patches get in, go for it.    I
>>> see no use at all for this.
>>> This was not meant to be a plug in replacement for double int. This goal of
>>> this patch is to get the compiler to do the constant math the way that the
>>> target does it.   Any such instantiation is by definition placing some
>>> predefined limit that some target may not want.
>>
>> Well, what I don't really like is that we now have two implementations
>> of functions that perform integer math on two-HWI sized integers.  What
>> I also don't like too much is that we have two different interfaces to operate
>> on them!  Can't you see how I come to not liking this?  Especially the
>> latter …
>
> double_int is logically dead.  Refactoring wide-int and double-int is a waste of time, as the time is better spent removing double-int from the compiler.  All the necessary semantics and code of double-int _has_ been refactored into wide-int already.  Changing wide-int in any way to vend anything to double-int is wrong, as once double-int is removed, then all the api changes to make double-int share from wide-int are wasted and must then be removed.  The path forward is the complete removal of double-int; it is wrong, has been wrong and always will be wrong, nothing can change that.

double_int, compared to wide_int, is fast and lean.  I doubt we will
get rid of it - you
will make compile-time math a _lot_ slower.  Just profile when you for example
change get_inner_reference to use wide_ints.

To be able to remove double_int in favor of wide_int requires _at least_
templating wide_int on 'len' and providing specializations for 1 and 2.

It might be a non-issue for math that operates on trees or RTXen due to
the allocation overhead we pay, but in recent years we transitioned important
paths away from using tree math to using double_ints _for speed reasons_.

Richard.
Kenneth Zadeck Oct. 25, 2012, 10:55 a.m. UTC | #11
On 10/25/2012 06:42 AM, Richard Biener wrote:
> On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump <mikestump@comcast.net> wrote:
>> On Oct 24, 2012, at 2:43 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>> On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
>>> <zadeck@naturalbridge.com> wrote:
>>>> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>>>> +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
>>>>> HOST_BITS_PER_WIDE_INT];
>>>>>
>>>>> are we sure this rounds properly?  Consider a port with max byte mode
>>>>> size 4 on a 64bit host.
>>>> I do not believe that this can happen.   The core compiler includes all
>>>> modes up to TI mode, so by default we already go up to 128 bits.
>>> And mode bitsizes are always power-of-two?  I suppose so.
>> Actually, no, they are not.  Partial int modes can have bit sizes that are not power of two, and, if there isn't an int mode that is bigger, we'd want to round up the partial int bit size.  Something like (2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT should do it.
>>
>>>>> I still would like to have the ability to provide specializations of
>>>>> wide_int
>>>>> for "small" sizes, thus ideally wide_int would be a template templated
>>>>> on the number of HWIs in val.  Interface-wise wide_int<2> should be
>>>>> identical to double_int, thus we should be able to do
>>>>>
>>>>> typedef wide_int<2> double_int;
>>>> If you want to go down this path after the patches get in, go for it.    I
>>>> see no use at all for this.
>>>> This was not meant to be a plug in replacement for double int. This goal of
>>>> this patch is to get the compiler to do the constant math the way that the
>>>> target does it.   Any such instantiation is by definition placing some
>>>> predefined limit that some target may not want.
>>> Well, what I don't really like is that we now have two implementations
>>> of functions that perform integer math on two-HWI sized integers.  What
>>> I also don't like too much is that we have two different interfaces to operate
>>> on them!  Can't you see how I come to not liking this?  Especially the
>>> latter …
>> double_int is logically dead.  Refactoring wide-int and double-int is a waste of time, as the time is better spent removing double-int from the compiler.  All the necessary semantics and code of double-int _has_ been refactored into wide-int already.  Changing wide-int in any way to vend anything to double-int is wrong, as once double-int is removed, then all the api changes to make double-int share from wide-int are wasted and must then be removed.  The path forward is the complete removal of double-int; it is wrong, has been wrong and always will be wrong, nothing can change that.
> double_int, compared to wide_int, is fast and lean.  I doubt we will
> get rid of it - you
> will make compile-time math a _lot_ slower.  Just profile when you for example
> change get_inner_reference to use wide_ints.
>
> To be able to remove double_int in favor of wide_int requires _at least_
> templating wide_int on 'len' and providing specializations for 1 and 2.
>
> It might be a non-issue for math that operates on trees or RTXen due to
> the allocation overhead we pay, but in recent years we transitioned important
> paths away from using tree math to using double_ints _for speed reasons_.
>
> Richard.
i do not know why you believe this about the speed.     double int 
always does synthetic math since you do everything at 128 bit precision.

the thing about wide int, is that since it does math to the precision's 
size, it almost never uses synthetic operations since the sizes for
almost every instance can be done using the native math on the 
machine.   almost every call has a check to see if the operation can be 
done natively.    I seriously doubt that you are going to do TI mode 
math much faster than i do it and if you do who cares.

the number of calls does not affect the performance in any negative way
and in fact is more efficient since common things that require more than
one operation in double-int are typically done in a single operation.
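
A hedged sketch of that fast path: one native operation when both
operands fit in a single HWI, block-wise ("synthetic") math only for the
rare wider cases.  mini_wide and add are stand-ins, and operands are
assumed already extended to a common length:

#include <cstdint>

typedef int64_t HWI;

struct mini_wide { HWI val[4]; int len; unsigned int precision; };

static mini_wide
add (const mini_wide &a, const mini_wide &b)
{
  mini_wide r;
  r.precision = a.precision;

  if (a.len == 1 && b.len == 1)
    {
      r.len = 1;
      r.val[0] = (HWI) ((uint64_t) a.val[0] + (uint64_t) b.val[0]);
      return r;                   /* the common, native case */
    }

  r.len = a.len;                  /* the synthetic multi-block case */
  uint64_t carry = 0;
  for (int i = 0; i < r.len; i++)
    {
      uint64_t ua = (uint64_t) a.val[i], ub = (uint64_t) b.val[i];
      uint64_t s = ua + ub + carry;
      carry = (s < ua) || (carry && s == ua);
      r.val[i] = (HWI) s;
    }
  return r;
}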
Richard Biener Oct. 25, 2012, 11:58 a.m. UTC | #12
On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
<zadeck@naturalbridge.com> wrote:
>
> On 10/25/2012 06:42 AM, Richard Biener wrote:
>>
>> On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump <mikestump@comcast.net> wrote:
>>>
>>> On Oct 24, 2012, at 2:43 AM, Richard Biener <richard.guenther@gmail.com>
>>> wrote:
>>>>
>>>> On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
>>>> <zadeck@naturalbridge.com> wrote:
>>>>>
>>>>> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>>>>>
>>>>>> +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
>>>>>> HOST_BITS_PER_WIDE_INT];
>>>>>>
>>>>>> are we sure this rounds properly?  Consider a port with max byte mode
>>>>>> size 4 on a 64bit host.
>>>>>
>>>>> I do not believe that this can happen.   The core compiler includes all
>>>>> modes up to TI mode, so by default we already go up to 128 bits.
>>>>
>>>> And mode bitsizes are always power-of-two?  I suppose so.
>>>
>>> Actually, no, they are not.  Partial int modes can have bit sizes that
>>> are not power of two, and, if there isn't an int mode that is bigger, we'd
>>> want to round up the partial int bit size.  Something like (2 *
>>> MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
>>> HOST_BITS_PER_WIDE_INT should do it.
>>>
>>>>>> I still would like to have the ability to provide specializations of
>>>>>> wide_int
>>>>>> for "small" sizes, thus ideally wide_int would be a template templated
>>>>>> on the number of HWIs in val.  Interface-wise wide_int<2> should be
>>>>>> identical to double_int, thus we should be able to do
>>>>>>
>>>>>> typedef wide_int<2> double_int;
>>>>>
>>>>> If you want to go down this path after the patches get in, go for it.
>>>>> I
>>>>> see no use at all for this.
>>>>> This was not meant to be a plug in replacement for double int. This
>>>>> goal of
>>>>> this patch is to get the compiler to do the constant math the way that
>>>>> the
>>>>> target does it.   Any such instantiation is by definition placing some
>>>>> predefined limit that some target may not want.
>>>>
>>>> Well, what I don't really like is that we now have two implementations
>>>> of functions that perform integer math on two-HWI sized integers.  What
>>>> I also don't like too much is that we have two different interfaces to
>>>> operate
>>>> on them!  Can't you see how I come to not liking this?  Especially the
>>>> latter …
>>>
>>> double_int is logically dead.  Refactoring wide-int and double-int is a
>>> waste of time, as the time is better spent removing double-int from the
>>> compiler.  All the necessary semantics and code of double-int _has_ been
>>> refactored into wide-int already.  Changing wide-int in any way to vend
>>> anything to double-int is wrong, as once double-int is removed, then all the
>>> api changes to make double-int share from wide-int are wasted and must then
>>> be removed.  The path forward is the complete removal of double-int; it is
>>> wrong, has been wrong and always will be wrong, nothing can change that.
>>
>> double_int, compared to wide_int, is fast and lean.  I doubt we will
>> get rid of it - you
>> will make compile-time math a _lot_ slower.  Just profile when you for
>> example
>> change get_inner_reference to use wide_ints.
>>
>> To be able to remove double_int in favor of wide_int requires _at least_
>> templating wide_int on 'len' and providing specializations for 1 and 2.
>>
>> It might be a non-issue for math that operates on trees or RTXen due to
>> the allocation overhead we pay, but in recent years we transitioned
>> important
>> paths away from using tree math to using double_ints _for speed reasons_.
>>
>> Richard.
>
> i do not know why you believe this about the speed.     double int always
> does synthetic math since you do everything at 128 bit precision.
>
> the thing about wide int, is that since it does math to the precision's
> size, it almost never uses synthetic operations since the sizes for
> almost every instance can be done using the native math on the machine.
> almost every call has a check to see if the operation can be done natively.
> I seriously doubt that you are going to do TI mode math much faster than i
> do it and if you do who cares.
>
> the number of calls does not affect the performance in any negative way and
> in fact is more efficient since common things that require more than one
> operation in double-int are typically done in a single operation.

Simple double-int operations like

inline double_int
double_int::and_not (double_int b) const
{
  double_int result;
  result.low = low & ~b.low;
  result.high = high & ~b.high;
  return result;
}

are always going to be faster than conditionally executing only one operation
(but inside an offline function).

Richard.
Richard Sandiford Oct. 31, 2012, 10:43 a.m. UTC | #13
Richard Biener <richard.guenther@gmail.com> writes:
> On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
> <zadeck@naturalbridge.com> wrote:
>>
>> On 10/25/2012 06:42 AM, Richard Biener wrote:
>>>
>>> On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump <mikestump@comcast.net> wrote:
>>>>
>>>> On Oct 24, 2012, at 2:43 AM, Richard Biener <richard.guenther@gmail.com>
>>>> wrote:
>>>>>
>>>>> On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
>>>>> <zadeck@naturalbridge.com> wrote:
>>>>>>
>>>>>> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>>>>>>
>>>>>>> +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
>>>>>>> HOST_BITS_PER_WIDE_INT];
>>>>>>>
>>>>>>> are we sure this rounds properly?  Consider a port with max byte mode
>>>>>>> size 4 on a 64bit host.
>>>>>>
>>>>>> I do not believe that this can happen.   The core compiler includes all
>>>>>> modes up to TI mode, so by default we already go up to 128 bits.
>>>>>
>>>>> And mode bitsizes are always power-of-two?  I suppose so.
>>>>
>>>> Actually, no, they are not.  Partial int modes can have bit sizes that
>>>> are not power of two, and, if there isn't an int mode that is bigger, we'd
>>>> want to round up the partial int bit size.  Something like (2 *
>>>> MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
>>>> HOST_BITS_PER_WIDE_INT should do it.
>>>>
>>>>>>> I still would like to have the ability to provide specializations of
>>>>>>> wide_int
>>>>>>> for "small" sizes, thus ideally wide_int would be a template templated
>>>>>>> on the number of HWIs in val.  Interface-wise wide_int<2> should be
>>>>>>> identical to double_int, thus we should be able to do
>>>>>>>
>>>>>>> typedef wide_int<2> double_int;
>>>>>>
>>>>>> If you want to go down this path after the patches get in, go for it.
>>>>>> I
>>>>>> see no use at all for this.
>>>>>> This was not meant to be a plug in replacement for double int. This
>>>>>> goal of
>>>>>> this patch is to get the compiler to do the constant math the way that
>>>>>> the
>>>>>> target does it.   Any such instantiation is by definition placing some
>>>>>> predefined limit that some target may not want.
>>>>>
>>>>> Well, what I don't really like is that we now have two implementations
>>>>> of functions that perform integer math on two-HWI sized integers.  What
>>>>> I also don't like too much is that we have two different interfaces to
>>>>> operate
>>>>> on them!  Can't you see how I come to not liking this?  Especially the
>>>>> latter …
>>>>
>>>> double_int is logically dead.  Refactoring wide-int and double-int is a
>>>> waste of time, as the time is better spent removing double-int from the
>>>> compiler.  All the necessary semantics and code of double-int _has_ been
>>>> refactored into wide-int already.  Changing wide-int in any way to vend
>>>> anything to double-int is wrong, as once double-int is removed, then all the
>>>> api changes to make double-int share from wide-int are wasted and must then
>>>> be removed.  The path forward is the complete removal of double-int; it is
>>>> wrong, has been wrong and always will be wrong, nothing can change that.
>>>
>>> double_int, compared to wide_int, is fast and lean.  I doubt we will
>>> get rid of it - you
>>> will make compile-time math a _lot_ slower.  Just profile when you for
>>> example
>>> change get_inner_reference to use wide_ints.
>>>
>>> To be able to remove double_int in favor of wide_int requires _at least_
>>> templating wide_int on 'len' and providing specializations for 1 and 2.
>>>
>>> It might be a non-issue for math that operates on trees or RTXen due to
>>> the allocation overhead we pay, but in recent years we transitioned
>>> important
>>> paths away from using tree math to using double_ints _for speed reasons_.
>>>
>>> Richard.
>>
>> i do not know why you believe this about the speed.     double int always
>> does synthetic math since you do everything at 128 bit precision.
>>
>> the thing about wide int, is that since it does math to the precision's
>> size, it almost never uses synthetic operations since the sizes for
>> almost every instance can be done using the native math on the machine.
>> almost every call has a check to see if the operation can be done natively.
>> I seriously doubt that you are going to do TI mode math much faster than i
>> do it and if you do who cares.
>>
>> the number of calls does not affect the performance in any negative way and
>> in fact is more efficient since common things that require more than one
>> operation in double-int are typically done in a single operation.
>
> Simple double-int operations like
>
> inline double_int
> double_int::and_not (double_int b) const
> {
>   double_int result;
>   result.low = low & ~b.low;
>   result.high = high & ~b.high;
>   return result;
> }
>
> are always going to be faster than conditionally executing only one operation
> (but inside an offline function).

OK, this is really in reply to the 4.8 thing, but it felt more
appropriate here.

It's interesting that you gave this example, since before you were
complaining about too many fused ops.  Clearly this one could be
removed in favour of separate and() and not() operations, but why
not provide a fused one if there are clients who'll make use of it?
I think Kenny's API is just taking that to its logical conclusion.
There doesn't seem to be anything sacrosanct about the current choice
of what's fused and what isn't.

The speed problem we had using trees for internal arithmetic isn't
IMO a good argument for keeping double_int in preference to wide_int.
Allocating and constructing tree objects to hold temporary values,
storing an integer representation in it, then calling tree arithmetic
routines that pull out the integer representation again and create a
tree to hold the result, is going to be much slower than using either
double_int or wide_int.  I'd be very surprised if we notice any
measurable difference between double_int and wide_int here.

I still see no reason to keep double_int around.  The width of a host
wide integer really shouldn't have any significance.

Your main complaint seems to be that the wide_int API is different
from the double_int one, but we can't literally use the same API, since
double_int has an implicit precision and bitsize, and wide_int doesn't.
Having a precision that is separate from the underlying representation
is IMO the most important feature of wide_int, so:

   typedef wide_int<2> double_int;

is never going to be a drop-in, API-compatible replacement for double_int.

FWIW, I like the idea about having a class that wraps up the mode,
tree, and (precision, bitsize) choice.

Richard
Richard Biener Oct. 31, 2012, 11:53 a.m. UTC | #14
On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
>> <zadeck@naturalbridge.com> wrote:
>>>
>>> On 10/25/2012 06:42 AM, Richard Biener wrote:
>>>>
>>>> On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump <mikestump@comcast.net> wrote:
>>>>>
>>>>> On Oct 24, 2012, at 2:43 AM, Richard Biener <richard.guenther@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
>>>>>> <zadeck@naturalbridge.com> wrote:
>>>>>>>
>>>>>>> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>>>>>>>
>>>>>>>> +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
>>>>>>>> HOST_BITS_PER_WIDE_INT];
>>>>>>>>
>>>>>>>> are we sure this rounds properly?  Consider a port with max byte mode
>>>>>>>> size 4 on a 64bit host.
>>>>>>>
>>>>>>> I do not believe that this can happen.   The core compiler includes all
>>>>>>> modes up to TI mode, so by default we already go up to 128 bits.
>>>>>>
>>>>>> And mode bitsizes are always power-of-two?  I suppose so.
>>>>>
>>>>> Actually, no, they are not.  Partial int modes can have bit sizes that
>>>>> are not power of two, and, if there isn't an int mode that is bigger, we'd
>>>>> want to round up the partial int bit size.  Something like (2 *
>>>>> MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
>>>>> HOST_BITS_PER_WIDE_INT should do it.
>>>>>
>>>>>>>> I still would like to have the ability to provide specializations of
>>>>>>>> wide_int
>>>>>>>> for "small" sizes, thus ideally wide_int would be a template templated
>>>>>>>> on the number of HWIs in val.  Interface-wise wide_int<2> should be
>>>>>>>> identical to double_int, thus we should be able to do
>>>>>>>>
>>>>>>>> typedef wide_int<2> double_int;
>>>>>>>
>>>>>>> If you want to go down this path after the patches get in, go for it.
>>>>>>> I
>>>>>>> see no use at all for this.
>>>>>>> This was not meant to be a plug in replacement for double int. This
>>>>>>> goal of
>>>>>>> this patch is to get the compiler to do the constant math the way that
>>>>>>> the
>>>>>>> target does it.   Any such instantiation is by definition placing some
>>>>>>> predefined limit that some target may not want.
>>>>>>
>>>>>> Well, what I don't really like is that we now have two implementations
>>>>>> of functions that perform integer math on two-HWI sized integers.  What
>>>>>> I also don't like too much is that we have two different interfaces to
>>>>>> operate
>>>>>> on them!  Can't you see how I come to not liking this?  Especially the
>>>>>> latter …
>>>>>
>>>>> double_int is logically dead.  Refactoring wide-int and double-int is a
>>>>> waste of time, as the time is better spent removing double-int from the
>>>>> compiler.  All the necessary semantics and code of double-int _has_ been
>>>>> refactored into wide-int already.  Changing wide-int in any way to vend
>>>>> anything to double-int is wrong, as once double-int is removed, then all the
>>>>> api changes to make double-int share from wide-int are wasted and must then
>>>>> be removed.  The path forward is the complete removal of double-int; it is
>>>>> wrong, has been wrong and always will be wrong, nothing can change that.
>>>>
>>>> double_int, compared to wide_int, is fast and lean.  I doubt we will
>>>> get rid of it - you
>>>> will make compile-time math a _lot_ slower.  Just profile when you for
>>>> example
>>>> change get_inner_reference to use wide_ints.
>>>>
>>>> To be able to remove double_int in favor of wide_int requires _at least_
>>>> templating wide_int on 'len' and providing specializations for 1 and 2.
>>>>
>>>> It might be a non-issue for math that operates on trees or RTXen due to
>>>> the allocation overhead we pay, but in recent years we transitioned
>>>> important
>>>> paths away from using tree math to using double_ints _for speed reasons_.
>>>>
>>>> Richard.
>>>
>>> i do not know why you believe this about the speed.     double int always
>>> does synthetic math since you do everything at 128 bit precision.
>>>
>>> the thing about wide int, is that since it does math to the precision's
>>> size, it almost never uses synthetic operations since the sizes for
>>> almost every instance can be done using the native math on the machine.
>>> almost every call has a check to see if the operation can be done natively.
>>> I seriously doubt that you are going to do TI mode math much faster than i
>>> do it and if you do who cares.
>>>
>>> the number of calls does not affect the performance in any negative way and
>>> in fact is more efficient since common things that require more than one
>>> operation in double-int are typically done in a single operation.
>>
>> Simple double-int operations like
>>
>> inline double_int
>> double_int::and_not (double_int b) const
>> {
>>   double_int result;
>>   result.low = low & ~b.low;
>>   result.high = high & ~b.high;
>>   return result;
>> }
>>
>> are always going to be faster than conditionally executing only one operation
>> (but inside an offline function).
>
> OK, this is really in reply to the 4.8 thing, but it felt more
> appropriate here.
>
> It's interesting that you gave this example, since before you were
> complaining about too many fused ops.  Clearly this one could be
> removed in favour of separate and() and not() operations, but why
> not provide a fused one if there are clients who'll make use of it?

I was more concerned about fused operations that use precision
or bitsize as input.  That is for example

>> +  bool only_sign_bit_p (unsigned int prec) const;
>> +  bool only_sign_bit_p () const;

The first one constructs a wide-int with precision prec (sign- or
zero-extending it) and then calls only_sign_bit_p on it.  Such a function
should not be necessary and existing callers should be questioned
instead of introducing it.
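
A single-HWI illustration of the fusion being objected to; both helpers
are hypothetical:

#include <cstdint>

typedef uint64_t UHWI;

/* The plain predicate: is V exactly the sign bit at PREC?  */
static bool
only_sign_bit_p (UHWI v, unsigned int prec)
{
  return v == ((UHWI) 1 << (prec - 1));
}

/* The fused variant: silently re-truncate to PREC, then test --
   two operations hiding behind one name.  */
static bool
trunc_then_only_sign_bit_p (UHWI v, unsigned int prec)
{
  UHWI mask = prec == 64 ? ~(UHWI) 0 : ((UHWI) 1 << prec) - 1;
  return only_sign_bit_p (v & mask, prec);
}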

In fact wide-int seems to have so many "fused" operations that
we run out of sensible recognizable names for them.  Which results
in a lot of confusion on what the functions actually do (at least for me).

> I think Kenny's API is just taking that to its logical conclusion.
> There doesn't seem to be anything sacrosanct about the current choice
> of what's fused and what isn't.

Maybe.  I'd rather have seen an initial small wide-int API and fused
operations introduced separately together with the places they are
used.  In the current way it's way too tedious to go over all of them
and match them with callers, look up enough context and then
make up my mind on whether the caller should do sth different or not.

Thus, consider the big initial API a reason that all this review takes
so long ...

> The speed problem we had using trees for internal arithmetic isn't
> IMO a good argument for keeping double_int in preference to wide_int.
> Allocating and constructing tree objects to hold temporary values,
> storing an integer representation in it, then calling tree arithmetic
> routines that pull out the integer representation again and create a
> tree to hold the result, is going to be much slower than using either
> double_int or wide_int.  I'd be very surprised if we notice any
> measurable difference between double_int and wide_int here.
>
> I still see no reason to keep double_int around.  The width of a host
> wide integer really shouldn't have any significance.
>
> Your main complaint seems to be that the wide_int API is different
> from the double_int one, but we can't literally use the same API, since
> double_int has an implicit precision and bitsize, and wide_int doesn't.
> Having a precision that is separate from the underlying representation
> is IMO the most important feature of wide_int, so:
>
>    typedef wide_int<2> double_int;
>
> is never going to be a drop-in, API-compatible replacement for double_int.

My reasoning was that if you strip wide-int of precision and bitsize
you have a double_int<N> class.  Thus wide-int should have a base
of that kind and just add precision / bitsize on top of that.  It wouldn't
be a step forward if we end up replacing double_int uses with
wide_int uses with precision of 2 * HOST_BITS_PER_WIDE_INT,
would it?
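
A sketch of that layering, with hypothetical names: a raw fixed-size
template carries the blocks and the precision-free math, and the
precision / bitsize notion sits on top:

#include <cstdint>

typedef int64_t HWI;

template <int N>
struct wide_int_raw
{
  HWI val[N];
  unsigned short len;
  /* precision-free block math would live here */
};

typedef wide_int_raw<2> double_int_like;   /* the double_int-sized case */

struct wide_int_full : wide_int_raw<4>     /* 4 blocks: 2 * TImode on a
                                              64-bit host, per the patch */
{
  unsigned int bitsize;
  unsigned int precision;
  /* precision-aware operations layered on the raw base */
};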

> FWIW, I like the idea about having a class that wraps up the mode,
> tree, and (precision, bitsize) choice.

Yeah, if only to make the API look leaner.

Richard.

> Richard
Richard Sandiford Oct. 31, 2012, 12:05 p.m. UTC | #15
Richard Biener <richard.guenther@gmail.com> writes:
> On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford
> <rdsandiford@googlemail.com> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
>>> <zadeck@naturalbridge.com> wrote:
>>>>
>>>> On 10/25/2012 06:42 AM, Richard Biener wrote:
>>>>>
>>>>> On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump <mikestump@comcast.net> wrote:
>>>>>>
>>>>>> On Oct 24, 2012, at 2:43 AM, Richard Biener <richard.guenther@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
>>>>>>> <zadeck@naturalbridge.com> wrote:
>>>>>>>>
>>>>>>>> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>>>>>>>>
>>>>>>>>> +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
>>>>>>>>> HOST_BITS_PER_WIDE_INT];
>>>>>>>>>
>>>>>>>>> are we sure this rounds properly?  Consider a port with max byte mode
>>>>>>>>> size 4 on a 64bit host.
>>>>>>>>
>>>>>>>> I do not believe that this can happen.   The core compiler includes all
>>>>>>>> modes up to TI mode, so by default we already go up to 128 bits.
>>>>>>>
>>>>>>> And mode bitsizes are always power-of-two?  I suppose so.
>>>>>>
>>>>>> Actually, no, they are not.  Partial int modes can have bit sizes that
>>>>>> are not power of two, and, if there isn't an int mode that is bigger, we'd
>>>>>> want to round up the partial int bit size.  Something like ((2 *
>>>>>> MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
>>>>>> HOST_BITS_PER_WIDE_INT) should do it.
>>>>>>
>>>>>>>>> I still would like to have the ability to provide specializations of
>>>>>>>>> wide_int
>>>>>>>>> for "small" sizes, thus ideally wide_int would be a template templated
>>>>>>>>> on the number of HWIs in val.  Interface-wise wide_int<2> should be
>>>>>>>>> identical to double_int, thus we should be able to do
>>>>>>>>>
>>>>>>>>> typedef wide_int<2> double_int;
>>>>>>>>
>>>>>>>> If you want to go down this path after the patches get in, go for it.
>>>>>>>> I
>>>>>>>> see no use at all for this.
>>>>>>>> This was not meant to be a plug-in replacement for double_int.  The
>>>>>>>> goal of
>>>>>>>> this patch is to get the compiler to do the constant math the way that
>>>>>>>> the
>>>>>>>> target does it.   Any such instantiation is by definition placing some
>>>>>>>> predefined limit that some target may not want.
>>>>>>>
>>>>>>> Well, what I don't really like is that we now have two implementations
>>>>>>> of functions that perform integer math on two-HWI sized integers.  What
>>>>>>> I also don't like too much is that we have two different interfaces to
>>>>>>> operate
>>>>>>> on them!  Can't you see how I come to not liking this?  Especially the
>>>>>>> latter …
>>>>>>
>>>>>> double_int is logically dead.  Refactoring wide-int and double-int is a
>>>>>> waste of time, as the time is better spent removing double-int from the
>>>>>> compiler.  All the necessary semantics and code of double-int _has_ been
>>>>>> refactored into wide-int already.  Changing wide-int in any way to vend
>>>>>> anything to double-int is wrong, as once double-int is removed,
>>>>>> then all the
>>>>>> API changes to make double-int share from wide-int are wasted and must then
>>>>>> be removed.  The path forward is the complete removal of double-int; it is
>>>>>> wrong, has been wrong and always will be wrong, nothing can change that.
>>>>>
>>>>> double_int, compared to wide_int, is fast and lean.  I doubt we will
>>>>> get rid of it - you
>>>>> will make compile-time math a _lot_ slower.  Just profile when you for
>>>>> example
>>>>> change get_inner_reference to use wide_ints.
>>>>>
>>>>> To be able to remove double_int in favor of wide_int requires _at least_
>>>>> templating wide_int on 'len' and providing specializations for 1 and 2.
>>>>>
>>>>> It might be a non-issue for math that operates on trees or RTXen due to
>>>>> the allocation overhead we pay, but in recent years we transitioned
>>>>> important
>>>>> paths away from using tree math to using double_ints _for speed reasons_.
>>>>>
>>>>> Richard.
>>>>
>>>> I do not know why you believe this about the speed.  double_int always
>>>> does synthetic math, since you do everything at 128-bit precision.
>>>>
>>>> The thing about wide-int is that, since it does math to the precision's
>>>> size, it almost never uses synthetic operations: the sizes for
>>>> almost every instance can be done using the native math on the machine.
>>>> Almost every call has a check to see if the operation can be done natively.
>>>> I seriously doubt that you are going to do TImode math much faster than I
>>>> do it, and if you do, who cares.
>>>>
>>>> The number of calls does not affect the performance in any negative way;
>>>> in fact it is more efficient, since common things that require more than one
>>>> operation in double-int are typically done in a single operation.
>>>
>>> Simple double-int operations like
>>>
>>> inline double_int
>>> double_int::and_not (double_int b) const
>>> {
>>>   double_int result;
>>>   result.low = low & ~b.low;
>>>   result.high = high & ~b.high;
>>>   return result;
>>> }
>>>
>>> are always going to be faster than conditionally executing only one operation
>>> (but inside an offline function).
>>
>> OK, this is really in reply to the 4.8 thing, but it felt more
>> appropriate here.
>>
>> It's interesting that you gave this example, since before you were
>> complaining about too many fused ops.  Clearly this one could be
>> removed in favour of separate and() and not() operations, but why
>> not provide a fused one if there are clients who'll make use of it?
>
> I was more concerned about fused operations that use precision
> or bitsize as input.  Take, for example:
>
>>> +  bool only_sign_bit_p (unsigned int prec) const;
>>> +  bool only_sign_bit_p () const;
>
> The first constructs a wide-int with precision prec (sign- or
> zero-extending it) and then calls only_sign_bit_p on it.  Such a function
> should not be necessary, and existing callers should be questioned
> instead of introducing it.
>
> In fact wide-int seems to have so many "fused" operations that
> we run out of sensible, recognizable names for them, which results
> in a lot of confusion about what the functions actually do (at least for me).

Well, I suppose I can't really say anything useful either way on
that one, since I'm not writing the patch and I'm not reviewing it :-)

>> I think Kenny's API is just taking that to its logical conclusion.
>> There doesn't seem to be anything sacrosanct about the current choice
>> of what's fused and what isn't.
>
> Maybe.  I'd rather have seen an initial small wide-int API and fused
> operations introduced separately together with the places they are
> used.  In the current way it's way too tedious to go over all of them
> and match them with callers, look up enough context and then
> make up my mind on whether the caller should do something different or not.
>
> Thus, consider the big initial API a reason that all this review takes
> so long ...
>
>> The speed problem we had using trees for internal arithmetic isn't
>> IMO a good argument for keeping double_int in preference to wide_int.
>> Allocating and constructing tree objects to hold temporary values,
>> storing an integer representation in it, then calling tree arithmetic
>> routines that pull out the integer representation again and create a
>> tree to hold the result, is going to be much slower than using either
>> double_int or wide_int.  I'd be very surprised if we notice any
>> measurable difference between double_int and wide_int here.
>>
>> I still see no reason to keep double_int around.  The width of a host
>> wide integer really shouldn't have any significance.
>>
>> Your main complaint seems to be that the wide_int API is different
>> from the double_int one, but we can't literally use the same API, since
>> double_int has an implicit precision and bitsize, and wide_int doesn't.
>> Having a precision that is separate from the underlying representation
>> is IMO the most important feature of wide_int, so:
>>
>>    template wide_int<2> double_int;
>>
>> is never going to be a drop-in, API-compatible replacement for double_int.
>
> My reasoning was that if you strip wide-int of precision and bitsize
> you have a double_int<N> class.

But you don't!  Because...

> Thus wide-int should have a base
> of that kind and just add precision / bitsize on top of that.  It wouldn't
> be a step forward if we end up replacing double_int uses with
> wide_int uses with precision of 2 * HOST_BITS_PER_WIDE_INT,
> would it?

...the precision and bitsize isn't an optional extra, either conceptually
or in implementation.  wide_int happens to use N HOST_WIDE_INTS under
the hood, but the value of N is an internal implementation detail.
No operations are done to N HWIs, they're done to the number of bits
in the operands.  Whereas a double_int<N> class does everything to N HWIs.

Richard
Richard Biener Oct. 31, 2012, 12:11 p.m. UTC | #16
On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> My reasoning was that if you strip wide-int of precision and bitsize
>> you have a double_int<N> class.
>
> But you don't!  Because...
>
>> Thus wide-int should have a base
>> of that kind and just add precision / bitsize on top of that.  It wouldn't
>> be a step forward if we end up replacing double_int uses with
>> wide_int uses with precision of 2 * HOST_BITS_PER_WIDE_INT,
>> would it?
>
> ...the precision and bitsize isn't an optional extra, either conceptually
> or in implementation.  wide_int happens to use N HOST_WIDE_INTS under
> the hood, but the value of N is an internal implementation detail.
> No operations are done to N HWIs, they're done to the number of bits
> in the operands.  Whereas a double_int<N> class does everything to N HWIs.

If that's the only effect then either bitsize or precision is redundant (and
we also have len ...).  Note I did not mention len above, thus the base
class would retain 'len' and double-int would simply use 2 for it
(if you don't template it but make it variable).

Richard.


> Richard
Richard Sandiford Oct. 31, 2012, 12:22 p.m. UTC | #17
Richard Biener <richard.guenther@gmail.com> writes:
> On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford
> <rdsandiford@googlemail.com> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> My reasoning was that if you strip wide-int of precision and bitsize
>>> you have a double_int<N> class.
>>
>> But you don't!  Because...
>>
>>> Thus wide-int should have a base
>>> of that kind and just add precision / bitsize on top of that.  It wouldn't
>>> be a step forward if we end up replacing double_int uses with
>>> wide_int uses with precision of 2 * HOST_BITS_PER_WIDE_INT,
>>> would it?
>>
>> ...the precision and bitsize isn't an optional extra, either conceptually
>> or in implementation.  wide_int happens to use N HOST_WIDE_INTS under
>> the hood, but the value of N is an internal implementation detail.
>> No operations are done to N HWIs, they're done to the number of bits
>> in the operands.  Whereas a double_int<N> class does everything to N HWIs.
>
> If that's the only effect then either bitsize or precision is redundant (and
> we also have len ...).  Note I did not mention len above, thus the base
> class would retain 'len' and double-int would simply use 2 for it
> (if you don't template it but make it variable).

But that means that wide_int has to model a P-bit operation as a
"normal" len*HOST_WIDE_INT operation and then fix up the result
after the fact, which seems unnecessarily convoluted.  I still don't
see why a full-precision 2*HOST_WIDE_INT operation (or a full-precision
X*HOST_WIDE_INT operation for any X) has any special meaning.

Richard
Richard Biener Oct. 31, 2012, 12:44 p.m. UTC | #18
On Wed, Oct 31, 2012 at 1:22 PM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford
>> <rdsandiford@googlemail.com> wrote:
>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>> My reasoning was that if you strip wide-int of precision and bitsize
>>>> you have a double_int<N> class.
>>>
>>> But you don't!  Because...
>>>
>>>> Thus wide-int should have a base
>>>> of that kind and just add precision / bitsize on top of that.  It wouldn't
>>>> be a step forward if we end up replacing double_int uses with
>>>> wide_int uses with precision of 2 * HOST_BITS_PER_WIDE_INT,
>>>> would it?
>>>
>>> ...the precision and bitsize isn't an optional extra, either conceptually
>>> or in implementation.  wide_int happens to use N HOST_WIDE_INTS under
>>> the hood, but the value of N is an internal implementation detail.
>>> No operations are done to N HWIs, they're done to the number of bits
>>> in the operands.  Whereas a double_int<N> class does everything to N HWIs.
>>
>> If that's the only effect then either bitsize or precision is redundant (and
>> we also have len ...).  Note I did not mention len above, thus the base
>> class would retain 'len' and double-int would simply use 2 for it
>> (if you don't template it but make it variable).
>
> But that means that wide_int has to model a P-bit operation as a
> "normal" len*HOST_WIDE_INT operation and then fix up the result
> after the fact, which seems unnecessarily convoluted.

It does that right now.  The operations are carried out in a loop
over len HOST_WIDE_INT parts; the last HWI is then special-treated
to account for precision/size.  (Yes, 'len' is also used as an optimization -
the fact that len ends up being mutable is another thing I dislike about
wide-int.  If wide-ints are cheap then all ops should be non-mutating,
at least with respect to 'len'.)
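
In pseudo-code, the two styles differ in the loop bound and in the
final fixup (a sketch assuming equal-length operands; "fixed_int" and
"canonize" are made-up names for the fixed-size class and for the fixup
of the last HWI and of len):

  /* double_int<N> style: unconditionally process all N parts.  */
  template <int N>
  fixed_int<N>
  fixed_int<N>::and_not (const fixed_int<N> &b) const
  {
    fixed_int<N> r;
    for (unsigned int i = 0; i < N; i++)
      r.val[i] = val[i] & ~b.val[i];
    return r;
  }

  /* wide-int style: process only the LEN significant parts, then
     special-treat the top HWI to account for the precision.  */
  wide_int
  wide_int::and_not (const wide_int &b) const
  {
    wide_int r = *this;
    for (unsigned int i = 0; i < len; i++)
      r.val[i] = val[i] & ~b.val[i];
    r.canonize ();    /* assumed fixup against precision/len */
    return r;
  }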

>  I still don't
> see why a full-precision 2*HOST_WIDE_INT operation (or a full-precision
> X*HOST_WIDE_INT operation for any X) has any special meaning.

Well, for the same reason that a HOST_WIDE_INT variable has a meaning.
We use it to constrain what we (efficiently) want to work on.  For example
CCP might iterate up to 2 * HOST_BITS_PER_WIDE_INT times when
doing bit-constant-propagation in loops (for TImode integers on an x86_64 host).

Oh, and I don't necessarily see a use for double_int in its current form
except as an integer representation on the host that is efficient for
manipulating integer constants of a target-dependent size.  For example,
the target detail that we have partial integer modes with bitsize > precision,
and that the bits > precision apparently have a meaning when looking at the
bit-representation of a constant, should not be part of the base class
of wide-int (I doubt it belongs in wide-int at all, but I guess you know more
about the reason we track bitsize in addition to precision - I think it's
abstraction at the wrong level; the tree level does fine without knowing
about bitsize).

Richard.

> Richard
Richard Sandiford Oct. 31, 2012, 1:30 p.m. UTC | #19
Richard Biener <richard.guenther@gmail.com> writes:
>> But that means that wide_int has to model a P-bit operation as a
>> "normal" len*HOST_WIDE_INT operation and then fix up the result
>> after the fact, which seems unnecessarily convoluted.
>
> It does that right now.  The operations are carried out in a loop
> over len HOST_WIDE_INT parts; the last HWI is then special-treated
> to account for precision/size.  (Yes, 'len' is also used as an optimization -
> the fact that len ends up being mutable is another thing I dislike about
> wide-int.  If wide-ints are cheap then all ops should be non-mutating,
> at least with respect to 'len'.)

But the point of having a mutating len is that things like zero and -1
are common even for OImode values.  So if you're doing something potentially
expensive like OImode multiplication, why do it to the number of
HOST_WIDE_INTs needed for an OImode value when the value we're
processing has only one significant HOST_WIDE_INT?
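
For example (a sketch; "minus_one" is an assumed constructor name):

  /* -1 at OImode precision: 256 bits, but one significant HWI.  */
  wide_int m1 = wide_int::minus_one (256);    /* assumed constructor */
  /* m1.len == 1, so a multiply loops over a single HWI instead of
     the 256 / HOST_BITS_PER_WIDE_INT == 4 HWIs that a fixed-size
     representation would always process.  */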

>>  I still don't
>> see why a full-precision 2*HOST_WIDE_INT operation (or a full-precision
>> X*HOST_WIDE_INT operation for any X) has any special meaning.
>
> Well, for the same reason that a HOST_WIDE_INT variable has a meaning.
> We use it to constrain what we (efficiently) want to work on.  For example
> CCP might iterate up to 2 * HOST_BITS_PER_WIDE_INT times when
> doing bit-constant-propagation in loops (for TImode integers on an x86_64 host).

But what about targets with modes wider than TImode?  Would double_int
still be appropriate then?  If not, why does CCP have to use a templated
type with a fixed number of HWIs (and all arithmetic done to a fixed
number of HWIs) rather than one that can adapt to the runtime values,
like wide_int can?

> Oh, and I don't necessarily see a use for double_int in its current form
> except as an integer representation on the host that is efficient for
> manipulating integer constants of a target-dependent size.  For example,
> the target detail that we have partial integer modes with bitsize > precision,
> and that the bits > precision apparently have a meaning when looking at the
> bit-representation of a constant, should not be part of the base class
> of wide-int (I doubt it belongs in wide-int at all, but I guess you know more
> about the reason we track bitsize in addition to precision - I think it's
> abstraction at the wrong level; the tree level does fine without knowing
> about bitsize).

TBH I'm uneasy about the bitsize thing too.  I think bitsize is only
tracked for shift truncation, and if so, I agree it makes sense
to do that separately.

But anyway, this whole discussion seems to have reached a stalemate.
Or I suppose a de-facto rejection, since you're the only person in
a position to approve the thing :-)

Richard
Kenneth Zadeck Oct. 31, 2012, 1:54 p.m. UTC | #20
On 10/31/2012 08:11 AM, Richard Biener wrote:
> On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford
> <rdsandiford@googlemail.com> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> My reasoning was that if you strip wide-int of precision and bitsize
>>> you have a double_int<N> class.
>> But you don't!  Because...
>>
>>> Thus wide-int should have a base
>>> of that kind and just add precision / bitsize on top of that.  It wouldn't
>>> be a step forward if we end up replacing double_int uses with
>>> wide_int uses with precision of 2 * HOST_BITS_PER_WIDE_INT,
>>> would it?
>> ...the precision and bitsize isn't an optional extra, either conceptually
>> or in implementation.  wide_int happens to use N HOST_WIDE_INTS under
>> the hood, but the value of N is an internal implementation detail.
>> No operations are done to N HWIs, they're done to the number of bits
>> in the operands.  Whereas a double_int<N> class does everything to N HWIs.
> If that's the only effect then either bitsize or precision is redundant (and
> we also have len ...).  Note I did not mention len above, thus the base
> class would retain 'len' and double-int would simply use 2 for it
> (if you don't template it but make it variable).
>
> Richard.
>
No: in your own words, there are two parts of the compiler that want the
infinite model.  The rest wants to do the math the way the target does
it.  My version now accommodates both.  In tree-vrp, the pass scans the
gimple and determines what the largest type is, and that is the basis of
all of the math in the pass.  If you just make double_int bigger, then
you are paying for big math everywhere.


>> Richard
Richard Biener Oct. 31, 2012, 1:54 p.m. UTC | #21
On Wed, Oct 31, 2012 at 2:30 PM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>>> But that means that wide_int has to model a P-bit operation as a
>>> "normal" len*HOST_WIDE_INT operation and then fix up the result
>>> after the fact, which seems unnecessarily convoluted.
>>
>> It does that right now.  The operations are carried out in a loop
>> over len HOST_WIDE_INT parts; the last HWI is then special-treated
>> to account for precision/size.  (Yes, 'len' is also used as an optimization -
>> the fact that len ends up being mutable is another thing I dislike about
>> wide-int.  If wide-ints are cheap then all ops should be non-mutating,
>> at least with respect to 'len'.)
>
> But the point of having a mutating len is that things like zero and -1
> are common even for OImode values.  So if you're doing something potentially
> expensive like OImode multiplication, why do it to the number of
> HOST_WIDE_INTs needed for an OImode value when the value we're
> processing has only one significant HOST_WIDE_INT?

I don't propose doing that.  I propose that no wide-int member function
may _change_ its len (to something larger).  Only that way can you
avoid allocating wasted space for zero and -1.  That way the
artificial limit of 2 * largest-int-mode HWIs also goes away.

>>>  I still don't
>>> see why a full-precision 2*HOST_WIDE_INT operation (or a full-precision
>>> X*HOST_WIDE_INT operation for any X) has any special meaning.
>>
>> Well, for the same reason that a HOST_WIDE_INT variable has a meaning.
>> We use it to constrain what we (efficiently) want to work on.  For example
>> CCP might iterate up to 2 * HOST_BITS_PER_WIDE_INT times when
>> doing bit-constant-propagation in loops (for TImode integers on an x86_64 host).
>
> But what about targets with modes wider than TImode?  Would double_int
> still be appropriate then?  If not, why does CCP have to use a templated
> type with a fixed number of HWIs (and all arithmetic done to a fixed
> number of HWIs) rather than one that can adapt to the runtime values,
> like wide_int can?

Because nobody cares about accurate bit-tracking for modes larger than
TImode.  And because no convenient abstraction was available ;)

>> Oh, and I don't necessarily see a use for double_int in its current form
>> except as an integer representation on the host that is efficient for
>> manipulating integer constants of a target-dependent size.  For example,
>> the target detail that we have partial integer modes with bitsize > precision,
>> and that the bits > precision apparently have a meaning when looking at the
>> bit-representation of a constant, should not be part of the base class
>> of wide-int (I doubt it belongs in wide-int at all, but I guess you know more
>> about the reason we track bitsize in addition to precision - I think it's
>> abstraction at the wrong level; the tree level does fine without knowing
>> about bitsize).
>
> TBH I'm uneasy about the bitsize thing too.  I think bitsize is only
> tracked for shift truncation, and if so, I agree it makes sense
> to do that separately.

So, can we please remove all traces of bitsize from wide-int then?

> But anyway, this whole discussion seems to have reached a stalemate.
> Or I suppose a de-facto rejection, since you're the only person in
> a position to approve the thing :-)

There are many (silent) people that are able to approve the thing.  But the
point is I have too many issues with the current patch that I'm unable
to point at a specific thing I want Kenny to change, after which the patch
would be fine.  So I rely on some guesswork from Kenny, given my
advice: "leaner API", "fewer fused ops", "get rid of bitsize", "think of
abstracting the core HWI[len] operation", "there should be no tree or
RTL dependencies in the wide-int API", to produce an updated variant.
Which of course takes time, which of course crosses my vacation, which
in the end means it isn't going to make 4.8 (I _do_ like the idea of not
having a dependence on host properties for integer constant representation).

Btw, a good hint at what a minimal wide-int API would look like is if
you _just_ replace double-int users with it.  Then you obviously have to
implement only the double-int interface and conversion from/to double-int.
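
For illustration only, such a first step might look roughly like this
(hypothetical names, a sketch rather than proposed code): a two-HWI
value with conversions and the same entry points double_int has:

typedef long hwi;			/* stand-in for HOST_WIDE_INT */

struct double_int_t { hwi low, high; };	/* stand-in for double_int */

struct wide2
{
  hwi val[2];
  unsigned len;		/* 1 if the high HWI is just sign extension */

  static wide2 from_double_int (double_int_t d)
  {
    wide2 w;
    w.val[0] = d.low;
    w.val[1] = d.high;
    w.len = d.high == (d.low < 0 ? (hwi) -1 : 0) ? 1 : 2;
    return w;
  }

  double_int_t to_double_int () const
  {
    double_int_t d;
    d.low = val[0];
    d.high = len == 2 ? val[1] : (val[0] < 0 ? (hwi) -1 : 0);
    return d;
  }

  /* Mirror of double_int::and_not, same name and semantics.  */
  wide2 and_not (const wide2 &b) const
  {
    double_int_t x = to_double_int (), y = b.to_double_int ();
    double_int_t r;
    r.low = x.low & ~y.low;
    r.high = x.high & ~y.high;
    return from_double_int (r);
  }
};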

Richard.


> Richard
Richard Biener Oct. 31, 2012, 2:05 p.m. UTC | #22
On Wed, Oct 31, 2012 at 2:54 PM, Kenneth Zadeck
<zadeck@naturalbridge.com> wrote:
>
> On 10/31/2012 08:11 AM, Richard Biener wrote:
>>
>> On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford
>> <rdsandiford@googlemail.com> wrote:
>>>
>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>
>>>> On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford
>>>> <rdsandiford@googlemail.com> wrote:
>>>>>
>>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>>>
>>>>>> On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
>>>>>> <zadeck@naturalbridge.com> wrote:
>>>>>>>
>>>>>>> On 10/25/2012 06:42 AM, Richard Biener wrote:
>>>>>>>>
>>>>>>>> On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump <mikestump@comcast.net>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> On Oct 24, 2012, at 2:43 AM, Richard Biener
>>>>>>>>> <richard.guenther@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
>>>>>>>>>> <zadeck@naturalbridge.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
>>>>>>>>>>>> HOST_BITS_PER_WIDE_INT];
>>>>>>>>>>>>
>>>>>>>>>>>> are we sure this rounds properly?  Consider a port with max byte
>>>>>>>>>>>> mode
>>>>>>>>>>>> size 4 on a 64bit host.
>>>>>>>>>>>
>>>>>>>>>>> I do not believe that this can happen.   The core compiler
>>>>>>>>>>> includes all modes up to TI mode, so by default we are
>>>>>>>>>>> already up to 128 bits.
>>>>>>>>>>
>>>>>>>>>> And mode bitsizes are always power-of-two?  I suppose so.
>>>>>>>>>
>>>>>>>>> Actually, no, they are not.  Partial int modes can have bit
>>>>>>>>> sizes that are not a power of two, and, if there isn't an int
>>>>>>>>> mode that is bigger, we'd want to round up the partial int bit
>>>>>>>>> size.  Something like ((2 * MAX_BITSIZE_MODE_ANY_INT +
>>>>>>>>> HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT) should
>>>>>>>>> do it.
>>>>>>>>>
>>>>>>>>>>>> I still would like to have the ability to provide
>>>>>>>>>>>> specializations of
>>>>>>>>>>>> wide_int
>>>>>>>>>>>> for "small" sizes, thus ideally wide_int would be a template
>>>>>>>>>>>> templated
>>>>>>>>>>>> on the number of HWIs in val.  Interface-wise wide_int<2> should
>>>>>>>>>>>> be
>>>>>>>>>>>> identical to double_int, thus we should be able to do
>>>>>>>>>>>>
>>>>>>>>>>>> typedef wide_int<2> double_int;
>>>>>>>>>>>
>>>>>>>>>>> If you want to go down this path after the patches get in,
>>>>>>>>>>> go for it.  I see no use at all for this.  This was not
>>>>>>>>>>> meant to be a plug-in replacement for double int.  The goal
>>>>>>>>>>> of this patch is to get the compiler to do the constant math
>>>>>>>>>>> the way that the target does it.  Any such instantiation is
>>>>>>>>>>> by definition placing some predefined limit that some target
>>>>>>>>>>> may not want.
>>>>>>>>>>
>>>>>>>>>> Well, what I don't really like is that we now have two
>>>>>>>>>> implementations of functions that perform integer math on
>>>>>>>>>> two-HWI sized integers.  What I also don't like too much is
>>>>>>>>>> that we have two different interfaces to operate on them!
>>>>>>>>>> Can't you see how I come to not liking this?  Especially the
>>>>>>>>>> latter …
>>>>>>>>>
>>>>>>>>> double_int is logically dead.  Refactoring wide-int and
>>>>>>>>> double-int is a waste of time, as the time is better spent
>>>>>>>>> removing double-int from the compiler.  All the necessary
>>>>>>>>> semantics and code of double-int _has_ been refactored into
>>>>>>>>> wide-int already.  Changing wide-int in any way to vend
>>>>>>>>> anything to double-int is wrong, as once double-int is
>>>>>>>>> removed, all the api changes to make double-int share from
>>>>>>>>> wide-int are wasted and must then be removed.  The path
>>>>>>>>> forward is the complete removal of double-int; it is wrong,
>>>>>>>>> has been wrong and always will be wrong, nothing can change
>>>>>>>>> that.
>>>>>>>>
>>>>>>>> double_int, compared to wide_int, is fast and lean.  I doubt
>>>>>>>> we will get rid of it - you will make compile-time math a
>>>>>>>> _lot_ slower.  Just profile when you, for example, change
>>>>>>>> get_inner_reference to use wide_ints.
>>>>>>>>
>>>>>>>> To be able to remove double_int in favor of wide_int requires
>>>>>>>> _at least_ templating wide_int on 'len' and providing
>>>>>>>> specializations for 1 and 2.
>>>>>>>>
>>>>>>>> It might be a non-issue for math that operates on trees or
>>>>>>>> RTXen due to the allocation overhead we pay, but in recent
>>>>>>>> years we transitioned important paths away from using tree
>>>>>>>> math to using double_ints _for speed reasons_.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>
>>>>>>> I do not know why you believe this about the speed.  double int
>>>>>>> always does synthetic math since you do everything at 128 bit
>>>>>>> precision.
>>>>>>>
>>>>>>> The thing about wide int is that since it does math to the
>>>>>>> precision's size, it almost never uses synthetic operations,
>>>>>>> since the sizes for almost every instance can be done using the
>>>>>>> native math on the machine.  Almost every call has a check to
>>>>>>> see if the operation can be done natively.  I seriously doubt
>>>>>>> that you are going to do TI mode math much faster than i do it,
>>>>>>> and if you do, who cares.
>>>>>>>
>>>>>>> The number of calls does not affect the performance in any
>>>>>>> negative way, and in fact it is more efficient since common
>>>>>>> things that require more than one operation in double int are
>>>>>>> typically done in a single operation.
>>>>>>
>>>>>> Simple double-int operations like
>>>>>>
>>>>>> inline double_int
>>>>>> double_int::and_not (double_int b) const
>>>>>> {
>>>>>>    double_int result;
>>>>>>    result.low = low & ~b.low;
>>>>>>    result.high = high & ~b.high;
>>>>>>    return result;
>>>>>> }
>>>>>>
>>>>>> are always going to be faster than conditionally executing only one
>>>>>> operation
>>>>>> (but inside an offline function).
>>>>>
>>>>> OK, this is really in reply to the 4.8 thing, but it felt more
>>>>> appropriate here.
>>>>>
>>>>> It's interesting that you gave this example, since before you were
>>>>> complaining about too many fused ops.  Clearly this one could be
>>>>> removed in favour of separate and() and not() operations, but why
>>>>> not provide a fused one if there are clients who'll make use of it?
>>>>
>>>> I was more concerned about fused operations that use precision
>>>> or bitsize as input.  That is for example
>>>>
>>>>>> +  bool only_sign_bit_p (unsigned int prec) const;
>>>>>> +  bool only_sign_bit_p () const;
>>>>
>>>> The first constructs a wide-int with precision prec (sign- or
>>>> zero-extending it) and then calls only_sign_bit_p on it.  Such a
>>>> function should not be necessary, and existing callers should be
>>>> questioned instead of introducing it.
>>>>
>>>> In fact wide-int seems to have so many "fused" operations that we
>>>> run out of sensible, recognizable names for them, which results in
>>>> a lot of confusion about what the functions actually do (at least
>>>> for me).
>>>
>>> Well, I suppose I can't really say anything useful either way on
>>> that one, since I'm not writing the patch and I'm not reviewing it :-)
>>>
>>>>> I think Kenny's API is just taking that to its logical conclusion.
>>>>> There doesn't seem to be anything sacrosanct about the current choice
>>>>> of what's fused and what isn't.
>>>>
>>>> Maybe.  I'd rather have seen an initial small wide-int API and fused
>>>> operations introduced separately together with the places they are
>>>> used.  In the current way it's way too tedious to go over all of them
>>>> and match them with callers, lookup enough context and then
>>>> make up my mind on whether the caller should do sth different or not.
>>>>
>>>> Thus, consider the big initial API a reason that all this review takes
>>>> so long ...
>>>>
>>>>> The speed problem we had using trees for internal arithmetic isn't
>>>>> IMO a good argument for keeping double_int in preference to wide_int.
>>>>> Allocating and constructing tree objects to hold temporary values,
>>>>> storing an integer representation in it, then calling tree arithmetic
>>>>> routines that pull out the integer representation again and create a
>>>>> tree to hold the result, is going to be much slower than using either
>>>>> double_int or wide_int.  I'd be very surprised if we notice any
>>>>> measurable difference between double_int and wide_int here.
>>>>>
>>>>> I still see no reason to keep double_int around.  The width of a host
>>>>> wide integer really shouldn't have any significance.
>>>>>
>>>>> Your main complaint seems to be that the wide_int API is different
>>>>> from the double_int one, but we can't literally use the same API, since
>>>>> double_int has an implicit precision and bitsize, and wide_int doesn't.
>>>>> Having a precision that is separate from the underlying representation
>>>>> is IMO the most important feature of wide_int, so:
>>>>>
>>>>>     template wide_int<2> double_int;
>>>>>
>>>>> is never going to be a drop-in, API-compatible replacement for
>>>>> double_int.
>>>>
>>>> My reasoning was that if you strip wide-int of precision and bitsize
>>>> you have a double_int<N> class.
>>>
>>> But you don't!  Because...
>>>
>>>> Thus wide-int should have a base of that kind and just add
>>>> precision / bitsize on top of that.  It wouldn't be a step forward
>>>> if we end up replacing double_int uses with wide_int uses with a
>>>> precision of 2 * HOST_BITS_PER_WIDE_INT, would it?
>>>
>>> ...the precision and bitsize isn't an optional extra, either conceptually
>>> or in implementation.  wide_int happens to use N HOST_WIDE_INTS under
>>> the hood, but the value of N is an internal implementation detail.
>>> No operations are done to N HWIs, they're done to the number of bits
>>> in the operands.  Whereas a double_int<N> class does everything to N
>>> HWIs.
>>
>> If that's the only effect then either bitsize or precision is
>> redundant (and we also have len ...).  Note I did not mention len
>> above, thus the base class would retain 'len' and double-int would
>> simply use 2 for it (if you don't template it but make it variable).
>>
>> Richard.
>>
> NO, in your own words, there are two parts of the compiler that want the
> infinite model.   The rest wants to do the math the way the target does it.
> My version now accommodates both.    In tree vrp it scans the gimple and
> determines what the largest type is and that is the basis of all of the math
> in this pass.  If you just make double int bigger, then you are paying for
> big math everywhere.

You have an artificial limit on what 'len' can be.  And you do not
accommodate users that do not want to pay the storage penalty for that
arbitrary upper limit choice.  That's all because 'len' may grow
(mutate).  You could alternatively not allow bitsize to grow / mutate
and have allocation tied to bitsize instead of len.

Ideally the wide-int interface would have two storage models:

class alloc_storage
{
  unsigned len; /* or bitsize */
  HOST_WIDE_INT *hwis;

  HOST_WIDE_INT& operator[](unsigned i) { return hwis[i]; }
};

class max_mode
{
  HOST_WIDE_INT hwis[largest integer mode size in hwi];

  HOST_WIDE_INT& operator[](unsigned i) { return hwis[i]; }
};

template <class storage>
class wide_int : storage
{

so you can even re-use the (const) in-place storage of INTEGER_CSTs
or RTL CONST_WIDEs.  And VRP would simply have its own storage
supporting 2 times the largest integer mode (or whatever choice it has).
And double-int would simply embed two.

Maybe this is the perfect example for introducing virtual functions as well
to ease inter-operability between the wide-int variants without making each
member a template on the 2nd wide-int operand (it's of course auto-deduced,
but well ...).

The above is just a brain-dump; details may need further thinking.
Like the operator[], maybe the storage model should just be able
to return a pointer to the array of HWIs; this way the actual workers
can be out-of-line and non-templates.
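
Taken literally, the brain-dump might flesh out along these lines (all
names hypothetical, a sketch of the shape rather than working code):
the storage policy owns the HWI blocks and hands back a plain pointer,
the templated layer stays thin and inlinable, and the actual worker is
out-of-line and non-template:

typedef long hwi;	/* stand-in for HOST_WIDE_INT */

struct alloc_storage_t	/* refers to externally owned blocks, e.g. the
			   in-place payload of an INTEGER_CST */
{
  unsigned len;
  hwi *hwis;
  hwi *blocks () { return hwis; }
  const hwi *blocks () const { return hwis; }
};

template <unsigned N>	/* e.g. N = HWIs of the largest integer mode */
struct fixed_storage_t
{
  unsigned len;
  hwi hwis[N];
  hwi *blocks () { return hwis; }
  const hwi *blocks () const { return hwis; }
};

/* Out-of-line, non-template worker: it only ever sees raw block
   arrays (of equal length here, for brevity).  */
static void
add_blocks (hwi *r, const hwi *a, const hwi *b, unsigned len)
{
  unsigned long carry = 0;
  for (unsigned i = 0; i < len; i++)
    {
      unsigned long ua = (unsigned long) a[i];
      unsigned long s = ua + (unsigned long) b[i] + carry;
      carry = s < ua || (carry && s == ua);
      r[i] = (hwi) s;
    }
}

template <class storage>
struct wide_int_t : storage
{
  /* Thin, inlinable layer: fetch the block pointers once and call
     the shared worker.  */
  void set_add (const wide_int_t &a, const wide_int_t &b)
  {
    this->len = a.len;
    add_blocks (this->blocks (), a.blocks (), b.blocks (), a.len);
  }
};

A user such as VRP would then instantiate wide_int_t<fixed_storage_t<4> >
(or whatever multiple of the largest mode it wants), while a double-int
replacement would embed exactly two HWIs.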

Richard.

>
>>> Richard
>
>
Kenneth Zadeck Oct. 31, 2012, 2:18 p.m. UTC | #23
On 10/31/2012 10:05 AM, Richard Biener wrote:
> On Wed, Oct 31, 2012 at 2:54 PM, Kenneth Zadeck
> <zadeck@naturalbridge.com> wrote:
>> NO, in your own words, there are two parts of the compiler that want the
>> infinite model.   The rest wants to do the math the way the target does it.
>> My version now accommodates both.    In tree vrp it scans the gimple and
>> determines what the largest type is and that is the basis of all of the math
>> in this pass.  If you just make double int bigger, then you are paying for
>> big math everywhere.
> You have an artificial limit on what 'len' can be.  And you do not
> accommodate users that do not want to pay the storage penalty for that
> arbitrary upper limit choice.  That's all because 'len' may grow
> (mutate).  You could alternatively not allow bitsize to grow / mutate
> and have allocation tied to bitsize instead of len.
It is not artificial, it is based on the target.  I chose to do it that
way because I "knew" that having a fixed size would be faster than doing
an alloca for every wide-int.  If we feel that it is important to be
able to do truly arbitrary infinite precision arithmetic (as opposed to
just some fixed amount larger than the size of the type) then we can
have a subclass that does this.  However, there is no need to do this in
the compiler currently, so using this as an argument against wide-int is
really not fair.

However, as machines allow wider math, you really do not want to
penalize every program that is compiled with this burden.  It was OK for
double int to do so, but when machines commonly have OImode, it will
still be the case that more than 99% of the variables will be 64 bits or
less.

I do worry a lot about the effect of adding layers like this on the
efficiency of the compiler.  You made a valid point that a lot of the
double int routines could be done inline, and my plan is to take the
parts of the functions that notice that the precision fits in a HWI and
do them inline.

If your proposed solution causes a function call to access the elements,
then we are doomed.
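
That inline fast path might be shaped like this (hypothetical names, a
sketch of the plan rather than patch code), with the multi-HWI slow
path left as an out-of-line declaration:

typedef long hwi;	/* stand-in for HOST_WIDE_INT */
static const unsigned HWI_BITS = sizeof (hwi) * 8;

struct wi_t { unsigned precision; unsigned len; hwi val[4]; };

/* Out-of-line slow path for the rare multi-HWI case.  */
extern wi_t add_large (const wi_t &a, const wi_t &b);

/* Inline wrapper: the precision-fits-in-a-HWI test is decided here,
   so the common case costs one native add and no call.  */
static inline wi_t
add (const wi_t &a, const wi_t &b)
{
  if (a.precision <= HWI_BITS)
    {
      wi_t r;
      r.precision = a.precision;
      r.len = 1;
      /* Unsigned arithmetic; bits beyond the precision are ignored
	 by readers of the value.  */
      r.val[0] = (hwi) ((unsigned long) a.val[0] + (unsigned long) b.val[0]);
      return r;
    }
  return add_large (a, b);
}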
Richard Biener Oct. 31, 2012, 2:24 p.m. UTC | #24
On Wed, Oct 31, 2012 at 3:18 PM, Kenneth Zadeck
<zadeck@naturalbridge.com> wrote:
>
> On 10/31/2012 10:05 AM, Richard Biener wrote:
>>
>> On Wed, Oct 31, 2012 at 2:54 PM, Kenneth Zadeck
>> <zadeck@naturalbridge.com> wrote:
>>> NO, in your own words, there are two parts of the compiler that want the
>>> infinite model.   The rest wants to do the math the way the target does
>>> it.
>>> My version now accommodates both.    In tree vrp it scans the gimple and
>>> determines what the largest type is and that is the basis of all of the
>>> math
>>> in this pass.  If you just make double int bigger, then you are paying
>>> for
>>> big math everywhere.
>>
>> You have an artificial limit on what 'len' can be.  And you do not
>> accommodate users that do not want to pay the storage penalty for
>> that arbitrary upper limit choice.  That's all because 'len' may
>> grow (mutate).  You could alternatively not allow bitsize to grow /
>> mutate and have allocation tied to bitsize instead of len.
>
> It is not artificial, it is based on the target.  I chose to do it that
> way because I "knew" that having a fixed size would be faster than
> doing an alloca for every wide-int.  If we feel that it is important to
> be able to do truly arbitrary infinite precision arithmetic (as opposed
> to just some fixed amount larger than the size of the type) then we can
> have a subclass that does this.  However, there is no need to do this
> in the compiler currently, so using this as an argument against
> wide-int is really not fair.

Well, it is artificial, as you had to increase it by a factor of two to
handle the use in VRP.  It is also "artificial" as it wastes storage.

> However, as machines allow wider math, you really do not want to
> penalize every program that is compiled with this burden.  It was OK
> for double int to do so, but when machines commonly have OImode, it
> will still be the case that more than 99% of the variables will be 64
> bits or less.
>
> I do worry a lot about the effect of adding layers like this on the
> efficiency of the compiler.  You made a valid point that a lot of the
> double int routines could be done inline, and my plan is to take the
> parts of the functions that notice that the precision fits in a HWI
> and do them inline.
>
> If your proposed solution causes a function call to access the
> elements, then we are doomed.

Well, if it causes a function call to access a pointer to the array of
elements, which you can cache, then it wouldn't be that bad.  And with
templates we can inline the access anyway.
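
I.e. something like the following (sketch, hypothetical names): one
blocks () call per operand, the result cached in a local pointer, and
the loop itself touching only raw memory:

typedef long hwi;	/* stand-in for HOST_WIDE_INT */

/* Non-template, out-of-line worker: by the time we get here the
   storage access has already happened, so each element is a direct
   load.  */
void
and_not_blocks (hwi *r, const hwi *a, const hwi *b, unsigned len)
{
  for (unsigned i = 0; i < len; i++)
    r[i] = a[i] & ~b[i];
}

/* Caller side: one (possibly virtual) blocks () call per operand,
   cached in a pointer, regardless of len.  */
template <class W>
void
and_not (W &r, const W &a, const W &b)
{
  r.len = a.len;
  and_not_blocks (r.blocks (), a.blocks (), b.blocks (), a.len);
}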

Richard.

Kenneth Zadeck Oct. 31, 2012, 2:24 p.m. UTC | #25
On 10/31/2012 09:54 AM, Richard Biener wrote:
> On Wed, Oct 31, 2012 at 2:30 PM, Richard Sandiford
> <rdsandiford@googlemail.com> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>>> But that means that wide_int has to model a P-bit operation as a
>>>> "normal" len*HOST_WIDE_INT operation and then fix up the result
>>>> after the fact, which seems unnecessarily convoluted.
>>> It does that right now.  The operations are carried out in a loop
>>> over len HOST_WIDE_INT parts, the last HWI is then special-treated
>>> to account for precision/size.  (yes, 'len' is also used as an optimization - the
>>> fact that len ends up being mutable is another thing I dislike about
>>> wide-int.  If wide-ints are cheap then all ops should be non-mutating
>>> (at least to 'len')).
>> But the point of having a mutating len is that things like zero and -1
>> are common even for OImode values.  So if you're doing something potentially
>> expensive like OImode multiplication, why do it to the number of
>> HOST_WIDE_INTs needed for an OImode value when the value we're
>> processing has only one significant HOST_WIDE_INT?
> I don't propose doing that.  I propose that no wide-int member function
> may _change_ its len (to something larger).  Only that way you can
> avoid allocating wasted space for zero and -1.  That way also the
> artificial limit of 2 * largest-int-mode HWIs goes away.
It is now 4x, not 2x, to accommodate the extra bit in tree-vrp.

Remember that the space burden is minimal: wide-ints are not
persistent, and there are never more than a handful at a time.

>>>>   I still don't
>>>> see why a full-precision 2*HOST_WIDE_INT operation (or a full-precision
>>>> X*HOST_WIDE_INT operation for any X) has any special meaning.
>>> Well, the same reason as a HOST_WIDE_INT variable has a meaning.
>>> We use it to constrain what we (efficiently) want to work on.  For example
>>> CCP might iterate up to 2 * HOST_BITS_PER_WIDE_INT times when
>>> doing bit-constant-propagation in loops (for TImode integers on an x86_64 host).
>> But what about targets with modes wider than TImode?  Would double_int
>> still be appropriate then?  If not, why does CCP have to use a templated
>> type with a fixed number of HWIs (and all arithmetic done to a fixed
>> number of HWIs) rather than one that can adapt to the runtime values,
>> like wide_int can?
> Because nobody cares about accurate bit-tracking for modes larger than
> TImode.  And because no convenient abstraction was available ;)
Yes, but tree-vrp does not even work for TImode, and there are no tests
to scale it back when it does see TImode.  I understand that these can
be added, but so far they have not been.

I would also point out that I was corrected on this point by (I believe)
Lawrence.  He points out that tree-vrp is still important for converting
signed to unsigned for larger modes.


Kenneth Zadeck Oct. 31, 2012, 2:26 p.m. UTC | #26
On 10/31/2012 10:24 AM, Richard Biener wrote:
> On Wed, Oct 31, 2012 at 3:18 PM, Kenneth Zadeck
> <zadeck@naturalbridge.com> wrote:
>> On 10/31/2012 10:05 AM, Richard Biener wrote:
>>> On Wed, Oct 31, 2012 at 2:54 PM, Kenneth Zadeck
>>> <zadeck@naturalbridge.com> wrote:
>>>> NO, in your own words, there are two parts of the compiler that want the
>>>> infinite model.   The rest wants to do the math the way the target does
>>>> it.
>>>> My version now accommodates both.    In tree vrp it scans the gimple and
>>>> determines what the largest type is and that is the basis of all of the
>>>> math
>>>> in this pass.  If you just make double int bigger, then you are paying
>>>> for
>>>> big math everywhere.
>>> You have an artificial limit on what 'len' can be.  And you do not
>>> accommodate
>>> users that do not want to pay the storage penalty for that arbitrary upper
>>> limit
>>> choice.  That's all because 'len' may grow (mutate).  You could
>>> alternatively
>>> not allow bitsize to grow / mutate and have allocation tied to bitsize
>>> instead
>>> of len.
>> It is not artificial, it is based on the target.  I chose to do it that way
>> because i "knew" that having a fixed size would be faster than doing an
>> alloca for every wide-int.    If we feel that it is important to be able
>> to do truly arbitrary infinite precision arithmetic (as opposed to just some
>> fixed amount larger than the size of the type) then we can have a subclass
>> that does this. However, there is not a need to do this in the compiler
>> currently and so using this as an argument against wide-int is really not
>> fair.
> Well, it is artificial as you had to increase it by a factor of two to handle
> the use in VRP.  It is also "artificial" as it wastes storage.
>
>> However, as the machines allow wider math, you really do not want to
>> penalize every program that is compiled with this burden.   It was ok for
>> double int to do so, but when machines commonly have OImode, it will still
>> be the case that more than 99% of the variables will be 64 bits or less.
>>
>> I do worry a lot about the effect of layers like this on the efficiency of the
>> compiler.   You made a valid point that a lot of the double int routines
>> could be done inline and my plan is take the parts of the functions that
>> notice that the precision fits in a hwi and do them inline.
>>
>> If your proposed solution causes a function call to access the elements,
>> then we are doomed.
> Well, if it causes a function call to access a pointer to the array of elements
> which you can cache then it wouldn't be that bad.  And with templates
> we can inline the access anyway.
I do not see how to do this with templates without building in the 
largest template size.


> Richard.
>
>>> Ideally the wide-int interface would have two storage models:
>>>
>>> class alloc_storage
>>> {
>>>     unsigned len; /* or bitsize */
>>>     HOST_WIDE_INT *hwis;
>>>
>>>     HOST_WIDE_INT& operator[](unsigned i) { return hwis[i]; }
>>> }
>>>
>>> class max_mode
>>> {
>>>     HOST_WIDE_INT hwis[largest integer mode size in hwi];
>>>
>>>     HOST_WIDE_INT& operator[](unsigned i) { return hwis[i]; }
>>> }
>>>
>>> template <class storage>
>>> class wide_int : storage
>>> {
>>>
>>> so you can even re-use the (const) in-place storage of INTEGER_CSTs
>>> or RTL CONST_WIDEs.  And VRP would simply have its own storage
>>> supporting 2 times the largest integer mode (or whatever choice it has).
>>> And double-int would simply embed two.
>>>
>>> Maybe this is the perfect example for introducing virtual functions as
>>> well
>>> to ease inter-operability between the wide-int variants without making
>>> each
>>> member a template on the 2nd wide-int operand (it's of course
>>> auto-deduced,
>>> but well ...).
>>>
>>> The above is just a brain-dump, details may need further thinking
>>> Like the operator[], maybe the storage model should just be able
>>> to return a pointer to the array of HWIs, this way the actual workers
>>> can be out-of-line and non-templates.
>>>
>>> Richard.
>>>
>>>>>> Richard
>>>>
Kenneth Zadeck Oct. 31, 2012, 2:31 p.m. UTC | #27
On 10/31/2012 08:44 AM, Richard Biener wrote:
> On Wed, Oct 31, 2012 at 1:22 PM, Richard Sandiford
> <rdsandiford@googlemail.com> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford
>>> <rdsandiford@googlemail.com> wrote:
>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>> On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford
>>>>> <rdsandiford@googlemail.com> wrote:
>>>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>>>> On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
>>>>>>> <zadeck@naturalbridge.com> wrote:
>>>>>>>> On 10/25/2012 06:42 AM, Richard Biener wrote:
>>>>>>>>> On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump
>>>>>>>>> <mikestump@comcast.net> wrote:
>>>>>>>>>> On Oct 24, 2012, at 2:43 AM, Richard Biener <richard.guenther@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
>>>>>>>>>>> <zadeck@naturalbridge.com> wrote:
>>>>>>>>>>>> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>>>>>>>>>>>> +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
>>>>>>>>>>>>> HOST_BITS_PER_WIDE_INT];
>>>>>>>>>>>>>
>>>>>>>>>>>>> are we sure this rounds properly?  Consider a port with max byte mode
>>>>>>>>>>>>> size 4 on a 64bit host.
>>>>>>>>>>>> I do not believe that this can happen.  The core compiler
>>>>>>>>>>>> includes all
>>>>>>>>>>>> modes up to TI mode, so by default we already up to 128 bits.
>>>>>>>>>>> And mode bitsizes are always power-of-two?  I suppose so.
>>>>>>>>>> Actually, no, they are not.  Partial int modes can have bit sizes that
>>>>>>>>>> are not power of two, and, if there isn't an int mode that is
>>>>>>>>>> bigger, we'd
>>>>>>>>>> want to round up the partial int bit size.  Something like ((2 *
>>>>>>>>>> MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
>>>>>>>>>> HOST_BITS_PER_WIDE_INT should do it.
>>>>>>>>>>
>>>>>>>>>>>>> I still would like to have the ability to provide specializations of
>>>>>>>>>>>>> wide_int
>>>>>>>>>>>>> for "small" sizes, thus ideally wide_int would be a template
>>>>>>>>>>>>> templated
>>>>>>>>>>>>> on the number of HWIs in val.  Interface-wise wide_int<2> should be
>>>>>>>>>>>>> identical to double_int, thus we should be able to do
>>>>>>>>>>>>>
>>>>>>>>>>>>> typedef wide_int<2> double_int;
>>>>>>>>>>>> If you want to go down this path after the patches get in, go for it.
>>>>>>>>>>>> I
>>>>>>>>>>>> see no use at all for this.
>>>>>>>>>>>> This was not meant to be a plug in replacement for double int. This
>>>>>>>>>>>> goal of
>>>>>>>>>>>> this patch is to get the compiler to do the constant math the way that
>>>>>>>>>>>> the
>>>>>>>>>>>> target does it.   Any such instantiation is by definition placing some
>>>>>>>>>>>> predefined limit that some target may not want.
>>>>>>>>>>> Well, what I don't really like is that we now have two implementations
>>>>>>>>>>> of functions that perform integer math on two-HWI sized integers.  What
>>>>>>>>>>> I also don't like too much is that we have two different interfaces to
>>>>>>>>>>> operate
>>>>>>>>>>> on them!  Can't you see how I come to not liking this?  Especially the
>>>>>>>>>>> latter …
>>>>>>>>>> double_int is logically dead.  Refactoring wide-int and double-int is a
>>>>>>>>>> waste of time, as the time is better spent removing double-int from the
>>>>>>>>>> compiler.  All the necessary semantics and code of double-int _has_ been
>>>>>>>>>> refactored into wide-int already.  Changing wide-int in any way to vend
>>>>>>>>>> anything to double-int is wrong, as once double-int is removed,
>>>>>>>>>> then all the
>>>>>>>>>> api changes to make double-int share from wide-int is wasted and
>>>>>>>>>> must then
>>>>>>>>>> be removed.  The path forward is the complete removal of
>>>>>>>>>> double-int; it is
>>>>>>>>>> wrong, has been wrong and always will be wrong, nothing can change that.
>>>>>>>>> double_int, compared to wide_int, is fast and lean.  I doubt we will
>>>>>>>>> get rid of it - you
>>>>>>>>> will make compile-time math a _lot_ slower.  Just profile when you for
>>>>>>>>> example
>>>>>>>>> change get_inner_reference to use wide_ints.
>>>>>>>>>
>>>>>>>>> To be able to remove double_int in favor of wide_int requires _at least_
>>>>>>>>> templating wide_int on 'len' and providing specializations for 1 and 2.
>>>>>>>>>
>>>>>>>>> It might be a non-issue for math that operates on trees or RTXen due to
>>>>>>>>> the allocation overhead we pay, but in recent years we transitioned
>>>>>>>>> important
>>>>>>>>> paths away from using tree math to using double_ints _for speed reasons_.
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>> i do not know why you believe this about the speed.     double int always
>>>>>>>> does synthetic math since you do everything at 128 bit precision.
>>>>>>>>
>>>>>>>> the thing about wide int, is that since it does math to the precision's
>>>>>>>> size, it almost never uses synthetic operations since the sizes for
>>>>>>>> almost every instance can be done using the native math on the machine.
>>>>>>>> almost every call has a check to see if the operation can be done
>>>>>>>> natively.
>>>>>>>> I seriously doubt that you are going to do TI mode math much faster than i
>>>>>>>> do it and if you do who cares.
>>>>>>>>
>>>>>>>> the number of calls does not affect the performance in any
>>>>>>>> negative way and
>>>>>>>> in fact is more efficient since common things that require more than one
>>>>>>>> operation in double-int are typically done in a single operation.
>>>>>>> Simple double-int operations like
>>>>>>>
>>>>>>> inline double_int
>>>>>>> double_int::and_not (double_int b) const
>>>>>>> {
>>>>>>>    double_int result;
>>>>>>>    result.low = low & ~b.low;
>>>>>>>    result.high = high & ~b.high;
>>>>>>>    return result;
>>>>>>> }
>>>>>>>
>>>>>>> are always going to be faster than conditionally executing only one
>>>>>>> operation
>>>>>>> (but inside an offline function).
>>>>>> OK, this is really in reply to the 4.8 thing, but it felt more
>>>>>> appropriate here.
>>>>>>
>>>>>> It's interesting that you gave this example, since before you were
>>>>>> complaining about too many fused ops.  Clearly this one could be
>>>>>> removed in favour of separate and() and not() operations, but why
>>>>>> not provide a fused one if there are clients who'll make use of it?
>>>>> I was more concerned about fused operations that use precision
>>>>> or bitsize as input.  That is for example
>>>>>
>>>>>>> +  bool only_sign_bit_p (unsigned int prec) const;
>>>>>>> +  bool only_sign_bit_p () const;
>>>>> The first is construct a wide-int with precision prec (and sign- or
>>>>> zero-extend it) and then call only_sign_bit_p on it.  Such function
>>>>> should not be necessary and existing callers should be questioned
>>>>> instead of introducing it.
>>>>>
>>>>> In fact wide-int seems to have so many "fused" operations that
>>>>> we run out of sensible recognizable names for them.  Which results
>>>>> in a lot of confusion on what the functions actually do (at least for me).
>>>> Well, I suppose I can't really say anything useful either way on
>>>> that one, since I'm not writing the patch and I'm not reviewing it :-)
>>>>
>>>>>> I think Kenny's API is just taking that to its logical conclusion.
>>>>>> There doesn't seem to be anything sacrosanct about the current choice
>>>>>> of what's fused and what isn't.
>>>>> Maybe.  I'd rather have seen an initial small wide-int API and fused
>>>>> operations introduced separately together with the places they are
>>>>> used.  In the current way it's way too tedious to go over all of them
>>>>> and match them with callers, lookup enough context and then
>>>>> make up my mind on whether the caller should do sth different or not.
>>>>>
>>>>> Thus, consider the big initial API a reason that all this review takes
>>>>> so long ...
>>>>>
>>>>>> The speed problem we had using trees for internal arithmetic isn't
>>>>>> IMO a good argument for keeping double_int in preference to wide_int.
>>>>>> Allocating and constructing tree objects to hold temporary values,
>>>>>> storing an integer representation in it, then calling tree arithmetic
>>>>>> routines that pull out the integer representation again and create a
>>>>>> tree to hold the result, is going to be much slower than using either
>>>>>> double_int or wide_int.  I'd be very surprised if we notice any
>>>>>> measurable difference between double_int and wide_int here.
>>>>>>
>>>>>> I still see no reason to keep double_int around.  The width of a host
>>>>>> wide integer really shouldn't have any significance.
>>>>>>
>>>>>> Your main complaint seems to be that the wide_int API is different
>>>>>> from the double_int one, but we can't literally use the same API, since
>>>>>> double_int has an implicit precision and bitsize, and wide_int doesn't.
>>>>>> Having a precision that is separate from the underlying representation
>>>>>> is IMO the most important feature of wide_int, so:
>>>>>>
>>>>>>     template wide_int<2> double_int;
>>>>>>
>>>>>> is never going to be a drop-in, API-compatible replacement for double_int.
>>>>> My reasoning was that if you strip wide-int of precision and bitsize
>>>>> you have a double_int<N> class.
>>>> But you don't!  Because...
>>>>
>>>>> Thus wide-int should have a base
>>>>> of that kind and just add precision / bitsize ontop of that.  It wouldn't
>>>>> be a step forward if we end up replacing double_int uses with
>>>>> wide_int uses with precision of 2 * HOST_BITS_PER_WIDE_INT,
>>>>> would it?
>>>> ...the precision and bitsize isn't an optional extra, either conceptually
>>>> or in implementation.  wide_int happens to use N HOST_WIDE_INTS under
>>>> the hood, but the value of N is an internal implementation detail.
>>>> No operations are done to N HWIs, they're done to the number of bits
>>>> in the operands.  Whereas a double_int<N> class does everything to N HWIs.
>>> If that's the only effect then either bitsize or precision is redundant (and
>>> we also have len ...).  Note I did not mention len above, thus the base
>>> class would retain 'len' and double-int would simply use 2 for it
>>> (if you don't template it but make it variable).
>> But that means that wide_int has to model a P-bit operation as a
>> "normal" len*HOST_WIDE_INT operation and then fix up the result
>> after the fact, which seems unnecessarily convoluted.
> It does that right now.  The operations are carried out in a loop
> over len HOST_WIDE_INT parts, the last HWI is then special-treated
> to account for precision/size.  (yes, 'len' is also used as optimization - the
> fact that len ends up being mutable is another thing I dislike about
> wide-int.  If wide-ints are cheap then all ops should be non-mutating
> (at least to 'len')).
There are currently two places where len is mutable.    They are parts 
where i did not believe that i had the expertise to rewrite the double 
int code.    They are in the conversion to and from float and the 
conversion to and from fixed.    In those cases the api is exposed so 
that wide-ints could be built.    I did not think that such temporary 
scaffolding would be the source of ridicule.

All of the uses of wide int as integers are truly functional.

>
>>   I still don't
>> see why a full-precision 2*HOST_WIDE_INT operation (or a full-precision
>> X*HOST_WIDE_INT operation for any X) has any special meaning.
> Well, the same reason as a HOST_WIDE_INT variable has a meaning.
> We use it to constrain what we (efficiently) want to work on.  For example
> CCP might iterate up to 2 * HOST_BITS_PER_WIDE_INT times when
> doing bit-constant-propagation in loops (for TImode integers on a x86_64 host).
>
> Oh, and I don't necessary see a use of double_int in its current form
> but for an integer representation on the host that is efficient to manipulate
> integer constants of a target dependent size.  For example the target
> detail that we have partial integer modes with bitsize > precision and that
> the bits > precision apparently have a meaning when looking at the
> bit-representation of a constant should not be part of the base class
> of wide-int (I doubt it belongs to wide-int at all, but I guess you know more
> about the reason we track bitsize in addition to precision - I think it's
> abstraction at the wrong level, the tree level does fine without knowing
> about bitsize).
>
> Richard.
>
>> Richard
Kenneth Zadeck Oct. 31, 2012, 2:56 p.m. UTC | #28
On 10/31/2012 09:30 AM, Richard Sandiford wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>>> But that means that wide_int has to model a P-bit operation as a
>>> "normal" len*HOST_WIDE_INT operation and then fix up the result
>>> after the fact, which seems unnecessarily convoluted.
>> It does that right now.  The operations are carried out in a loop
>> over len HOST_WIDE_INT parts, the last HWI is then special-treated
>> to account for precision/size.  (yes, 'len' is also used as optimization - the
>> fact that len ends up being mutable is another thing I dislike about
>> wide-int.  If wide-ints are cheap then all ops should be non-mutating
>> (at least to 'len')).
> But the point of having a mutating len is that things like zero and -1
> are common even for OImode values.  So if you're doing someting potentially
> expensive like OImode multiplication, why do it to the number of
> HOST_WIDE_INTs needed for an OImode value when the value we're
> processing has only one significant HOST_WIDE_INT?
I think with a little thought i can add some special constructors and 
get rid of the mutating aspects of the interface.
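
Something along these lines is what i have in mind -- just a sketch,
untested, and wide_int_builder / from_array are hypothetical names
that are not in the current patch:

class wide_int_builder
{
  /* Same bound as wide_int itself.  */
  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT];
  unsigned int len;

 public:
  wide_int_builder () : len (0) {}

  /* Append one block of the value, least significant first.  */
  void push (HOST_WIDE_INT x) { val[len++] = x; }

  /* The one place where a wide_int's len is written; the result is
     never mutated afterwards.  */
  wide_int build (unsigned int bitsize, unsigned int precision) const
  {
    return wide_int::from_array (val, len, bitsize, precision);
  }
};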

>
>>>   I still don't
>>> see why a full-precision 2*HOST_WIDE_INT operation (or a full-precision
>>> X*HOST_WIDE_INT operation for any X) has any special meaning.
>> Well, the same reason as a HOST_WIDE_INT variable has a meaning.
>> We use it to constrain what we (efficiently) want to work on.  For example
>> CCP might iterate up to 2 * HOST_BITS_PER_WIDE_INT times when
>> doing bit-constant-propagation in loops (for TImode integers on a x86_64 host).
> But what about targets with modes wider than TImode?  Would double_int
> still be appropriate then?  If not, why does CCP have to use a templated
> type with a fixed number of HWIs (and all arithmetic done to a fixed
> number of HWIs) rather than one that can adapt to the runtime values,
> like wide_int can?
>
>> Oh, and I don't necessary see a use of double_int in its current form
>> but for an integer representation on the host that is efficient to manipulate
>> integer constants of a target dependent size.  For example the target
>> detail that we have partial integer modes with bitsize > precision and that
>> the bits > precision apparently have a meaning when looking at the
>> bit-representation of a constant should not be part of the base class
>> of wide-int (I doubt it belongs to wide-int at all, but I guess you know more
>> about the reason we track bitsize in addition to precision - I think it's
>> abstraction at the wrong level, the tree level does fine without knowing
>> about bitsize).
> TBH I'm uneasy about the bitsize thing too.  I think bitsize is only
> tracked for shift truncation, and if so, I agree it makes sense
> to do that separately.
>
> But anyway, this whole discussion seems to have reached a stalemate.
> Or I suppose a de-facto rejection, since you're the only person in
> a position to approve the thing :-)
>
> Richard
Mike Stump Oct. 31, 2012, 7:12 p.m. UTC | #29
On Oct 31, 2012, at 5:44 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> the
> fact that len ends up being mutable is another thing I dislike about
> wide-int.

We expose len for construction only, it is non-mutating.  During construction, there is no previous value.

>  If wide-ints are cheap then all ops should be non-mutating
> (at least to 'len')).

It is.  Construction modifies the object as construction must be defined as initializing the state of the data.  Before construction, there is no data, so, we are constructing the data, not mutating the data.  Surely you don't object to construction?
Mike Stump Oct. 31, 2012, 7:22 p.m. UTC | #30
On Oct 31, 2012, at 6:54 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> I propose that no wide-int member function
> may _change_ it's len (to something larger).

We never do that, so, we already do as you wish.  We construct wide ints, and we have member functions to construct values.  We need to construct values as some parts of the compiler want to create values.  The construction of values can be removed when the rest of the compiler no longer wishes to construct values. LTO is an example of a client that wanted to construct a value.  I'll let the LTO people chime in if they wish to no longer construct values.
Mike Stump Oct. 31, 2012, 7:45 p.m. UTC | #31
On Oct 31, 2012, at 7:05 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> You have an artificial limit on what 'len' can be.

No.  There is no limit, and nothing artificial.  We take the maximum of the needs of the target, the maximum of the front-ends and the maximum of the mid-end and the back-end.  We can drop a category, if that category no longer wishes to be our client.  Any client is free to stop using wide-int, any time they want.  For example, vrp could use gmp, if they wanted to, and the need to serve them drops.  You have imagined the cost is high to do this, the reality is all long lived objects are small, and all short lived objects are so transitory that we are talking about maybe 5 live at a time.

> And you do not accommodate
> users that do not want to pay the storage penalty for that arbitrary upper limit
> choice.

This is also wrong.  First, there is no arbitrary upper limit.  Second, all long lived objects are small.  We accommodated them by having all long lived objects be small.  The transitory objects are big, but there are only 5 of them alive at a time.

>  That's all because 'len' may grow (mutate).

This is also wrong.
diff mbox

Patch

diff --git a/gcc/wide-int.c b/gcc/wide-int.c
new file mode 100644
index 0000000..bf25467
--- /dev/null
+++ b/gcc/wide-int.c
@@ -0,0 +1,4248 @@ 
+/* Operations with very long integers.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+   Contributed by Kenneth Zadeck <zadeck@naturalbridge.com>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "hwint.h"
+#include "wide-int.h"
+#include "rtl.h"
+#include "tree.h"
+#include "dumpfile.h"
+
+#ifdef DEBUG_WIDE_INT
+  /* Debugging routines.  */
+  static void debug_vw  (const char* name, int r, const wide_int& o0);
+  static void debug_vwh (const char* name, int r, const wide_int &o0,
+			 HOST_WIDE_INT o1);
+  static void debug_vww (const char* name, int r, const wide_int &o0,
+			 const wide_int &o1);
+  static void debug_wv (const char* name, const wide_int &r, int v0);
+  static void debug_wvv (const char* name, const wide_int &r, int v0,
+			 int v1);
+  static void debug_wvvv (const char* name, const wide_int &r, int v0,
+			  int v1, int v2);
+  static void debug_wwv (const char* name, const wide_int &r,
+			 const wide_int &o0, int v0);
+  static void debug_wwwvv (const char* name, const wide_int &r,
+			   const wide_int &o0, const wide_int &o1,
+			   int v0, int v1);
+  static void debug_ww (const char* name, const wide_int &r,
+			const wide_int &o0);
+  static void debug_www (const char* name, const wide_int &r,
+			 const wide_int &o0, const wide_int &o1);
+  static void debug_wwwv (const char* name, const wide_int &r,
+			  const wide_int &o0, const wide_int &o1,
+			  int v0);
+  static void debug_wwww (const char* name, const wide_int &r,
+			  const wide_int &o0, const wide_int &o1, 
+			  const wide_int &o2);
+#endif
+
+/* Debugging routines.  */
+
+/* This is the maximal size of the buffer needed for dump.  */
+const int MAX = 2 * (MAX_BITSIZE_MODE_ANY_INT / 4
+		     + MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT + 32);
+
+/*
+ * Internal utilities.
+ */
+
+/* Quantities to deal with values that hold half of a wide int.  Used
+   in multiply and divide.  */
+#define HALF_INT_MASK (((HOST_WIDE_INT)1 << HOST_BITS_PER_HALF_WIDE_INT) - 1)
+
+#define BLOCK_OF(TARGET) ((TARGET) / HOST_BITS_PER_WIDE_INT)
+#define BLOCKS_NEEDED(PREC) \
+  (((PREC) + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT)
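+
+/* For example, with a 64-bit HOST_WIDE_INT, BLOCKS_NEEDED (64) is 1
+   and BLOCKS_NEEDED (65) is 2.  */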
+
+/*
+ * Conversion routines in and out of wide-int.
+ */
+
+/* Convert OP0 into a wide int of BITSIZE and PRECISION.  If the
+   precision is less than HOST_BITS_PER_WIDE_INT, sign extend the
+   value of the word.  */
+
+wide_int
+wide_int::from_shwi (HOST_WIDE_INT op0, unsigned int bitsize, unsigned int precision)
+{
+  wide_int result;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (precision < HOST_BITS_PER_WIDE_INT)
+    op0 = sext_hwi (op0, precision);
+
+  result.val[0] = op0;
+  result.len = 1;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    {
+      char buf0[MAX];
+      fprintf (dump_file, "%s: %s = " HOST_WIDE_INT_PRINT_HEX "\n",
+	       "wide_int::from_shwi", result.dump (buf0), op0);
+    }
+#endif
+
+  return result;
+}
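+
+/* For example, on a 64-bit host, from_shwi (-1, 32, 32) produces len
+   1 with val[0] == -1: the representation is kept sign extended
+   regardless of the precision.  */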
+
+/* Convert OP0 into a wide int of BITSIZE and PRECISION.  If the
+   precision is less than HOST_BITS_PER_WIDE_INT, sign extend the
+   value of the word.  *OVERFLOW is set if the number was too
+   large to fit in the precision.  */
+
+wide_int
+wide_int::from_shwi (HOST_WIDE_INT op0, unsigned int bitsize,
+		     unsigned int precision, bool *overflow)
+{
+  wide_int result;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (precision < HOST_BITS_PER_WIDE_INT)
+    {
+      HOST_WIDE_INT t = sext_hwi (op0, precision);
+      if (t != op0)
+	*overflow = true; 
+      op0 = t;
+    }
+
+  result.val[0] = op0;
+  result.len = 1;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    {
+      char buf0[MAX];
+      fprintf (dump_file, "%s: %s = " HOST_WIDE_INT_PRINT_HEX "\n",
+	       "wide_int::from_shwi", result.dump (buf0), op0);
+    }
+#endif
+
+  return result;
+}
+
+/* Convert OP0 into a wide int of BITSIZE and PRECISION.  If the
+   precision is less than HOST_BITS_PER_WIDE_INT, zero extend the
+   value of the word.  */
+
+wide_int
+wide_int::from_uhwi (unsigned HOST_WIDE_INT op0,
+		     unsigned int bitsize, unsigned int precision)
+{
+  wide_int result;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (precision < HOST_BITS_PER_WIDE_INT)
+    op0 = zext_hwi (op0, precision);
+
+  result.val[0] = op0;
+
+  /* If the top bit is a 1, we need to add another word of 0s: the
+     representation is otherwise sign extended, and the infinite
+     expansion of an unsigned number must have 0s at the top.  */
+  if ((HOST_WIDE_INT)op0 < 0 && precision > HOST_BITS_PER_WIDE_INT)
+    {
+      result.val[1] = 0;
+      result.len = 2;
+    }
+  else
+    result.len = 1;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    {
+      char buf0[MAX];
+      fprintf (dump_file, "%s: %s = " HOST_WIDE_INT_PRINT_HEX "\n",
+	       "wide_int::from_uhwi", result.dump (buf0), op0);
+    }
+#endif
+
+  return result;
+}
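+
+/* For example, on a 64-bit host, from_uhwi of an all-ones
+   HOST_WIDE_INT with precision 128 produces len 2 with val[0] == -1
+   and val[1] == 0, so the value reads as 2^64 - 1 rather than -1.  */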
+
+/* Convert OP0 into a wide int of BITSIZE and PRECISION.  If the
+   precision is less than HOST_BITS_PER_WIDE_INT, zero extend the
+   value of the word.  *OVERFLOW is set if the number was too
+   large to fit in the precision.  */
+
+wide_int
+wide_int::from_uhwi (unsigned HOST_WIDE_INT op0, unsigned int bitsize, 
+		     unsigned int precision, bool *overflow)
+{
+  wide_int result;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (precision < HOST_BITS_PER_WIDE_INT)
+    {
+      unsigned HOST_WIDE_INT t = zext_hwi (op0, precision);
+      if (t != op0)
+	*overflow = true; 
+      op0 = t;
+    }
+
+  result.val[0] = op0;
+
+  /* If the top bit is a 1, we need to add another word of 0s: the
+     representation is otherwise sign extended, and the infinite
+     expansion of an unsigned number must have 0s at the top.  */
+  if ((HOST_WIDE_INT)op0 < 0 && precision > HOST_BITS_PER_WIDE_INT)
+    {
+      result.val[1] = 0;
+      result.len = 2;
+    }
+  else
+    result.len = 1;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    {
+      char buf0[MAX];
+      fprintf (dump_file, "%s: %s = " HOST_WIDE_INT_PRINT_HEX "\n",
+	       "wide_int::from_uhwi", result.dump (buf0), op0);
+    }
+#endif
+
+  return result;
+}
+
+/* Convert a double int into a wide int.  */
+
+wide_int
+wide_int::from_double_int (enum machine_mode mode, double_int di)
+{
+  wide_int result;
+  result.bitsize = GET_MODE_BITSIZE (mode);
+  result.precision = GET_MODE_PRECISION (mode);
+  result.len = 2;
+  result.val[0] = di.low;
+  result.val[1] = di.high;
+  result.canonize ();
+  return result;
+}
+
+/* Convert an integer cst into a wide int.  */
+
+wide_int
+wide_int::from_tree (const_tree tcst)
+{
+#if 1
+  wide_int result;
+  tree type = TREE_TYPE (tcst);
+  unsigned int prec = TYPE_PRECISION (type);
+  HOST_WIDE_INT op = TREE_INT_CST_LOW (tcst);
+
+  result.precision = prec;
+  result.bitsize = GET_MODE_BITSIZE (TYPE_MODE (type));
+  result.len = (prec <= HOST_BITS_PER_WIDE_INT) ? 1 : 2;
+
+  if (prec < HOST_BITS_PER_WIDE_INT)
+    {
+      if (TYPE_UNSIGNED (type))
+	result.val[0] = zext_hwi (op, prec);
+      else
+	result.val[0] = sext_hwi (op, prec);
+    }
+  else
+    {
+      result.val[0] = op;
+      if (prec > HOST_BITS_PER_WIDE_INT)
+	{
+	  if (prec < HOST_BITS_PER_DOUBLE_INT)
+	    {
+	      op = TREE_INT_CST_HIGH (tcst);
+	      if (TYPE_UNSIGNED (type))
+		result.val[1] = zext_hwi (op, prec - HOST_BITS_PER_WIDE_INT);
+	      else
+		result.val[1] = sext_hwi (op, prec - HOST_BITS_PER_WIDE_INT);
+	    }
+	  else
+	    result.val[1] = TREE_INT_CST_HIGH (tcst);
+	}
+    }
+
+  if (result.len == 2)
+    result.canonize ();
+
+  return result;
+#endif
+  /* This is the code once the tree level is converted.  */
+#if 0
+  wide_int result;
+  int i;
+
+  tree type = TREE_TYPE (tcst);
+
+  result.bitsize = GET_MODE_BITSIZE (TYPE_MODE (type));
+  result.precision = TYPE_PRECISION (type);
+  result.len = TREE_INT_CST_LEN (tcst);
+  for (i = 0; i < result.len; i++)
+    result.val[i] = TREE_INT_CST_ELT (tcst, i);
+
+  return result;
+#endif
+}
+
+/* Extract a constant integer from X, which has mode MODE.  The bits of
+   the integer are returned.  */
+
+wide_int
+wide_int::from_rtx (const_rtx x, enum machine_mode mode)
+{
+  wide_int result;
+  unsigned int prec = GET_MODE_PRECISION (mode);
+
+  gcc_assert (mode != VOIDmode);
+
+  result.bitsize = GET_MODE_BITSIZE (mode);
+  result.precision = prec;
+
+  switch (GET_CODE (x))
+    {
+    case CONST_INT:
+      if ((prec & (HOST_BITS_PER_WIDE_INT - 1)) != 0)
+	result.val[0] = sext_hwi (INTVAL (x), prec);
+      else
+	result.val[0] = INTVAL (x);
+      result.len = 1;
+      break;
+
+#if TARGET_SUPPORTS_WIDE_INT
+    case CONST_WIDE_INT:
+      {
+	int i;
+	result.len = CONST_WIDE_INT_NUNITS (x);
+	
+	for (i = 0; i < result.len; i++)
+	  result.val[i] = CONST_WIDE_INT_ELT (x, i);
+      }
+      break;
+#else
+    case CONST_DOUBLE:
+      result.len = 2;
+      result.val[0] = CONST_DOUBLE_LOW (x);
+      result.val[1] = CONST_DOUBLE_HIGH (x);
+      result.canonize ();
+      break;
+#endif
+
+    default:
+      gcc_unreachable ();
+    }
+
+  return result;
+}
+
+/* Return THIS as a signed HOST_WIDE_INT.  If THIS does not fit in
+   PREC, the information is lost. */
+
+HOST_WIDE_INT 
+wide_int::to_shwi (unsigned int prec) const
+{
+  HOST_WIDE_INT result;
+
+  if (prec < HOST_BITS_PER_WIDE_INT)
+    result = sext_hwi (val[0], prec);
+  else
+    result = val[0];
+
+  return result;
+}
+
+/* Return THIS as a signed HOST_WIDE_INT.  If THIS is too large for
+   the mode's precision, the information is lost. */
+
+HOST_WIDE_INT 
+wide_int::to_shwi () const
+{
+  return to_shwi (precision);
+}
+
+/* Return THIS as an unsigned HOST_WIDE_INT.  If THIS does not fit in
+   PREC, the information is lost. */
+
+unsigned HOST_WIDE_INT 
+wide_int::to_uhwi (unsigned int prec) const
+{
+  unsigned HOST_WIDE_INT result;
+
+  if (prec < HOST_BITS_PER_WIDE_INT)
+    result = zext_hwi (val[0], prec);
+  else
+    result = val[0];
+
+  return result;
+}
+
+/* Return THIS as an unsigned HOST_WIDE_INT.  If THIS is too large for
+   the mode's precision, the information is lost. */
+
+unsigned HOST_WIDE_INT 
+wide_int::to_uhwi () const
+{
+  return to_uhwi (precision);
+}
+
+/*
+ * Largest and smallest values in a mode.
+ */
+
+/* Produce the largest number that is represented with BITSIZE and
+   PREC.  SGN must be SIGNED or UNSIGNED.  */
+
+wide_int
+wide_int::max_value (unsigned int bitsize, unsigned int prec, SignOp sgn)
+{
+  wide_int result;
+  
+  result.bitsize = bitsize;
+  result.precision = prec;
+
+  if (sgn == UNSIGNED)
+    {
+      /* The unsigned max is just all ones, for which the compressed
+	 rep is just a single HWI.  */ 
+      result.len = 1;
+      result.val[0] = (HOST_WIDE_INT)-1;
+    }
+  else
+    {
+      /* The signed max is all ones except the top bit.  This must be
+	 explicitly represented.  */
+      int i;
+      int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
+      int shift = (small_prec == 0) 
+	? HOST_BITS_PER_WIDE_INT - 1 : small_prec - 1;
+
+      result.len = BLOCKS_NEEDED (prec);
+      for (i = 0; i < result.len - 1; i++)
+	result.val[i] = (HOST_WIDE_INT)-1;
+
+      result.val[result.len - 1] = ((HOST_WIDE_INT)1 << shift) - 1;
+    }
+  
+  return result;
+}
+
+/* Produce the largest number that is represented in MODE. The
+   bitsize and precision are taken from mode.  SGN must be SIGNED or
+   UNSIGNED.  */
+
+wide_int
+wide_int::max_value (enum machine_mode mode, SignOp sgn)
+{
+  return max_value (GET_MODE_BITSIZE (mode), GET_MODE_PRECISION (mode), sgn);
+}
+
+/* Produce the largest number that is represented in TYPE. The
+   bitsize and precision and sign are taken from TYPE.  */
+
+wide_int
+wide_int::max_value (const_tree type)
+{
+  return max_value (GET_MODE_BITSIZE (TYPE_MODE (type)), 
+		    TYPE_PRECISION (type), 
+		    TYPE_UNSIGNED (type) ? UNSIGNED : SIGNED);
+}
+
+/* Produce the smallest number that is represented with BITSIZE and
+   PREC.  SGN must be SIGNED or UNSIGNED.  */
+
+wide_int
+wide_int::min_value (unsigned int bitsize, unsigned int prec, SignOp sgn)
+{
+  if (sgn == UNSIGNED)
+    {
+      /* The unsigned min is just all zeros, for which the compressed
+	 rep is just a single HWI.  */ 
+      wide_int result;
+      result.len = 1;
+      result.bitsize = bitsize;
+      result.precision = prec;
+      result.val[0] = 0;
+      return result;
+    }
+  else
+    {
+      /* The signed min is all zeros except the top bit.  This must be
+	 explicitly represented.  */
+      return set_bit_in_zero (prec - 1, bitsize, prec);
+    }
+}
+
+/* Produce the smallest number that is represented in MODE. The
+   bitsize and precision are taken from mode.  SGN must be SIGNED or
+   UNSIGNED.  */
+
+wide_int
+wide_int::min_value (enum machine_mode mode, SignOp sgn)
+{
+  return min_value (GET_MODE_BITSIZE (mode), GET_MODE_PRECISION (mode), sgn);
+}
+
+/* Produce the smallest number that is represented in TYPE. The
+   bitsize and precision and sign are taken from TYPE.  */
+
+wide_int
+wide_int::min_value (const_tree type)
+{
+  return min_value (GET_MODE_BITSIZE (TYPE_MODE (type)), 
+		    TYPE_PRECISION (type), 
+		    TYPE_UNSIGNED (type) ? UNSIGNED : SIGNED);
+}
+
+/*
+ * Public utilities.
+ */
+
+/* Check the upper HOST_WIDE_INTs of src to see if the length can be
+   shortened.  An upper HOST_WIDE_INT is unnecessary if it is all ones
+   or zeros and the top bit of the next lower word matches.
+
+   This function may change the representation of THIS, but does not
+   change the value that THIS represents.  It does not sign extend in
+   the case that the precision is less than
+   HOST_BITS_PER_WIDE_INT.  */
+
+void
+wide_int::canonize ()
+{
+  int small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+  int blocks_needed = BLOCKS_NEEDED (precision);
+  HOST_WIDE_INT top;
+  int i;
+
+  if (len > blocks_needed)
+    len = blocks_needed;
+
+  /* Clean up the top bits for any precision that is not a multiple of a HWI.  */
+  if (len == blocks_needed && small_prec)
+    val[len - 1] = sext_hwi (val[len - 1], small_prec);
+
+  if (len == 1)
+    return;
+
+  top = val[len - 1];
+  if (top != 0 && top != (HOST_WIDE_INT)-1)
+    return;
+
+  /* At this point we know that the top is either 0 or -1.  Find the
+     first block that is not a copy of this.  */
+  for (i = len - 2; i >= 0; i--)
+    {
+      HOST_WIDE_INT x = val[i];
+      if (x != top)
+	{
+	  if (x >> (HOST_BITS_PER_WIDE_INT - 1) == top)
+	    {
+	      len = i + 1;
+	      return;
+	    }
+
+	  /* We need an extra block because the top bit of block I does
+	     not match the extension.  */
+	  len = i + 2;
+	  return;
+	}
+    }
+
+  /* The number is 0 or -1.  */
+  len = 1;
+}
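+
+/* For example, on a 64-bit host with precision 128, { -1, -1 }
+   canonizes to len 1 with val[0] == -1, and { 0x100, 0 } canonizes to
+   len 1 with val[0] == 0x100, since the top bit of the lower block
+   already matches the zero block above it.  */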
+
+/* Copy THIS, replacing the bitsize and precision with BS and PREC.  */
+
+wide_int
+wide_int::force_to_size (unsigned int bs, unsigned int prec) const
+{
+  wide_int result;
+  int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
+  int blocks_needed = BLOCKS_NEEDED (prec);
+  int i;
+
+  result.bitsize = bs;
+  result.precision = prec;
+  result.len = blocks_needed < len ? blocks_needed : len;
+  for (i = 0; i < result.len; i++)
+    result.val[i] = val[i];
+
+  if (small_prec && blocks_needed == len)
+    result.val[blocks_needed - 1]
+      = sext_hwi (result.val[blocks_needed - 1], small_prec);
+  return result;
+}
+
+/*
+ * public printing routines.
+ */
+
+/* Try to print the signed self in decimal to BUF if the number fits
+   in a HWI.  Otherwise print in hex.  */
+
+void 
+wide_int::print_decs (char *buf) const
+{
+  if ((precision <= HOST_BITS_PER_WIDE_INT)
+      || (len == 1 && !neg_p ()))
+    sprintf (buf, HOST_WIDE_INT_PRINT_DEC, val[0]);
+  else
+    print_hex (buf);
+}
+
+/* Try to print the signed self in decimal to FILE if the number fits
+   in a HWI.  Otherwise print in hex.  */
+
+void 
+wide_int::print_decs (FILE *file) const
+{
+  char buf[(2 * MAX_BITSIZE_MODE_ANY_INT / BITS_PER_UNIT) + 4];
+  print_decs (buf);
+  fputs (buf, file);
+}
+
+/* Try to print the unsigned self in decimal to BUF if the number fits
+   in a HWI.  Otherwise print in hex.  */
+
+void 
+wide_int::print_decu (char *buf) const
+{
+  if ((precision <= HOST_BITS_PER_WIDE_INT)
+      || (len == 1 && !neg_p ()))
+    sprintf (buf, HOST_WIDE_INT_PRINT_UNSIGNED, val[0]);
+  else
+    print_hex (buf);
+}
+
+/* Try to print the unsigned self in decimal to FILE if the number fits
+   in a HWI.  Otherwise print in hex.  */
+
+void 
+wide_int::print_decu (FILE *file) const
+{
+  char buf[(2 * MAX_BITSIZE_MODE_ANY_INT / BITS_PER_UNIT) + 4];
+  print_decu (buf);
+  fputs (buf, file);
+}
+
+void 
+wide_int::print_hex (char *buf) const
+{
+  int i = len;
+
+  if (zero_p ())
+    sprintf (buf, "0x");
+  else
+    {
+      if (neg_p ())
+	{
+	  int j;
+	  /* If the number is negative, we may need to pad value with
+	     0xFFF...  because the leading elements may be missing and
+	     we do not print a '-' with hex.  */
+	  for (j = BLOCKS_NEEDED (precision); j > i; j--)
+	    buf += sprintf (buf, HOST_WIDE_INT_PRINT_PADDED_HEX, (HOST_WIDE_INT) -1);
+	    
+	}
+      else
+	buf += sprintf (buf, HOST_WIDE_INT_PRINT_HEX, val[--i]);
+      while (--i >= 0)
+	buf += sprintf (buf, HOST_WIDE_INT_PRINT_PADDED_HEX, val[i]);
+    }
+}
+
+/* Print one big hex number to FILE.  Note that some assemblers may not
+   accept this for large modes.  */
+void 
+wide_int::print_hex (FILE *file) const
+{
+  char buf[(2 * MAX_BITSIZE_MODE_ANY_INT / BITS_PER_UNIT) + 4];
+  print_hex (buf);
+  fputs (buf, file);
+}
+
+/*
+ * Comparisons.  Note that only equality is an operator.  The other
+ * comparisons cannot be operators since they are inherently signed or
+ * unsigned and C++ has no such operators.
+ */
+
+/* Return true if THIS == OP1.  */
+
+bool
+wide_int::operator == (const wide_int &op1) const
+{
+  int l0 = len - 1;
+  int l1 = op1.len - 1;
+  bool result;
+
+  if (this == &op1)
+    {
+      result = true;
+      goto ex;
+    }
+
+  if (precision < HOST_BITS_PER_WIDE_INT)
+    {
+      unsigned HOST_WIDE_INT mask = ((HOST_WIDE_INT)1 << precision) - 1;
+      result = (val[0] & mask) == (op1.val[0] & mask);
+      goto ex;
+    }
+
+  while (l0 > l1)
+    if (val[l0--] != op1.sign_mask ())
+      {
+	result = false;
+	goto ex;
+      }
+
+  while (l1 > l0)
+    if (op1.val[l1--] != sign_mask ())
+      {
+	result = false;
+	goto ex;
+      }
+
+  while (l0 >= 0)
+    if (val[l0--] != op1.val[l1--])
+      {
+	result = false;
+	goto ex;
+      }
+
+  result = true;
+
+ ex:
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vww ("operator ==", result, *this, op1);
+#endif
+
+  return result;
+}
+
+/* Return true if THIS > OP1 using signed comparisons.  */
+
+bool
+wide_int::gts_p (const HOST_WIDE_INT op1) const
+{
+  bool result;
+
+  if (precision <= HOST_BITS_PER_WIDE_INT || len == 1)
+    {
+      /* The values are already logically sign extended.  */
+      result = val[0] > sext_hwi (op1, precision);
+      goto ex;
+    }
+  
+  result = !neg_p ();
+
+ ex:
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vwh ("wide_int::gts_p", result, *this, op1);
+#endif
+
+  return result;
+}
+
+/* Return true if THIS > OP1 using unsigned comparisons.  */
+
+bool
+wide_int::gtu_p (const unsigned HOST_WIDE_INT op1) const
+{
+  unsigned HOST_WIDE_INT x0;
+  unsigned HOST_WIDE_INT x1;
+  bool result;
+
+  if (precision < HOST_BITS_PER_WIDE_INT || len == 1)
+    {
+      x0 = zext_hwi (val[0], precision);
+      x1 = zext_hwi (op1, precision);
+
+      result = x0 > x1;
+    }
+  else
+    result = true;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vwh ("wide_int::gtu_p", result, *this, op1);
+#endif
+
+  return result;
+}
+
+/* Return true if THIS < OP1 using signed comparisons.  */
+
+bool
+wide_int::lts_p (const HOST_WIDE_INT op1) const
+{
+  bool result;
+
+  if (precision <= HOST_BITS_PER_WIDE_INT || len == 1)
+    {
+      /* The values are already logically sign extended.  */
+      result = val[0] < sext_hwi (op1, precision);
+      goto ex;
+    }
+  
+  result = neg_p ();
+
+ ex:
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vwh ("wide_int::lts_p", result, *this, op1);
+#endif
+
+  return result;
+}
+
+/* Return true if THIS < OP1 using signed comparisons.  */
+
+bool
+wide_int::lts_p (const wide_int &op1) const
+{
+  int l0 = len - 1;
+  int l1 = op1.len - 1;
+  bool result;
+
+  if (this == &op1)
+    {
+      result = false;
+      goto ex;
+    }
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      /* The values are already logically sign extended.  */
+      result = val[0] < op1.val[0];
+      goto ex;
+    }
+
+  /* The values are logically sign extended, so if the signs differ
+     the comparison is decided immediately.  */
+  if (sign_mask () != op1.sign_mask ())
+    {
+      result = sign_mask () != 0;
+      goto ex;
+    }
+
+  /* Both values have the same sign, so the most significant block
+     that differs, compared unsigned, decides the order.  */
+  while (l0 > l1)
+    {
+      unsigned HOST_WIDE_INT x0 = val[l0--];
+      unsigned HOST_WIDE_INT x1 = op1.sign_mask ();
+      if (x0 != x1)
+	{
+	  result = x0 < x1;
+	  goto ex;
+	}
+    }
+
+  while (l1 > l0)
+    {
+      unsigned HOST_WIDE_INT x0 = sign_mask ();
+      unsigned HOST_WIDE_INT x1 = op1.val[l1--];
+      if (x0 != x1)
+	{
+	  result = x0 < x1;
+	  goto ex;
+	}
+    }
+
+  while (l0 >= 0)
+    {
+      unsigned HOST_WIDE_INT x0 = val[l0--];
+      unsigned HOST_WIDE_INT x1 = op1.val[l1--];
+      if (x0 != x1)
+	{
+	  result = x0 < x1;
+	  goto ex;
+	}
+    }
+
+  result = false;
+
+ ex:
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vww ("wide_int::lts_p", result, *this, op1);
+#endif
+
+  return result;
+}
+
+/* Return true if THIS < OP1 using unsigned comparisons.  */
+
+bool
+wide_int::ltu_p (const unsigned HOST_WIDE_INT op1) const
+{
+  unsigned HOST_WIDE_INT x0;
+  unsigned HOST_WIDE_INT x1;
+  bool result;
+
+  if (precision < HOST_BITS_PER_WIDE_INT || len == 1)
+    {
+      x0 = zext_hwi (val[0], precision);
+      x1 = zext_hwi (op1, precision);
+
+      result = x0 < x1;
+    }
+  else
+    result = false;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vwh ("wide_int::ltu_p", result, *this, op1);
+#endif
+
+  return result;
+}
+
+/* Return true if THIS < OP1 using unsigned comparisons.  */
+
+bool
+wide_int::ltu_p (const wide_int &op1) const
+{
+  unsigned HOST_WIDE_INT x0;
+  unsigned HOST_WIDE_INT x1;
+  int l0 = len - 1;
+  int l1 = op1.len - 1;
+  bool result;
+
+  if (this == &op1)
+    {
+      result = false;
+      goto ex;
+    }
+
+  if (precision < HOST_BITS_PER_WIDE_INT)
+    {
+      x0 = zext_hwi (val[0], precision);
+      x1 = zext_hwi (op1.val[0], precision);
+
+      result = x0 < x1;
+      goto ex;
+    }
+
+  /* The first block that differs, scanning from the most significant
+     end, decides the order.  */
+  while (l0 > l1)
+    {
+      x0 = val[l0--];
+      x1 = op1.sign_mask ();
+      if (x0 != x1)
+	{
+	  result = x0 < x1;
+	  goto ex;
+	}
+    }
+
+  while (l1 > l0)
+    {
+      x0 = sign_mask ();
+      x1 = op1.val[l1--];
+      if (x0 != x1)
+	{
+	  result = x0 < x1;
+	  goto ex;
+	}
+    }
+
+  while (l0 >= 0)
+    {
+      x0 = val[l0--];
+      x1 = op1.val[l1--];
+      if (x0 != x1)
+	{
+	  result = x0 < x1;
+	  goto ex;
+	}
+    }
+
+  result = false;
+
+ ex:
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vww ("wide_int::ltu_p", result, *this, op1);
+#endif
+
+  return result;
+}
+
+/* Return true if THIS has the sign bit set to 1 and all other bits are
+   zero.  */
+
+bool
+wide_int::only_sign_bit_p (unsigned int prec) const
+{
+  int i;
+  HOST_WIDE_INT x;
+  int small_prec;
+  bool result;
+
+  if (BLOCKS_NEEDED (prec) != len)
+    {
+      result = false;
+      goto ex;
+    }
+
+  for (i=0; i < len - 1; i++)
+    if (val[i] != 0)
+      {
+	result = false;
+	goto ex;
+      }
+
+  x = val[len - 1];
+  small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
+  if (small_prec)
+    x = x << (HOST_BITS_PER_WIDE_INT - small_prec);
+
+  result = x == ((HOST_WIDE_INT)1) << (HOST_BITS_PER_WIDE_INT - 1);
+
+ ex:
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vw ("wide_int::only_sign_bit_p", result, *this);
+#endif
+
+  return result;
+}
+
+bool
+wide_int::only_sign_bit_p () const
+{
+  return only_sign_bit_p (precision);
+}
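+
+/* For example, with PREC 16, only_sign_bit_p holds exactly for the
+   value 0x8000, the most negative 16-bit number.  */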
+
+/* Returns true if THIS fits into the range of TYPE.  The signedness
+   of the comparison is taken from the signedness of TYPE.  */
+
+bool
+wide_int::fits_to_tree_p (const_tree type) const
+{
+  int type_prec = TYPE_PRECISION (type);
+
+  if (TYPE_UNSIGNED (type))
+    return fits_u_p (type_prec);
+  else
+    return fits_s_p (type_prec);
+}
+
+/* Returns true if THIS fits in the signed range of PREC.  */
+
+bool
+wide_int::fits_s_p (unsigned int prec) const
+{
+  if (len < BLOCKS_NEEDED (prec))
+    return true;
+
+  if (precision <= prec)
+    return true;
+
+  return *this == sext (prec);
+}
+
+
+/* Returns true if THIS fits in the unsigned range of PREC.  */
+
+bool
+wide_int::fits_u_p (unsigned int prec) const
+{
+  if (len < BLOCKS_NEEDED (prec))
+    return true;
+
+  if (precision <= prec)
+    return true;
+
+  return *this == zext (prec);
+}
+
+/*
+ * Extension.
+ */
+
+/* Sign extend THIS starting at bit OFFSET within its precision.  */
+
+wide_int
+wide_int::sext (unsigned int offset) const
+{
+  wide_int result;
+  int off;
+
+  gcc_assert (precision >= offset);
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+  if (precision < HOST_BITS_PER_WIDE_INT)
+    {
+      result.val[0] = sext_hwi (val[0], offset);
+      result.len = 1;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int::sext", result, *this, offset);
+#endif
+
+      return result;
+    }
+
+  if (precision == offset)
+    {
+      result = force_to_size (bitsize, precision);
+#ifdef DEBUG_WIDE_INT
+      if (dump_file)
+	debug_wwv ("wide_int::sext", result, *this, offset);
+#endif
+      return result;
+    }
+
+  result = decompress (offset, bitsize, precision);
+
+  /* Now we can do the real sign extension.  */
+  off = offset & (HOST_BITS_PER_WIDE_INT - 1);
+  if (off)
+    {
+      int block = BLOCK_OF (offset);
+      result.val[block] = sext_hwi (result.val[block], off);
+      result.len = block + 1;
+    }
+  /* We never need an extra element for sign extended values.  */
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int::sext", result, *this, offset);
+#endif
+
+  return result;
+}
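+
+/* For example, with precision 32, sext (8) maps 0xff to -1 and leaves
+   0x7f unchanged.  */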
+
+/* Sign extend THIS to mode M.  */
+
+wide_int
+wide_int::sext (enum machine_mode m) const
+{
+  /* Assuming that M is wider than THIS, the compressed value of THIS
+     and of the result will be the same.  Only the bitsize and
+     precision of the result differ.  */
+  return force_to_size (GET_MODE_BITSIZE (m), GET_MODE_PRECISION (m));
+}
+
+/* Zero extend THIS starting at bit OFFSET within its precision.  */
+
+wide_int
+wide_int::zext (unsigned int offset) const
+{
+  wide_int result;
+  int off;
+  int block;
+
+  gcc_assert (precision >= offset);
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+  if (precision < HOST_BITS_PER_WIDE_INT)
+    {
+      result.val[0] = zext_hwi (val[0], offset);
+      result.len = 1;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int::zext", result, *this, offset);
+#endif
+
+      return result;
+    }
+
+  if (precision == offset)
+    {
+      result = force_to_size (bitsize, precision);
+#ifdef DEBUG_WIDE_INT
+      if (dump_file)
+	debug_wwv ("wide_int::zext", result, *this, offset);
+#endif
+      return result;
+    }
+
+  result = decompress (offset, bitsize, precision);
+
+  /* Now we can do the real zero extension.  */
+  off = offset & (HOST_BITS_PER_WIDE_INT - 1);
+  block = BLOCK_OF (offset);
+  if (off)
+    {
+      result.val[block] = zext_hwi (result.val[block], off);
+      result.len = block + 1;
+    }
+  else
+    /* See if we need an extra zero element to satisfy the compression
+       rule.  */
+    if (result.val[block - 1] < 0 && offset < precision)
+      {
+	result.val[block] = 0;
+	result.len += 1;
+      }
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int::zext", result, *this, offset);
+#endif
+
+  return result;
+}
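+
+/* For example, with precision 32, zext (8) maps -1 to 0xff.  */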
+
+/* Zero extend THIS to mode M.  */
+
+wide_int
+wide_int::zext (enum machine_mode m) const
+{
+  wide_int result;
+  int off;
+  int block;
+  unsigned int res_bitsize = GET_MODE_BITSIZE (m);
+  unsigned int res_prec = GET_MODE_PRECISION (m);
+
+  gcc_assert (res_prec >= precision);
+
+  if (res_prec < HOST_BITS_PER_WIDE_INT)
+    {
+      result.bitsize = res_bitsize;
+      result.precision = res_prec;
+      result.val[0] = zext_hwi (val[0], precision);
+      result.len = 1;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int::zext", result, *this, res_prec);
+#endif
+      return result;
+    }
+
+  result = decompress (precision, res_bitsize, res_prec);
+
+  /* Now we can do the real zero extension.  */
+  off = precision & (HOST_BITS_PER_WIDE_INT - 1);
+  block = BLOCK_OF (precision);
+  if (off)
+    {
+      result.val[block] = zext_hwi (result.val[block], off);
+      result.len = block + 1;
+    }
+  else
+    /* See if we need an extra zero element to satisfy the compression
+       rule.  */
+    if (result.val[block - 1] < 0 && precision < res_prec)
+      {
+	result.val[block] = 0;
+	result.len += 1;
+      }
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int::zext", result, *this, res_prec);
+#endif
+
+  return result;
+}
+
+/*
+ * Masking, inserting, shifting, rotating.
+ */
+
+/* Return a copy of THIS with the bit at BITPOS set to 1.  */
+
+wide_int
+wide_int::set_bit (unsigned int bitpos) const
+{
+  wide_int result;
+  int i, j;
+
+  if (bitpos >= precision)
+    result = force_to_size (bitsize, precision);
+  else
+    {
+      result = decompress (bitpos, bitsize, precision);
+      j = bitpos / HOST_BITS_PER_WIDE_INT;
+      i = bitpos & (HOST_BITS_PER_WIDE_INT - 1);
+      result.val[j] |= ((HOST_WIDE_INT)1) << i;
+    }
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int::set_bit", result, *this, bitpos);
+#endif
+
+  return result;
+}
+
+/* Produce a number with BITSIZE and PREC that is zero except for a
+   single 1 bit at BITPOS.  */
+
+wide_int
+wide_int::set_bit_in_zero (unsigned int bitpos, 
+			   unsigned int bitsize, unsigned int prec)
+{
+  wide_int result;
+  int blocks_needed = BLOCKS_NEEDED (bitpos + 1);
+  int i, j;
+
+  result.bitsize = bitsize;
+  result.precision = prec;
+  if (bitpos >= prec)
+    {
+      result.len = 1;
+      result.val[0] = 0;
+    }
+  else
+    {
+      result.len = blocks_needed;
+      for (i = 0; i < blocks_needed; i++)
+	result.val[i] = 0;
+      
+      j = bitpos / HOST_BITS_PER_WIDE_INT;
+      i = bitpos & (HOST_BITS_PER_WIDE_INT - 1);
+      result.val[j] |= ((HOST_WIDE_INT)1) << i;
+
+      /* If we just set the sign bit of the top block and the
+	 precision extends above BITPOS, an explicit zero block is
+	 needed so the value is not sign extended.  */
+      if (i == HOST_BITS_PER_WIDE_INT - 1 && bitpos + 1 < prec)
+	result.val[result.len++] = 0;
+    }
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wv ("wide_int::set_bit_in_zero", result, bitpos);
+#endif
+
+  return result;
+}
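+
+/* For example, on a 64-bit host, set_bit_in_zero (70, 128, 128)
+   produces len 2 with val[0] == 0 and val[1] == (HOST_WIDE_INT)1 << 6.  */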
+
+/* Insert WIDTH bits from OP0 into THIS starting at START.  */
+
+wide_int
+wide_int::insert (const wide_int &op0, unsigned int start, 
+		  unsigned int width) const
+{
+  wide_int result;
+  wide_int mask;
+  wide_int tmp;
+
+  if (start + width >= precision) 
+    width = precision - start;
+
+  mask = shifted_mask (start, width, false, bitsize, precision);
+  tmp = op0.lshift (start, NONE, bitsize, precision);
+  result = tmp & mask;
+
+  tmp = and_not (mask);
+  result = result | tmp;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwwvv ("wide_int::insert", result, *this, op0, start, width);
+#endif
+
+  return result;
+}
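+
+/* For example, inserting an OP0 of WIDTH 8 at START 8 replaces bits 8
+   through 15 of THIS with the low 8 bits of OP0.  */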
+
+/* bswap THIS.  */
+
+wide_int
+wide_int::bswap () const
+{
+  wide_int result;
+  int i, s;
+  int end;
+  int len = BLOCKS_NEEDED (precision);
+  HOST_WIDE_INT mask = sign_mask ();
+
+  /* This is not a well defined operation if the precision is not a
+     multiple of 8.  */
+  gcc_assert ((precision & 0x7) == 0);
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+  result.len = len;
+
+  for (i = 0; i < len; i++)
+    result.val[i] = mask;
+
+  /* Only swap the bytes that are not the padding.  */
+  if ((precision & (HOST_BITS_PER_WIDE_INT - 1))
+      && (this->len == len))
+    end = precision;
+  else
+    end = this->len * HOST_BITS_PER_WIDE_INT;
+
+  for (s = 0; s < end; s += 8)
+    {
+      unsigned int d = precision - s - 8;
+      unsigned HOST_WIDE_INT byte;
+
+      int block = s / HOST_BITS_PER_WIDE_INT;
+      int offset = s & (HOST_BITS_PER_WIDE_INT - 1);
+
+      byte = (val[block] >> offset) & 0xff;
+
+      block = d / HOST_BITS_PER_WIDE_INT;
+      offset = d & (HOST_BITS_PER_WIDE_INT - 1);
+
+      result.val[block] &= ~((unsigned HOST_WIDE_INT) 0xff << offset);
+      result.val[block] |= byte << offset;
+    }
+
+  result.canonize ();
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_ww ("wide_int::bswap", result, *this);
+#endif
+
+  return result;
+}
+
+/* Return a result mask where the lower WIDTH bits are ones and the
+   bits above that up to the precision are zeros.  The result is
+   inverted if NEGATE is true.  The result is made with BITSIZE and
+   PREC. */
+
+wide_int
+wide_int::mask (unsigned int width, bool negate, 
+		unsigned int bitsize, unsigned int prec)
+{
+  wide_int result;
+  unsigned int i = 0;
+  int shift;
+
+  if (width == 0)
+    {
+      if (negate)
+	result = wide_int::minus_one (bitsize, prec);
+      else
+	result = wide_int::zero (bitsize, prec);
+#ifdef DEBUG_WIDE_INT
+      if (dump_file)
+	debug_wvv ("wide_int::mask", result, width, negate);
+#endif
+      return result;
+    }
+
+  result.bitsize = bitsize;
+  result.precision = prec;
+
+  while (i < width / HOST_BITS_PER_WIDE_INT)
+    result.val[i++] = negate ? 0 : (HOST_WIDE_INT)-1;
+
+  shift = width & (HOST_BITS_PER_WIDE_INT - 1);
+  if (shift != 0)
+    {
+      HOST_WIDE_INT last = (((HOST_WIDE_INT)1) << shift) - 1;
+      result.val[i++] = negate ? ~last : last;
+    }
+  result.len = i;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wvv ("wide_int::mask", result, width, negate);
+#endif
+
+  return result;
+}
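+
+/* For example (illustrative only), with a 32-bit bitsize and
+   precision, wide_int::mask (4, false, 32, 32) is 0xf (len == 1,
+   val[0] == 0xf), while wide_int::mask (4, true, 32, 32) is the
+   32-bit value with only the low four bits clear, i.e. -16 in the
+   canonical sign-extended representation.  */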
+
+/* Return a mask with WIDTH one bits starting at START; the bits above
+   that up to the precision are zeros.  The result is inverted if
+   NEGATE is true.  */
+
+wide_int
+wide_int::shifted_mask (unsigned int start, unsigned int width, 
+			bool negate,
+			unsigned int bitsize, unsigned int prec)
+{
+  wide_int result;
+  unsigned int i = 0;
+  unsigned int shift;
+  unsigned int end;
+  HOST_WIDE_INT block;
+
+  if (start + width > prec)
+    width = prec - start;
+  end = start + width;
+
+  if (width == 0)
+    {
+      if (negate)
+	result = wide_int::minus_one (bitsize, prec);
+      else
+	result = wide_int::zero (bitsize, prec);
+#ifdef DEBUG_WIDE_INT
+      if (dump_file)
+	debug_wvv ("wide_int::shifted_mask", result, width, negate);
+#endif
+      return result;
+    }
+
+  result.bitsize = bitsize;
+  result.precision = prec;
+
+  while (i < start / HOST_BITS_PER_WIDE_INT)
+    result.val[i++] = negate ? (HOST_WIDE_INT)-1 : 0;
+
+  shift = start & (HOST_BITS_PER_WIDE_INT - 1);
+  if (shift)
+    {
+      block = (((HOST_WIDE_INT)1) << shift) - 1;
+      shift = end & (HOST_BITS_PER_WIDE_INT - 1);
+      if (shift)
+	{
+	  /* case 000111000 */
+	  block = (((HOST_WIDE_INT)1) << shift) - block - 1;
+	  result.val[i++] = negate ? ~block : block;
+	  result.len = i;
+
+#ifdef DEBUG_WIDE_INT
+	  if (dump_file)
+	    debug_wvvv ("wide_int::shifted_mask", result, start,
+			width, negate);
+#endif
+	  return result;
+	}
+      else
+	/* ...111000 */
+	result.val[i++] = negate ? block : ~block;
+    }
+
+  while (i < end / HOST_BITS_PER_WIDE_INT)
+    /* 1111111 */
+    result.val[i++] = negate ? 0 : (HOST_WIDE_INT)-1;
+
+  shift = end & (HOST_BITS_PER_WIDE_INT - 1);
+  if (shift != 0)
+    {
+      /* 000011111 */
+      block = (((HOST_WIDE_INT)1) << shift) - 1;
+      result.val[i++] = negate ? ~block : block;
+    }
+
+  result.len = i;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wvvv ("wide_int::shifted_mask", result, start, width,
+		negate);
+#endif
+
+  return result;
+}
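+
+/* For example (illustrative only), with a 32-bit bitsize and
+   precision, wide_int::shifted_mask (8, 4, false, 32, 32) is 0xf00:
+   four one bits starting at bit 8.  With NEGATE true the same call
+   yields the complement of 0xf00 within the 32-bit precision.  */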
+
+
+/*
+ * logical operations.
+ */
+
+/* Return THIS & OP1.  */
+
+wide_int
+wide_int::operator & (const wide_int &op1) const
+{
+  wide_int result;
+  int l0 = len - 1;
+  int l1 = op1.len - 1;
+  bool need_canon = true;
+
+  result.len = len > op1.len ? len : op1.len;
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (l0 > l1)
+    {
+      if (op1.sign_mask () == 0)
+	{
+	  l0 = l1;
+	  result.len = l1 + 1;
+	}
+      else
+	{
+	  need_canon = false;
+	  while (l0 > l1)
+	    {
+	      result.val[l0] = val[l0];
+	      l0--;
+	    }
+	}
+    }
+  else if (l1 > l0)
+    {
+      if (sign_mask () == 0)
+	  result.len = l0 + 1;
+      else
+	{
+	  need_canon = false;
+	  while (l1 > l0)
+	    {
+	      result.val[l1] = op1.val[l1];
+	      l1--;
+	    }
+	}
+    }
+
+  while (l0 >= 0)
+    {
+      result.val[l0] = val[l0] & op1.val[l0];
+      l0--;
+    }
+
+  if (need_canon)
+    result.canonize ();
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_www ("wide_int::operator &", result, *this, op1);
+#endif
+  return result;
+}
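+
+/* A note on the compressed representation that the loops above
+   exploit: only the low LEN blocks of a wide_int are stored; blocks
+   above LEN are implicitly equal to sign_mask ().  As an illustrative
+   sketch, with a 128-bit precision (assuming the target has such a
+   mode), wide_int::from_shwi (-1, 128, 128) has len == 1, and ANDing
+   it with a two-block value B yields B without ever materializing the
+   implicit high block of the -1.  */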
+
+/* Return THIS & ~OP1.  */
+
+wide_int
+wide_int::and_not (const wide_int &op1) const
+{
+  wide_int result;
+  int l0 = len - 1;
+  int l1 = op1.len - 1;
+  bool need_canon = true;
+
+  result.len = len > op1.len ? len : op1.len;
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (l0 > l1)
+    {
+      if (op1.sign_mask () != 0)
+	{
+	  l0 = l1;
+	  result.len = l1 + 1;
+	}
+      else
+	{
+	  need_canon = false;
+	  while (l0 > l1)
+	    {
+	      result.val[l0] = val[l0];
+	      l0--;
+	    }
+	}
+    }
+  else if (l1 > l0)
+    {
+      if (sign_mask () == 0)
+	result.len = l0 + 1;
+      else
+	{
+	  need_canon = false;
+	  while (l1 > l0)
+	    {
+	      result.val[l1] = ~op1.val[l1];
+	      l1--;
+	    }
+	}
+    }
+
+  while (l0 >= 0)
+    {
+      result.val[l0] = val[l0] & ~op1.val[l0];
+      l0--;
+    }
+
+  if (need_canon)
+    result.canonize ();
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_www ("wide_int::and_not", result, *this, op1);
+#endif
+  return result;
+}
+
+/* Return THIS | OP1.  */
+
+wide_int
+wide_int::operator | (const wide_int &op1) const
+{
+  wide_int result;
+  int l0 = len - 1;
+  int l1 = op1.len - 1;
+  bool need_canon = true;
+
+  result.len = len > op1.len ? len : op1.len;
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (l0 > l1)
+    {
+      if (op1.sign_mask () != 0)
+	{
+	  l0 = l1;
+	  result.len = l1 + 1;
+	}
+      else
+	{
+	  need_canon = false;
+	  while (l0 > l1)
+	    {
+	      result.val[l0] = val[l0];
+	      l0--;
+	    }
+	}
+    }
+  else if (l1 > l0)
+    {
+      if (sign_mask () != 0)
+	result.len = l0 + 1;
+      else
+	{
+	  need_canon = false;
+	  while (l1 > l0)
+	    {
+	      result.val[l1] = op1.val[l1];
+	      l1--;
+	    }
+	}
+    }
+
+  while (l0 >= 0)
+    {
+      result.val[l0] = val[l0] | op1.val[l0];
+      l0--;
+    }
+
+  if (need_canon)
+    result.canonize ();
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_www ("wide_int::operator |", result, *this, op1);
+#endif
+  return result;
+}
+
+/* Return the bitwise complement of THIS.  */
+
+wide_int
+wide_int::operator ~ () const
+{
+  wide_int result;
+  int l0 = len - 1;
+
+  result.len = len;
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  while (l0 >= 0)
+    {
+      result.val[l0] = ~val[l0];
+      l0--;
+    }
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_ww ("wide_int::operator ~", result, *this);
+#endif
+  return result;
+}
+
+/* Return THIS | ~OP1.  */
+
+wide_int
+wide_int::or_not (const wide_int &op1) const
+{
+  wide_int result;
+  int l0 = len - 1;
+  int l1 = op1.len - 1;
+  bool need_canon = true;
+
+  result.len = len > op1.len ? len : op1.len;
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (l0 > l1)
+    {
+      if (op1.sign_mask () == 0)
+	{
+	  l0 = l1;
+	  result.len = l1 + 1;
+	}
+      else
+	{
+	  need_canon = false;
+	  while (l0 > l1)
+	    {
+	      result.val[l0] = val[l0];
+	      l0--;
+	    }
+	}
+    }
+  else if (l1 > l0)
+    {
+      if (sign_mask () != 0)
+	result.len = l0 + 1;
+      else
+	{
+	  need_canon = false;
+	  while (l1 > l0)
+	    {
+	      result.val[l1] = ~op1.val[l1];
+	      l1--;
+	    }
+	}
+    }
+
+  while (l0 >= 0)
+    {
+      result.val[l0] = val[l0] | ~op1.val[l0];
+      l0--;
+    }
+
+  if (need_canon)
+    result.canonize ();
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_www ("wide_int::and_not", result, *this, op1);
+#endif
+  return result;
+}
+
+/* Return the exclusive ior (xor) of THIS and OP1.  */
+
+wide_int
+wide_int::operator ^ (const wide_int &op1) const
+{
+  wide_int result;
+  int l0 = len - 1;
+  int l1 = op1.len - 1;
+
+  result.len = len > op1.len ? len : op1.len;
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  while (l0 > l1)
+    {
+      result.val[l0] = val[l0] ^ op1.sign_mask ();
+      l0--;
+    }
+
+  while (l1 > l0)
+    {
+      result.val[l1] = sign_mask () ^ op1.val[l1];
+      l1--;
+    }
+
+  while (l0 >= 0)
+    {
+      result.val[l0] = val[l0] ^ op1.val[l0];
+      l0--;
+    }
+
+  result.canonize ();
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_www ("wide_int::operator ^", result, *this, op1);
+#endif
+  return result;
+}
+
+/*
+ * math
+ */
+
+/* Absolute value of THIS.  */
+
+wide_int
+wide_int::abs () const
+{
+  if (sign_mask ())
+    return neg ();
+
+  wide_int result = force_to_size (bitsize, precision);
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_ww ("wide_int::abs", result, *this);
+#endif
+  return result;
+}
+
+/* Add THIS and OP1.  No overflow is detected.  */
+
+wide_int
+wide_int::operator + (const wide_int &op1) const
+{
+  wide_int result;
+  unsigned HOST_WIDE_INT o0, o1;
+  unsigned HOST_WIDE_INT x = 0;
+  unsigned HOST_WIDE_INT carry = 0;
+  unsigned HOST_WIDE_INT mask0, mask1;
+  unsigned int i, small_prec, stop;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      result.len = 1;
+      o0 = val[0];
+      o1 = op1.val[0];
+      result.val[0] = sext_hwi (o0 + o1, precision);
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_www ("wide_int::operator +", result, *this, op1);
+#endif
+      return result;
+    }
+
+  stop = len > op1.len ? len : op1.len;
+  /* Need to do one extra block just to handle the special cases.  */
+  if (stop < (unsigned)BLOCKS_NEEDED (precision))
+    stop++;
+
+  result.len = stop;
+  mask0 = sign_mask ();
+  mask1 = op1.sign_mask ();
+  /* Add all of the explicitly defined elements.  */
+  for (i = 0; i < stop; i++)
+    {
+      o0 = i < len ? (unsigned HOST_WIDE_INT)val[i] : mask0;
+      o1 = i < op1.len ? (unsigned HOST_WIDE_INT)op1.val[i] : mask1;
+      x = o0 + o1 + carry;
+      result.val[i] = x;
+      carry = x < o0;
+    }
+
+  small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+  if (small_prec != 0 && BLOCKS_NEEDED (precision) == result.len)
+    {
+      /* Modes with weird precisions.  */
+      i = result.len - 1;
+      result.val[i] = sext_hwi (result.val[i], small_prec);
+    }
+
+  result.canonize ();
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_www ("wide_int::operator +", result, *this, op1);
+#endif
+  return result;
+}
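+
+/* A worked example of the carry propagation above (illustrative
+   only), with a 128-bit precision: adding 1 to the value 2^64 - 1
+   (val[0] all ones, val[1] == 0, len == 2) gives x == 0 with
+   carry == 1 in block 0, and then val[1] == 1 in block 1, i.e. the
+   value 2^64.  */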
+
+/* Add THIS and signed OP1.  No overflow is detected.  */
+
+wide_int
+wide_int::operator + (HOST_WIDE_INT op1) const
+{
+  wide_int result;
+  unsigned HOST_WIDE_INT o0, o1;
+  unsigned HOST_WIDE_INT x = 0;
+  unsigned HOST_WIDE_INT carry = 0;
+  unsigned HOST_WIDE_INT mask0, mask1;
+  unsigned int i, small_prec, stop;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      result.len = 1;
+      o0 = val[0];
+      result.val[0] = sext_hwi (o0 + op1, precision);
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int::add", result, *this, op1);
+#endif
+      return result;
+    }
+
+  stop = len;
+  /* Need to do one extra block just to handle the special cases.  */
+  if (stop < (unsigned)BLOCKS_NEEDED (precision))
+    stop++;
+
+  result.len = stop;
+  mask0 = sign_mask ();
+  mask1 = op1 >> (HOST_BITS_PER_WIDE_INT - 1);
+  /* Add all of the explicitly defined elements.  */
+  for (i = 0; i < stop; i++)
+    {
+      o0 = i < len ? (unsigned HOST_WIDE_INT)val[i] : mask0;
+      o1 = i < 1 ? (unsigned HOST_WIDE_INT)op1 : mask1;
+      x = o0 + o1 + carry;
+      result.val[i] = x;
+      carry = x < o0;
+    }
+
+  small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+  if (small_prec != 0 && BLOCKS_NEEDED (precision) == result.len)
+    {
+      /* Modes with weird precisions.  */
+      i = result.len - 1;
+      result.val[i] = sext_hwi (result.val[i], small_prec);
+    }
+
+  result.canonize ();
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int::operator +", result, *this, op1);
+#endif
+  return result;
+}
+
+/* Add THIS and unsigned OP1.  No overflow is detected.  */
+
+wide_int
+wide_int::operator + (unsigned HOST_WIDE_INT op1) const
+{
+  wide_int result;
+  unsigned HOST_WIDE_INT o0, o1;
+  unsigned HOST_WIDE_INT x = 0;
+  unsigned HOST_WIDE_INT carry = 0;
+  unsigned HOST_WIDE_INT mask0;
+  unsigned int i, small_prec, stop;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      result.len = 1;
+      o0 = val[0];
+      result.val[0] = sext_hwi (o0 + op1, precision);
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int::operator +", result, *this, op1);
+#endif
+      return result;
+    }
+
+  stop = len;
+  /* Need to do one extra block just to handle the special cases.  */
+  if (stop < (unsigned)BLOCKS_NEEDED (precision))
+    stop++;
+
+  result.len = stop;
+  mask0 = sign_mask ();
+  /* Add all of the explicitly defined elements.  */
+  for (i = 0; i < stop; i++)
+    {
+      o0 = i < len ? (unsigned HOST_WIDE_INT)val[i] : mask0;
+      o1 = i < 1 ? (unsigned HOST_WIDE_INT)op1 : 0;
+      x = o0 + o1 + carry;
+      result.val[i] = x;
+      carry = x < o0;
+    }
+
+  small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+  if (small_prec != 0 && BLOCKS_NEEDED (precision) == result.len)
+    {
+      /* Modes with weird precisions.  */
+      i = result.len - 1;
+      result.val[i] = sext_hwi (result.val[i], small_prec);
+    }
+
+  result.canonize ();
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int::operator +", result, *this, op1);
+#endif
+  return result;
+}
+
+/* Add OP0 and OP1 with overflow checking.  If the result overflows
+   within the precision, set OVERFLOW.  OVERFLOW is assumed to be
+   sticky so it should be initialized.  SGN controls if signed or
+   unsigned overflow is checked.  */
+
+wide_int
+wide_int::add_overflow (const wide_int *op0, const wide_int *op1,
+			wide_int::SignOp sgn, bool *overflow)
+{
+  wide_int result;
+  const wide_int *tmp;
+  unsigned HOST_WIDE_INT o0 = 0;
+  unsigned HOST_WIDE_INT o1 = 0;
+  unsigned HOST_WIDE_INT x = 0;
+  unsigned HOST_WIDE_INT carry = 0;
+  unsigned int prec = op0->precision;
+  int i, small_prec;
+
+  result.precision = op0->precision;
+  result.bitsize = op0->bitsize;
+
+  /* Put the shorter one first.  */
+  if (op0->len > op1->len)
+    {
+      tmp = op0;
+      op0 = op1;
+      op1 = tmp;
+    }
+
+  /* Add all of the explicitly defined elements.  */
+  for (i = 0; i < op0->len; i++)
+    {
+      o0 = op0->val[i];
+      o1 = op1->val[i];
+      x = o0 + o1 + carry;
+      result.elt_ref (i) = x;
+      carry = x < o0;
+    }
+
+  /* Uncompress the rest.  The shorter operand, OP0, is implicitly
+     sign extended with its sign mask.  */
+  if (op0->len < op1->len)
+    {
+      unsigned HOST_WIDE_INT mask = op0->sign_mask ();
+      for (i = op0->len; i < op1->len; i++)
+	{
+	  o0 = mask;
+	  o1 = op1->val[i];
+	  x = o0 + o1 + carry;
+	  result.val[i] = x;
+	  carry = x < o0;
+	}
+    }
+
+  result.set_len (op1->len);
+  small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
+  if (small_prec == 0)
+    {
+      if (op1->len * HOST_BITS_PER_WIDE_INT < prec)
+	{
+	  /* If the carry is 1, then we need another word.  If the carry
+	     is 0, we only need another word if the top bit is 1.  */
+	  if (carry == 1
+	      || (x >> (HOST_BITS_PER_WIDE_INT - 1) == 1))
+	    {
+	      result.val[result.len] = carry;
+	      result.len++;
+	    }
+	  if (sgn == wide_int::SIGNED)
+	    {
+	      if (((x ^ o0) & (x ^ o1)) >> (HOST_BITS_PER_WIDE_INT - 1))
+		*overflow = true;
+	    }
+	  else if (carry)
+	    {
+	      if ((~o0) < o1)
+		*overflow = true;
+	    }
+	  else
+	    {
+	      if ((~o0) <= o1)
+		*overflow = true;
+	    }
+	}
+    }
+  else
+    {
+      /* Overflow in this case is easy since we can see bits beyond
+	 the precision.  If the value computed is not the sign
+	 extended value, then we have overflow.  */
+      unsigned HOST_WIDE_INT y;
+
+      if (sgn == wide_int::UNSIGNED)
+	{
+	  /* The caveat for unsigned is to get rid of the bits above
+	     the precision before doing the addition.  To check the
+	     overflow, clear these bits and then redo the last
+	     addition.  Then the rest of the code just works.  */
+	  o0 = zext_hwi (o0, small_prec);
+	  o1 = zext_hwi (o1, small_prec);
+	  x = o0 + o1 + carry;
+	}
+      /* Short integers and modes with weird precisions.  */
+      y = sext_hwi (x, small_prec);
+      result.len = op1->len;
+      if (BLOCKS_NEEDED (prec) == result.len && x != y)
+	*overflow = true;
+      /* Then put the sign extended form back because that is the
+	 canonical form.  */
+      result.val[result.len - 1] = y;
+    }
+
+  result.canonize ();
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_www ("wide_int::add_overflow", result, *op0, *op1);
+#endif
+  return result;
+}
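+
+/* The signed test above relies on the usual identity: X = O0 + O1
+   overflows iff the sign of X differs from the signs of both O0 and
+   O1, i.e. iff the top bit of (X ^ O0) & (X ^ O1) is set.  As an
+   8-bit illustration, 0x7f + 0x01 = 0x80, and
+   (0x80 ^ 0x7f) & (0x80 ^ 0x01) == 0xff & 0x81 == 0x81, whose top
+   bit is set, signalling overflow.  */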
+
+/* Add THIS and X.  If overflow occurs, set OVERFLOW.  */
+
+wide_int
+wide_int::add (const wide_int &x, SignOp sgn, bool *overflow) const
+{
+  return add_overflow (this, &x, sgn, overflow);
+}
+
+/* Count leading zeros of THIS, returning the result as a wide_int
+   with bitsize BS and precision PREC.  */
+
+wide_int
+wide_int::clz (unsigned int bs, unsigned int prec) const
+{
+  return wide_int::from_shwi (clz (), bs, prec);
+}
+
+/* Count leading zeros of THIS.  */
+
+int
+wide_int::clz () const
+{
+  int i;
+  int start;
+  int count;
+  HOST_WIDE_INT v;
+  int small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+
+  if (zero_p ())
+    {
+      enum machine_mode mode = mode_for_size (precision, MODE_INT, 0);
+      if (mode == BLKmode)
+	mode = mode_for_size (precision, MODE_PARTIAL_INT, 0);
+
+      /* Even if the value at zero is undefined, we have to come up
+	 with some replacement.  Seems good enough.  */
+      if (mode == BLKmode)
+	count = precision;
+      else if (!CLZ_DEFINED_VALUE_AT_ZERO (mode, count))
+	count = precision;
+
+#ifdef DEBUG_WIDE_INT
+      if (dump_file)
+	debug_vw ("wide_int::clz", count, *this);
+#endif
+      return count;
+    }
+
+  /* The high order block is special if it is the last block and the
+     precision is not an even multiple of HOST_BITS_PER_WIDE_INT.  We
+     have to clear out any ones above the precision before doing clz
+     on this block.  */
+  if (BLOCKS_NEEDED (precision) == len && small_prec)
+    {
+      v = zext_hwi (val[len - 1], small_prec);
+      count = clz_hwi (v) - (HOST_BITS_PER_WIDE_INT - small_prec);
+      start = len - 2;
+      if (v != 0)
+	{
+#ifdef DEBUG_WIDE_INT
+	  if (dump_file)
+	    debug_vw ("wide_int::clz", count, *this);
+#endif
+	  return count;
+	}
+    }
+  else
+    {
+      count = HOST_BITS_PER_WIDE_INT * (BLOCKS_NEEDED (precision) - len);
+      start = len - 1;
+    }
+
+  for (i = start; i >= 0; i--)
+    {
+      v = elt (i);
+      count += clz_hwi (v);
+      if (v != 0)
+	break;
+    }
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vw ("wide_int::clz", count, *this);
+#endif
+  return count;
+}
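+
+/* For example (illustrative only), with a 32-bit precision,
+   wide_int::from_shwi (1, 32, 32).clz () is 31: only the low 32 bits
+   are counted, so the result is 31 rather than 63 even though the
+   value occupies a 64-bit HWI block.  */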
+
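+/* Count the number of redundant sign bits of THIS, returning the
+   result as a wide_int with bitsize BS and precision PREC.  */
+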
+wide_int
+wide_int::clrsb (unsigned int bs, unsigned int prec) const
+{
+  return wide_int::from_shwi (clrsb (), bs, prec);
+}
+
+/* Count the number of redundant sign bits of THIS.  Return result
+   as a HOST_WIDE_INT.  There is a wrapper to convert this into a
+   wide_int.  */
+
+int
+wide_int::clrsb () const
+{
+  if (neg_p ())
+    return operator ~ ().clz () - 1;
+
+  return clz () - 1;
+}
+
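+/* Count trailing zeros of THIS, returning the result as a wide_int
+   with bitsize BS and precision PREC.  */
+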
+wide_int
+wide_int::ctz (unsigned int bs, unsigned int prec) const
+{
+  return wide_int::from_shwi (ctz (), bs, prec);
+}
+
+/* Count trailing zeros of THIS.  Return result as a HOST_WIDE_INT.
+   There is a wrapper to convert this into a wide_int.  */
+
+int
+wide_int::ctz () const
+{
+  int i;
+  unsigned int count = 0;
+  HOST_WIDE_INT v;
+  int small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+  int end;
+  bool more_to_do;
+
+  if (zero_p ())
+    {
+      enum machine_mode mode = mode_for_size (precision, MODE_INT, 0);
+      if (mode == BLKmode)
+	mode = mode_for_size (precision, MODE_PARTIAL_INT, 0);
+
+      /* Even if the value at zero is undefined, we have to come up
+	 with some replacement.  Seems good enough.  */
+      if (mode == BLKmode)
+	count = precision;
+      else if (!CTZ_DEFINED_VALUE_AT_ZERO (mode, count))
+	count = precision;
+
+#ifdef DEBUG_WIDE_INT
+      if (dump_file)
+	debug_vw ("wide_int::ctz", count, *this);
+#endif
+      return count;
+    }
+
+  /* The high order block is special if it is the last block and the
+     precision is not an even multiple of HOST_BITS_PER_WIDE_INT.  We
+     have to zero extend it above the precision before counting its
+     trailing zeros.  */
+  if (BLOCKS_NEEDED (precision) == len && small_prec)
+    {
+      end = len - 1;
+      more_to_do = true;
+    }
+  else
+    {
+      end = len;
+      more_to_do = false;
+    }
+
+  for (i = 0; i < end; i++)
+    {
+      v = val[i];
+      count += ctz_hwi (v);
+      if (v != 0)
+	{
+#ifdef DEBUG_WIDE_INT
+	  if (dump_file)
+	    debug_vw ("wide_int::ctz", count, *this);
+#endif
+	  return count;
+	}
+    }
+
+  if (more_to_do)
+    {
+      v = zext_hwi (val[len - 1], small_prec);
+      count += ctz_hwi (v);
+      /* The top block may be all zeros above the precision, so cap
+	 the count at the precision.  */
+      if (count > precision)
+	count = precision;
+    }
+  else
+    /* Skip over the blocks that are not represented.  They must be
+       all zeros at this point.  */
+    count = precision;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vw ("wide_int::ctz", count, *this);
+#endif
+  return count;
+}
+
+/* ffs of THIS.  */
+
+wide_int
+wide_int::ffs () const
+{
+  HOST_WIDE_INT count = ctz ();
+  if (count == precision)
+    count = 0;
+  else
+    count += 1;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vw ("wide_int::ffs", count, *this);
+#endif
+  return wide_int::from_shwi (count, word_mode);
+}
+
+/* Subroutine of the multiplication and division operations.  Unpack
+   the first IN_LEN HOST_WIDE_INTs of INPUT into 2 * IN_LEN
+   HOST_HALF_WIDE_INTs in RESULT.  The remaining elements of RESULT,
+   up to OUT_LEN, are filled by smearing the top bit of
+   INPUT[IN_LEN - 1].  */
+
+static void
+wi_unpack (unsigned HOST_HALF_WIDE_INT *result, 
+	   const unsigned HOST_WIDE_INT *input,
+	   int in_len, int out_len)
+{
+  int i;
+  int j = 0;
+  HOST_WIDE_INT mask;
+
+  for (i = 0; i < in_len; i++)
+    {
+      result[j++] = input[i];
+      result[j++] = input[i] >> HOST_BITS_PER_HALF_WIDE_INT;
+    }
+  mask = ((HOST_WIDE_INT)input[in_len - 1]) >> (HOST_BITS_PER_WIDE_INT - 1);
+  mask &= HALF_INT_MASK;
+
+  /* Smear the sign bit.  */
+  while (j < out_len)
+    result[j++] = mask;
+}
+
+/* The inverse of wi_unpack.  */
+
+static void
+wi_pack (unsigned HOST_WIDE_INT *result, 
+	 const unsigned HOST_HALF_WIDE_INT *input, 
+	 int in_len)
+{
+  int i = 0;
+  int j = 0;
+
+  while (i < in_len)
+    {
+      result[j++] = (unsigned HOST_WIDE_INT)input[i] 
+	| ((unsigned HOST_WIDE_INT)input[i + 1] << HOST_BITS_PER_HALF_WIDE_INT);
+      i += 2;
+    }
+}
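+
+/* For example (illustrative only), with a 64-bit HWI and therefore
+   32-bit half blocks, wi_unpack of the single input block
+   0x0000000100000002 with IN_LEN == 1 and OUT_LEN == 4 produces the
+   half blocks {0x2, 0x1, 0x0, 0x0} (the top two are the smeared
+   sign), and wi_pack of the first two of those halves reconstructs
+   the original block.  */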
+
+/* Return the exact log2 of THIS if it is a power of 2, -1 otherwise.  */
+
+int
+wide_int::exact_log2 () const
+{
+  int small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+  HOST_WIDE_INT count;
+  HOST_WIDE_INT result;
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      HOST_WIDE_INT v;
+      if (small_prec != 0)
+	v = sext_hwi (val[0], small_prec);
+      else
+	v = val[0];
+      result = ::exact_log2 (v);
+      goto ex;
+    }
+
+  count = ctz ();
+  if (clz () + count + 1 == precision)
+    {
+      result = count;
+      goto ex;
+    }
+
+  result = -1;
+
+ ex:
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vw ("wide_int::exact_log2", result, *this);
+#endif
+  return result;
+}
+
+/* Multiply OP1 by OP2.  If HIGH is set, only the upper half of the
+   result is returned.  If FULL is set, the entire result is returned
+   in a mode that is twice the width of the inputs.  However, that
+   mode needs to exist if the value is to be usable.  Clients that use
+   FULL need to check for this.
+
+   If neither HIGH nor FULL is set, throw away the upper half after
+   the check is made to see if it overflows.  Unfortunately there is
+   no better way to check for overflow than to do this.  OVERFLOW is
+   assumed to be sticky so it should be initialized.  SGN controls the
+   signedness and is used to check overflow or if HIGH or FULL is
+   set.  */
+
+static wide_int
+mul_internal (bool high, bool full,
+	      const wide_int *op1, const wide_int *op2,
+	      wide_int::SignOp sgn, bool *overflow, bool needs_overflow)
+{
+  wide_int result;
+  unsigned HOST_WIDE_INT o0, o1, k, t;
+  unsigned int i;
+  unsigned int j;
+  unsigned int prec = op1->get_precision ();
+  unsigned int blocks_needed = 2 * BLOCKS_NEEDED (prec);
+  unsigned HOST_HALF_WIDE_INT u[2 * MAX_BITSIZE_MODE_ANY_INT
+			   / HOST_BITS_PER_WIDE_INT];
+  unsigned HOST_HALF_WIDE_INT v[2 * MAX_BITSIZE_MODE_ANY_INT
+			   / HOST_BITS_PER_WIDE_INT];
+  unsigned HOST_HALF_WIDE_INT r[4 * MAX_BITSIZE_MODE_ANY_INT
+			   / HOST_BITS_PER_WIDE_INT];
+  HOST_WIDE_INT mask = ((HOST_WIDE_INT)1 << (HOST_BITS_PER_WIDE_INT / 2)) - 1;
+
+  result.set_bitsize (op1->get_bitsize ());
+  result.set_precision (op1->get_precision ());
+
+  if (high || full || needs_overflow)
+    {
+      /* If we need to check for overflow, we can only do half wide
+	 multiplies quickly because we need to look at the top bits to
+	 check for the overflow.  */
+      if (prec <= HOST_BITS_PER_HALF_WIDE_INT)
+	{
+	  HOST_WIDE_INT t;
+	  result.set_len (1);
+	  o0 = op1->elt (0);
+	  o1 = op2->elt (0);
+	  t = o0 * o1;
+	  /* Signed shift down will leave 0 or -1 if there was no
+	     overflow for signed or 0 for unsigned.  */
+	  t = t >> (HOST_BITS_PER_HALF_WIDE_INT - 1);
+	  if (needs_overflow)
+	    {
+	      if (sgn == wide_int::SIGNED)
+		{
+		  if (t != (HOST_WIDE_INT)-1 && t != 0)
+		    *overflow = true;
+		}
+	      else
+		{
+		  if (t != 0)
+		    *overflow = true;
+		}
+	    }
+	  if (full)
+	    {
+	      result.elt_ref (0) = sext_hwi (t, prec << 1);
+	      result.set_bitsize (op1->get_bitsize () * 2);
+	      result.set_precision (op1->get_precision () * 2);
+	    }
+	  else if (high)
+	    result.elt_ref (0) = sext_hwi (t >> (prec >> 1), prec);
+	  else
+	    result.elt_ref (0) = sext_hwi (t, prec);
+#ifdef DEBUG_WIDE_INT
+	  if (dump_file)
+	    debug_www ("wide_int::mul_overflow", result, *op1, *op2);
+#endif
+	  return result;
+	}
+    }
+  else
+    {
+      if (prec <= HOST_BITS_PER_WIDE_INT)
+	{
+	  result.set_len (1);
+	  o0 = op1->elt (0);
+	  o1 = op2->elt (0);
+	  result.elt_ref (0) = sext_hwi (o0 * o1, prec);
+	  
+#ifdef DEBUG_WIDE_INT
+	  if (dump_file)
+	    debug_www ("wide_int::mul_overflow", result, *op1, *op2);
+#endif
+	  return result;
+	}
+    }
+
+  wi_unpack (u, &op1->uelt_ref (0), op1->get_len (), blocks_needed);
+  wi_unpack (v, &op2->uelt_ref (0), op2->get_len (), blocks_needed);
+
+  memset (r, 0, blocks_needed * 2 * sizeof (HOST_HALF_WIDE_INT));
+
+  for (j = 0; j < blocks_needed; j++)
+    {
+      k = 0;
+      for (i = 0; i < blocks_needed; i++)
+	{
+	  t = ((unsigned HOST_WIDE_INT)u[i] * (unsigned HOST_WIDE_INT)v[j]
+	       + r[i + j] + k);
+	  r[i + j] = t & HALF_INT_MASK;
+	  k = t >> HOST_BITS_PER_HALF_WIDE_INT;
+	}
+      r[j + blocks_needed] = k;
+    }
+
+  if (needs_overflow)
+    {
+      HOST_WIDE_INT top;
+
+      /* For unsigned, overflow is true if any of the top bits are set.
+	 For signed, overflow is true if any of the top bits are not equal
+	 to the sign bit.  */
+      if (sgn == wide_int::UNSIGNED)
+	top = 0;
+      else
+	{
+	  top = r[blocks_needed - 1];
+	  top = ((top << (HOST_BITS_PER_WIDE_INT / 2))
+		 >> (HOST_BITS_PER_WIDE_INT - 1));
+	  top &= mask;
+	}
+      
+      for (i = blocks_needed; i < 2 * blocks_needed; i++)
+	if (((HOST_WIDE_INT)(r[i] & mask)) != top)
+	  *overflow = true; 
+    }
+
+  if (full)
+    {
+      /* compute [2prec] <- [prec] * [prec] */
+      wi_pack (&result.uelt_ref (0), r, blocks_needed);
+      result.set_len (blocks_needed);
+      result.set_bitsize (op1->get_bitsize () * 2);
+      result.set_precision (op1->get_precision () * 2);
+    }
+  else if (high)
+    {
+      /* compute [prec] <- ([prec] * [prec]) >> [prec] */
+      wi_pack (&result.uelt_ref (blocks_needed >> 1), r, blocks_needed >> 1);
+      result.set_len (blocks_needed / 2);
+    }
+  else
+    {
+      /* compute [prec] <- ([prec] * [prec]) && ((1 << [prec]) - 1) */
+      wi_pack (&result.uelt_ref (0), r, blocks_needed >> 1);
+      result.set_len (blocks_needed / 2);
+    }
+      
+  result.canonize ();
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwwv ("wide_int::mul_overflow", result, *op1, *op2, *overflow);
+#endif
+  return result;
+}
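+
+/* The loops above implement the classic schoolbook multiply on
+   half-HWI digits.  With B == 2^HOST_BITS_PER_HALF_WIDE_INT and
+   operands a == a1 * B + a0 and b == b1 * B + b0, the result
+   accumulated in R is
+
+     a0 * b0  +  (a0 * b1 + a1 * b0) * B  +  a1 * b1 * B * B,
+
+   with K carrying each half-digit overflow into the next column.
+   Using half-wide digits guarantees that u[i] * v[j] plus the
+   carries cannot overflow a full HOST_WIDE_INT.  */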
+
+/* Multiply THIS and OP1.  The result is the same precision as the
+   operands, so there is no reason for signed or unsigned
+   versions.  */
+
+wide_int
+wide_int::operator * (const wide_int &op1) const
+{
+  bool overflow = false;
+
+  return mul_internal (false, false, this, &op1, UNSIGNED, &overflow, false);
+}
+
+/* Multiply THIS and OP1.  The signedness is specified with SGN.
+   OVERFLOW is set true if the result overflows.  */
+
+wide_int 
+wide_int::mul (const wide_int &op1, SignOp sgn, bool *overflow) const
+{
+  return mul_internal (false, false, this, &op1, sgn, overflow, true);
+}
+
+/* Multiply THIS and OP1.  The result is twice the precision of the
+   operands.  The signedness is specified with SGN.  */
+
+wide_int
+wide_int::mul_full (const wide_int &op1, SignOp sgn) const
+{
+  bool overflow = false;
+
+  return mul_internal (false, true, this, &op1, sgn, &overflow, false);
+}
+
+/* Multiply THIS and OP1 and return the high part of that result.
+   The signedness is specified with SGN.  The result is the same
+   precision as the operands and is in the same mode as the
+   operands.  */
+
+wide_int
+wide_int::mul_high (const wide_int &op1, SignOp sgn) const
+{
+  bool overflow = false;
+
+  return mul_internal (true, false, this, &op1, sgn, &overflow, false);
+}
+
+/* Negate THIS.  */
+
+wide_int
+wide_int::neg () const
+{
+  wide_int z = wide_int::from_shwi (0, bitsize, precision);
+  return z - *this;
+}
+
+/* Compute the parity of THIS.  */
+
+wide_int
+wide_int::parity (unsigned int bs, unsigned int prec) const
+{
+  int count = popcount ();
+  return wide_int::from_shwi (count & 1, bs, prec);
+}
+
+/* Compute the population count of THIS producing a number with
+   BITSIZE and PREC.  */
+
+wide_int
+wide_int::popcount (unsigned int bs, unsigned int prec) const
+{
+  return wide_int::from_shwi (popcount (), bs, prec);
+}
+
+/* Compute the population count of THIS.  */
+
+int
+wide_int::popcount () const
+{
+  int i;
+  int start;
+  int count;
+  HOST_WIDE_INT v;
+  int small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+
+  /* The high order block is special if it is the last block and the
+     precision is not an even multiple of HOST_BITS_PER_WIDE_INT.  We
+     have to clear out any ones above the precision before doing clz
+     on this block.  */
+  if (BLOCKS_NEEDED (precision) == len && small_prec)
+    {
+      v = zext_hwi (val[len - 1], small_prec);
+      count = popcount_hwi (v);
+      start = len - 2;
+    }
+  else
+    {
+      if (sign_mask ())
+	count = HOST_BITS_PER_WIDE_INT * (BLOCKS_NEEDED (precision) - len);
+      else
+	count = 0;
+      start = len - 1;
+    }
+
+  for (i = start; i >= 0; i--)
+    {
+      v = val[i];
+      count += popcount_hwi (v);
+    }
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_vw ("wide_int::popcount", count, *this);
+#endif
+  return count;
+}
+
+/* Subtract OP1 from THIS.  No overflow is detected.  */
+
+wide_int
+wide_int::operator - (const wide_int &op1) const
+{
+  wide_int result;
+  unsigned HOST_WIDE_INT o0, o1;
+  unsigned HOST_WIDE_INT x = 0;
+  /* We implement subtraction as an in place negate and add.  Negation
+     is just inversion and add 1, so we can do the add of 1 by just
+     starting the carry in of the first element at 1.  */
+  unsigned HOST_WIDE_INT carry = 1;
+  unsigned HOST_WIDE_INT mask0, mask1;
+  unsigned int i, small_prec, stop;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      result.len = 1;
+      o0 = val[0];
+      o1 = op1.val[0];
+      result.val[0] = sext_hwi (o0 - o1, precision);
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_www ("wide_int::operator -", result, *this, op1);
+#endif
+      return result;
+    }
+
+  stop = len > op1.len ? len : op1.len;
+  /* Need to do one extra block just to handle the special cases.  */
+  if (stop < (unsigned)BLOCKS_NEEDED (precision))
+    stop++;
+
+  result.len = stop;
+  mask0 = sign_mask ();
+  mask1 = ~op1.sign_mask ();
+
+  /* Subtract all of the explicitly defined elements.  */
+  for (i = 0; i < stop; i++)
+    {
+      o0 = i < len ? (unsigned HOST_WIDE_INT)val[i] : mask0;
+      o1 = i < op1.len ? (unsigned HOST_WIDE_INT)~op1.val[i] : mask1;
+      x = o0 + o1 + carry;
+      result.val[i] = x;
+      carry = x < o0;
+    }
+
+  small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+  if (small_prec != 0 && BLOCKS_NEEDED (precision) == result.len)
+    {
+      /* Modes with weird precisions.  */
+      i = result.len - 1;
+      result.val[i] = sext_hwi (result.val[i], small_prec);
+    }
+
+  result.canonize ();
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_www ("wide_int::operator -", result, *this, op1);
+#endif
+  return result;
+}
+
+/* Subtract signed OP1 from THIS.  No overflow is detected.  */
+
+wide_int
+wide_int::operator - (HOST_WIDE_INT op1) const
+{
+  wide_int result;
+  unsigned HOST_WIDE_INT o0, o1;
+  unsigned HOST_WIDE_INT x = 0;
+  /* We implement subtraction as an in place negate and add.  Negation
+     is just inversion and add 1, so we can do the add of 1 by just
+     starting the carry in of the first element at 1.  */
+  unsigned HOST_WIDE_INT carry = 1;
+  unsigned HOST_WIDE_INT mask0, mask1;
+  unsigned int i, small_prec, stop;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      result.len = 1;
+      o0 = val[0];
+      result.val[0] = sext_hwi (o0 - op1, precision);
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int::operator -", result, *this, op1);
+#endif
+      return result;
+    }
+
+  stop = len;
+  /* Need to do one extra block just to handle the special cases.  */
+  if (stop < (unsigned)BLOCKS_NEEDED (precision))
+    stop++;
+
+  result.len = stop;
+  mask0 = sign_mask ();
+  mask1 = ~(op1 >> (HOST_BITS_PER_WIDE_INT - 1));
+
+  /* Subtract all of the explicitly defined elements.  */
+  for (i = 0; i < stop; i++)
+    {
+      o0 = i < len ? (unsigned HOST_WIDE_INT)val[i] : mask0;
+      o1 = i < 1 ? (unsigned HOST_WIDE_INT)~op1 : mask1;
+      x = o0 + o1 + carry;
+      result.val[i] = x;
+      carry = x < o0;
+    }
+
+  small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+  if (small_prec != 0 && BLOCKS_NEEDED (precision) == result.len)
+    {
+      /* Modes with weird precisions.  */
+      i = result.len - 1;
+      result.val[i] = sext_hwi (result.val[i], small_prec);
+    }
+
+  result.canonize ();
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int::operator -", result, *this, op1);
+#endif
+  return result;
+}
+
+/* Subtract unsigned OP1 from THIS.  No overflow is detected.  */
+
+wide_int
+wide_int::operator - (unsigned HOST_WIDE_INT op1) const
+{
+  wide_int result;
+  unsigned HOST_WIDE_INT o0, o1;
+  unsigned HOST_WIDE_INT x = 0;
+  /* We implement subtraction as an in place negate and add.  Negation
+     is just inversion and add 1, so we can do the add of 1 by just
+     starting the carry in of the first element at 1.  */
+  unsigned HOST_WIDE_INT carry = 1;
+  unsigned HOST_WIDE_INT mask0, mask1;
+  unsigned int i, small_prec, stop;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+ 
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      result.len = 1;
+      o0 = val[0];
+      result.val[0] = sext_hwi (o0 - op1, precision);
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int::operator -", result, *this, op1);
+#endif
+      return result;
+    }
+
+  stop = len;
+  /* Need to do one extra block just to handle the special cases.  */
+  if (stop < (unsigned)BLOCKS_NEEDED (precision))
+    stop++;
+
+  result.len = stop;
+  mask0 = sign_mask ();
+  mask1 = (HOST_WIDE_INT) -1;
+
+  /* Subtract all of the explicitly defined elements.  */
+  for (i = 0; i < stop; i++)
+    {
+      o0 = i < len ? (unsigned HOST_WIDE_INT)val[i] : mask0;
+      o1 = i < 1 ? (unsigned HOST_WIDE_INT)~op1 : mask1;
+      x = o0 + o1 + carry;
+      result.val[i] = x;
+      carry = x < o0;
+    }
+
+  small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+  if (small_prec != 0 && BLOCKS_NEEDED (precision) == result.len)
+    {
+      /* Modes with weird precisions.  */
+      i = result.len - 1;
+      result.val[i] = sext_hwi (result.val[i], small_prec);
+    }
+
+  result.canonize ();
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int::operator -", result, *this, op1);
+#endif
+  return result;
+}
+
+/* Subtract OP1 from THIS with overflow checking.  If the result
+   overflows within the precision, set OVERFLOW.  OVERFLOW is assumed
+   to be sticky so it should be initialized.  SGN controls if signed or
+   unsigned overflow is checked.  */
+
+wide_int
+wide_int::sub_overflow (const wide_int *op0, const wide_int *op1, 
+			wide_int::SignOp sgn, bool *overflow)
+{
+  wide_int result;
+  unsigned HOST_WIDE_INT o0 = 0;
+  unsigned HOST_WIDE_INT o1 = 0;
+  unsigned HOST_WIDE_INT x = 0;
+  /* We implement subtraction as an in place negate and add.  Negation
+     is just inversion and add 1, so we can do the add of 1 by just
+     starting the carry in of the first element at 1.  */
+  unsigned HOST_WIDE_INT carry = 1;
+  int i, small_prec;
+
+  result.bitsize = op0->bitsize;
+  result.precision = op0->precision;
+
+  /* Subtract all of the explicitly defined elements.  */
+  for (i = 0; i < op0->len && i < op1->len; i++)
+    {
+      o0 = op0->val[i];
+      o1 = ~op1->val[i];
+      x = o0 + o1 + carry;
+      result.elt_ref (i) = x;
+      carry = x < o0;
+    }
+
+  /* Uncompress the rest.  The shorter operand is implicitly sign
+     extended with its sign mask.  */
+  if (op0->len < op1->len)
+    {
+      unsigned HOST_WIDE_INT mask = op0->sign_mask ();
+      for (i = op0->len; i < op1->len; i++)
+	{
+	  o0 = mask;
+	  o1 = ~op1->val[i];
+	  x = o0 + o1 + carry;
+	  result.elt_ref (i) = x;
+	  carry = x < o0;
+	}
+    }
+  else if (op0->len > op1->len)
+    {
+      unsigned HOST_WIDE_INT mask = ~op1->sign_mask ();
+      for (i = op1->len; i < op0->len; i++)
+	{
+	  o0 = op0->val[i];
+	  o1 = mask;
+	  x = o0 + o1 + carry;
+	  result.val[i] = x;
+	  carry = x < o0;
+	}
+    }
+
+  result.set_len (op0->len > op1->len ? op0->len : op1->len);
+  small_prec = op0->precision & (HOST_BITS_PER_WIDE_INT - 1);
+  if (small_prec == 0)
+    {
+      if (op0->len * HOST_BITS_PER_WIDE_INT < op0->precision)
+	{
+	  /* If the carry is 1, then we need another word.  If the carry
+	     is 0, we only need another word if the top bit is 1.  */
+	  if (carry == 1
+	      || (x >> (HOST_BITS_PER_WIDE_INT - 1) == 1))
+	    {
+	      result.val[result.len] = carry;
+	      result.len++;
+	    }
+	  if (sgn == wide_int::SIGNED)
+	    {
+	      if (((x ^ o0) & (x ^ o1)) >> (HOST_BITS_PER_WIDE_INT - 1))
+		*overflow = true;
+	    }
+	  else if (carry)
+	    {
+	      if ((~o0) < o1)
+		*overflow = true;
+	    }
+	  else
+	    {
+	      if ((~o0) <= o1)
+		*overflow = true;
+	    }
+	}
+    }
+  else
+    {
+      /* Overflow in this case is easy since we can see bits beyond
+	 the precision.  If the value computed is not the sign
+	 extended value, then we have overflow.  */
+      unsigned HOST_WIDE_INT y;
+
+      if (sgn == wide_int::UNSIGNED)
+	{
+	  /* The caveat for unsigned is to get rid of the bits above
+	     the precision before doing the addition.  To check the
+	     overflow, clear these bits and then redo the last
+	     addition.  Then the rest of the code just works.  */
+	  o0 = zext_hwi (o0, small_prec);
+	  o1 = zext_hwi (o1, small_prec);
+	  x = o0 + o1 + carry;
+	}
+      /* Short integers and modes with weird precisions.  */
+      y = sext_hwi (x, small_prec);
+      result.len = op0->len > op1->len ? op0->len : op1->len;
+      if (BLOCKS_NEEDED (op1->precision) == result.len && x != y)
+	*overflow = true;
+      /* Then put the sign extended form back because that is the
+	 canonical form.  */
+      result.val[result.len - 1] = y;
+    }
+
+  result.canonize ();
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_www ("wide_int::sub_overflow", result, *op0, *op1);
+#endif
+  return result;
+}
+
+/* Subtract X from THIS.  If overflow occurs, set OVERFLOW.  */
+
+wide_int
+wide_int::sub (const wide_int &x, SignOp sgn, bool *overflow) const
+{
+  return sub_overflow (this, &x, sgn, overflow);
+}
+
+/*
+ * Division and Mod
+ */
+
+/* Compute B_QUOTIENT and B_REMAINDER from B_DIVIDEND/B_DIVISOR.  The
+   algorithm is a small modification of the algorithm in Hacker's
+   Delight by Warren, which itself is a small modification of Knuth's
+   algorithm.  M is the number of significant elements of B_DIVIDEND;
+   at least one extra element beyond that must be allocated.  N is the
+   number of elements of B_DIVISOR.  */
+
+static void
+divmod_internal_2 (unsigned HOST_HALF_WIDE_INT *b_quotient, 
+		   unsigned HOST_HALF_WIDE_INT *b_remainder,
+		   unsigned HOST_HALF_WIDE_INT *b_dividend, 
+		   unsigned HOST_HALF_WIDE_INT *b_divisor, 
+		   int m, int n)
+{
+  /* The "digits" are a HOST_HALF_WIDE_INT which the size of half of a
+     HOST_WIDE_INT and stored in the lower bits of each word.  This
+     algorithm should work properly on both 32 and 64 bit
+     machines.  */
+  unsigned HOST_WIDE_INT b
+    = (unsigned HOST_WIDE_INT)1 << HOST_BITS_PER_HALF_WIDE_INT;
+  unsigned HOST_WIDE_INT qhat;   /* Estimate of quotient digit.  */
+  unsigned HOST_WIDE_INT rhat;   /* A remainder.  */
+  unsigned HOST_WIDE_INT p;      /* Product of two digits.  */
+  HOST_WIDE_INT s, i, j, t, k;
+
+  /* Single digit divisor.  */
+  if (n == 1)
+    {
+      k = 0;
+      for (j = m - 1; j >= 0; j--)
+	{
+	  b_quotient[j] = (k * b + b_dividend[j])/b_divisor[0];
+	  k = ((k * b + b_dividend[j])
+	       - ((unsigned HOST_WIDE_INT)b_quotient[j]
+		  * (unsigned HOST_WIDE_INT)b_divisor[0]));
+	}
+      b_remainder[0] = k;
+      return;
+    }
+
+  s = clz_hwi (b_divisor[n-1]) - HOST_BITS_PER_HALF_WIDE_INT; /* CHECK clz */
+
+  /* Normalize B_DIVIDEND and B_DIVISOR.  Unlike the published
+     algorithm, we can overwrite b_dividend and b_divisor, so we do
+     that.  */
+  for (i = n - 1; i > 0; i--)
+    b_divisor[i] = (b_divisor[i] << s)
+      | (b_divisor[i-1] >> (HOST_BITS_PER_HALF_WIDE_INT - s));
+  b_divisor[0] = b_divisor[0] << s;
+
+  b_dividend[m] = b_dividend[m-1] >> (HOST_BITS_PER_HALF_WIDE_INT - s);
+  for (i = m - 1; i > 0; i--)
+    b_dividend[i] = (b_dividend[i] << s)
+      | (b_dividend[i-1] >> (HOST_BITS_PER_HALF_WIDE_INT - s));
+  b_dividend[0] = b_dividend[0] << s;
+
+  /* Main loop.  */
+  for (j = m - n; j >= 0; j--)
+    {
+      qhat = (b_dividend[j+n] * b + b_dividend[j+n-1]) / b_divisor[n-1];
+      rhat = (b_dividend[j+n] * b + b_dividend[j+n-1]) - qhat * b_divisor[n-1];
+    again:
+      if (qhat >= b || qhat * b_divisor[n-2] > b * rhat + b_dividend[j+n-2])
+	{
+	  qhat -= 1;
+	  rhat += b_divisor[n-1];
+	  if (rhat < b)
+	    goto again;
+	}
+
+      /* Multiply and subtract.  */
+      k = 0;
+      for (i = 0; i < n; i++)
+	{
+	  p = qhat * b_divisor[i];
+	  t = b_dividend[i+j] - k - (p & HALF_INT_MASK);
+	  b_dividend[i + j] = t;
+	  k = ((p >> HOST_BITS_PER_HALF_WIDE_INT)
+	       - (t >> HOST_BITS_PER_HALF_WIDE_INT));
+	}
+      t = b_dividend[j+n] - k;
+      b_dividend[j+n] = t;
+
+      b_quotient[j] = qhat;
+      if (t < 0)
+	{
+	  b_quotient[j] -= 1;
+	  k = 0;
+	  for (i = 0; i < n; i++)
+	    {
+	      t = (HOST_WIDE_INT)b_dividend[i+j] + b_divisor[i] + k;
+	      b_dividend[i+j] = t;
+	      k = t >> HOST_BITS_PER_HALF_WIDE_INT;
+	    }
+	  b_dividend[j+n] += k;
+	}
+    }
+  for (i = 0; i < n; i++)
+    b_remainder[i] = (b_dividend[i] >> s) 
+      | (b_dividend[i+1] << (HOST_BITS_PER_HALF_WIDE_INT - s));
+}
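+
+/* A note on the normalization step above: shifting both operands
+   left by S so that the top digit of the divisor has its high bit
+   set is what makes the QHAT estimate in the main loop reliable;
+   Knuth shows the estimate is then at most 2 too large, which is why
+   the "again" correction loop executes at most twice.  The shift is
+   undone when the remainder is copied out.  */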
+
+
+/* Do a truncating divide of DIVISOR into DIVIDEND.  The result is
+   the same size as the operands.  SGN is either wide_int::SIGNED or
+   wide_int::UNSIGNED.  */
+
+static wide_int
+divmod_internal (bool compute_quotient, 
+		 const wide_int *dividend, const wide_int *divisor,
+		 wide_int::SignOp sgn, wide_int *remainder,
+		 bool compute_remainder, 
+		 bool *overflow)
+{
+  wide_int quotient, u0, u1;
+  unsigned int prec = dividend->get_precision ();
+  unsigned int bs = dividend->get_bitsize ();
+  int blocks_needed = 2 * BLOCKS_NEEDED (prec);
+  unsigned HOST_HALF_WIDE_INT b_quotient[2 * MAX_BITSIZE_MODE_ANY_INT
+				/ HOST_BITS_PER_WIDE_INT];
+  unsigned HOST_HALF_WIDE_INT b_remainder[2 * MAX_BITSIZE_MODE_ANY_INT
+				/ HOST_BITS_PER_WIDE_INT];
+  unsigned HOST_HALF_WIDE_INT b_dividend[(2 * MAX_BITSIZE_MODE_ANY_INT
+				 / HOST_BITS_PER_WIDE_INT) + 1];
+  unsigned HOST_HALF_WIDE_INT b_divisor[2 * MAX_BITSIZE_MODE_ANY_INT
+				/ HOST_BITS_PER_WIDE_INT];
+  int m, n;
+  bool dividend_neg = false;
+  bool divisor_neg = false;
+
+  if ((*divisor).zero_p ())
+    *overflow = true;
+
+  /* The smallest signed number / -1 causes overflow.  */
+  if (sgn == wide_int::SIGNED)
+    {
+      wide_int t = wide_int::set_bit_in_zero (prec - 1, 
+					      bs, 
+					      prec);
+      if (*dividend == t && (*divisor).minus_one_p ())
+	*overflow = true;
+    }
+
+  quotient.set_bitsize (bs);
+  remainder->set_bitsize (bs);
+  quotient.set_precision (prec);
+  remainder->set_precision (prec);
+
+  /* If overflow is set, just get out.  There will only be grief by
+     continuing.  */
+  if (*overflow)
+    {
+      if (compute_remainder)
+	{
+	  remainder->set_len (1);
+	  remainder->elt_ref (0) = 0;
+	}
+      return wide_int::zero (bs, prec);
+    }
+
+  /* Do it on the host if you can.  */
+  if (prec <= HOST_BITS_PER_WIDE_INT)
+    {
+      quotient.set_len (1);
+      remainder->set_len (1);
+      if (sgn == wide_int::SIGNED)
+	{
+	  quotient.elt_ref (0) 
+	    = sext_hwi (dividend->elt (0) / divisor->elt (0), prec);
+	  remainder->elt_ref (0) 
+	    = sext_hwi (dividend->elt (0) % divisor->elt (0), prec);
+	}
+      else
+	{
+	  unsigned HOST_WIDE_INT o0 = dividend->elt (0);
+	  unsigned HOST_WIDE_INT o1 = divisor->elt (0);
+
+	  if (prec < HOST_BITS_PER_WIDE_INT)
+	    {
+	      o0 = zext_hwi (o0, prec);
+	      o1 = zext_hwi (o1, prec);
+	    }
+	  quotient.elt_ref (0) = sext_hwi (o0 / o1, prec);
+	  remainder->elt_ref (0) = sext_hwi (o0 % o1, prec);
+	}
+
+#ifdef DEBUG_WIDE_INT
+      if (dump_file)
+	debug_wwww ("wide_int::divmod", quotient, *remainder, *dividend, *divisor);
+#endif
+      return quotient;
+    }
+
+  /* Make the divisor and dividend positive and remember what we
+     did.  */
+  if (sgn == wide_int::SIGNED)
+    {
+      if (dividend->sign_mask ())
+	{
+	  u0 = dividend->neg ();
+	  dividend = &u0;
+	  dividend_neg = true;
+	}
+      if (divisor->sign_mask ())
+	{
+	  u1 = divisor->neg ();
+	  divisor = &u1;
+	  divisor_neg = true;
+	}
+    }
+
+  wi_unpack (b_dividend, &dividend->uelt_ref (0), dividend->get_len (),
+	     blocks_needed);
+  wi_unpack (b_divisor, &divisor->uelt_ref (0), divisor->get_len (),
+	     blocks_needed);
+
+  if (dividend->sign_mask ())
+    m = blocks_needed;
+  else
+    m = 2 * dividend->get_len ();
+
+  if (divisor->sign_mask ())
+    n = blocks_needed;
+  else
+    n = 2 * divisor->get_len ();
+
+  divmod_internal_2 (b_quotient, b_remainder, b_dividend, b_divisor, m, n);
+
+  if (compute_quotient)
+    {
+      wi_pack (&quotient.uelt_ref (0), b_quotient, m);
+      quotient.set_len (m / 2);
+      quotient.canonize ();
+      /* The quotient is neg if exactly one of the divisor or dividend is
+	 neg.  */
+      if (dividend_neg != divisor_neg)
+	quotient = quotient.neg ();
+    }
+
+  if (compute_remainder)
+    {
+      wi_pack (&remainder->uelt_ref (0), b_remainder, n);
+      remainder->set_len (n / 2);
+      (*remainder).canonize ();
+      /* The remainder is always the same sign as the dividend.  */
+      if (dividend_neg)
+	*remainder = (*remainder).neg ();
+    }
+
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwww ("wide_int::divmod", quotient, *remainder, *dividend, *divisor);
+#endif
+  return quotient;
+}
+
+
+/* Divide DIVISOR into THIS.  The result is the same size as the
+   operands.  The sign is specified in SGN.  The output is
+   truncated.  */
+
+wide_int
+wide_int::div_trunc (const wide_int &divisor, SignOp sgn) const
+{
+  wide_int remainder;
+  bool overflow = false;
+
+  return divmod_internal (true, this, &divisor, sgn, 
+			  &remainder, false, &overflow);
+}
+
+/* Divide DIVISOR into THIS.  The result is the same size as the
+   operands.  The sign is specified in SGN.  The output is truncated.
+   Overflow is set to true if the result overflows, otherwise it is
+   not set.  */
+
+wide_int
+wide_int::div_trunc (const wide_int &divisor, SignOp sgn, bool *overflow) const
+{
+  wide_int remainder;
+  
+  return divmod_internal (true, this, &divisor, sgn, 
+			  &remainder, false, overflow);
+}
+
+/* Divide DIVISOR into THIS producing both the quotient and remainder.
+   The result is the same size as the operands.  The sign is specified
+   in SGN.  The output is truncated.  */
+
+wide_int
+wide_int::divmod_trunc (const wide_int &divisor, wide_int *remainder, SignOp sgn) const
+{
+  bool overflow = false;
+
+  return divmod_internal (true, this, &divisor, sgn, 
+			  remainder, true, &overflow);
+}
+
+/* Divide DIVISOR into THIS producing the remainder.  The result is
+   the same size as the operands.  The sign is specified in SGN.  The
+   output is truncated.  */
+
+wide_int
+wide_int::mod_trunc (const wide_int &divisor, SignOp sgn) const
+{
+  bool overflow = false;
+  wide_int remainder;
+
+  divmod_internal (false, this, &divisor, sgn, 
+		   &remainder, true, &overflow);
+  return remainder;
+}
+
+/* Divide DIVISOR into THIS producing the remainder.  The result is
+   the same size as the operands.  The sign is specified in SGN.  The
+   output is truncated.  Overflow is set to true if the result
+   overflows, otherwise it is not set.  */
+
+wide_int
+wide_int::mod_trunc (const wide_int &divisor, SignOp sgn, bool *overflow) const
+{
+  wide_int remainder;
+
+  divmod_internal (false, this, &divisor, sgn,
+		   &remainder, true, overflow);
+  return remainder;
+}
+
+/* Divide DIVISOR into THIS.  The result is the same size as the
+   operands.  The sign is specified in SGN.  The output is floor
+   truncated.  Overflow is set to true if the result overflows,
+   otherwise it is not set.  */
+
+wide_int
+wide_int::div_floor (const wide_int &divisor, SignOp sgn, bool *overflow) const
+{
+  wide_int remainder;
+  wide_int quotient;
+
+  quotient = divmod_internal (true, this, &divisor, sgn, 
+			      &remainder, true, overflow);
+  if (sgn == SIGNED && quotient.neg_p () && !remainder.zero_p ())
+    return quotient - (HOST_WIDE_INT)1;
+  return quotient;
+}
+
+
+/* Divide DIVISOR into THIS.  The remainder is also produced in
+   REMAINDER.  The result is the same size as the operands.  The sign
+   is specified in SGN.  The output is floor truncated.  Overflow is
+   set to true if the result overflows, otherwise it is not set.  */
+
+wide_int
+wide_int::divmod_floor (const wide_int &divisor, wide_int *remainder, SignOp sgn) const
+{
+  wide_int quotient;
+  bool overflow = false;
+
+  quotient = divmod_internal (true, this, &divisor, sgn, 
+			      remainder, true, &overflow);
+  if (sgn == SIGNED && quotient.neg_p () && !(*remainder).zero_p ())
+    {
+      *remainder = *remainder + divisor;
+      return quotient - (HOST_WIDE_INT)1;
+    }
+  return quotient;
+}
+
+
+
+/* Divide DIVISOR into THIS producing the remainder.  The result is
+   the same size as the operands.  The sign is specified in SGN.  The
+   output is floor truncated.  Overflow is set to true if the result
+   overflows, otherwise it is not set.  */
+
+wide_int
+wide_int::mod_floor (const wide_int &divisor, SignOp sgn, bool *overflow) const
+{
+  wide_int remainder;
+  wide_int quotient;
+
+  quotient = divmod_internal (true, this, &divisor, sgn, 
+			      &remainder, true, overflow);
+
+  if (sgn == SIGNED && quotient.neg_p () && !remainder.zero_p ())
+    return remainder - divisor;
+  return remainder;
+}
+
+/* Divide DIVISOR into THIS.  The result is the same size as the
+   operands.  The sign is specified in SGN.  The output is ceil
+   truncated.  Overflow is set to true if the result overflows,
+   otherwise it is not set.  */
+
+wide_int
+wide_int::div_ceil (const wide_int &divisor, SignOp sgn, bool *overflow) const
+{
+  wide_int remainder;
+  wide_int quotient;
+
+  quotient = divmod_internal (true, this, &divisor, sgn, 
+			      &remainder, true, overflow);
+
+  if (!remainder.zero_p ())
+    {
+      if (sgn == SIGNED && quotient.neg_p ())
+	return quotient;
+      else
+	return quotient + (HOST_WIDE_INT)1;
+    }
+  return quotient;
+}
+
+/* Divide DIVISOR into THIS producing the remainder.  The result is the
+   same size as the operands.  The sign is specified in SGN.  The
+   output is ceil truncated.  Overflow is set to true if the result
+   overflows, otherwise it is not set.  */
+
+wide_int
+wide_int::mod_ceil (const wide_int &divisor, SignOp sgn, bool *overflow) const
+{
+  wide_int remainder;
+  wide_int quotient;
+
+  quotient = divmod_internal (true, this, &divisor, sgn, 
+			      &remainder, true, overflow);
+
+  if (!remainder.zero_p ())
+    {
+      if (sgn == SIGNED && quotient.neg_p ())
+	return remainder;
+      else
+	return remainder - divisor;
+    }
+  return remainder;
+}
+
+/* Divide DIVISOR into THIS.  The result is the same size as the
+   operands.  The sign is specified in SGN.  The output is round
+   truncated.  Overflow is set to true if the result overflows,
+   otherwise it is not set.  */
+
+wide_int
+wide_int::div_round (const wide_int &divisor, SignOp sgn, bool *overflow) const
+{
+  wide_int remainder;
+  wide_int quotient;
+
+  quotient = divmod_internal (true, this, &divisor, sgn, 
+			      &remainder, true, overflow);
+  if (!remainder.zero_p ())
+    {
+      if (sgn == SIGNED)
+	{
+	  wide_int p_remainder = remainder.neg_p () ? remainder.neg () : remainder;
+	  wide_int p_divisor = divisor.neg_p () ? divisor.neg () : divisor;
+	  p_divisor = p_divisor.rshiftu (1);
+	  
+	  if (p_remainder.gts_p (p_divisor))
+	    {
+	      if (quotient.neg_p ())
+		return quotient - (HOST_WIDE_INT)1;
+	      else 
+		return quotient + (HOST_WIDE_INT)1;
+	    }
+	}
+      else
+	{
+	  wide_int p_divisor = divisor.rshiftu (1);
+	  if (remainder.gtu_p (p_divisor))
+	    return quotient + (unsigned HOST_WIDE_INT)1;
+	}
+    }
+  return quotient;
+}
+
+/* Divide DIVISOR into THIS producing the remainder.  The result is
+   the same size as the operands.  The sign is specified in SGN.  The
+   output is round truncated.  Overflow is set to true if the result
+   overflows, otherwise it is not set.  */
+
+wide_int
+wide_int::mod_round (const wide_int &divisor, SignOp sgn, bool *overflow) const
+{
+  wide_int remainder;
+  wide_int quotient;
+
+  quotient = divmod_internal (true, this, &divisor, sgn, 
+			      &remainder, true, overflow);
+
+  if (!remainder.zero_p ())
+    {
+      if (sgn == SIGNED)
+	{
+	  wide_int p_remainder = remainder.neg_p () ? remainder.neg () : remainder;
+	  wide_int p_divisor = divisor.neg_p () ? divisor.neg () : divisor;
+	  p_divisor = p_divisor.rshiftu (1);
+	  
+	  if (p_remainder.gts_p (p_divisor))
+	    {
+	      if (quotient.neg_p ())
+		return remainder + divisor;
+	      else 
+		return remainder - divisor;
+	    }
+	}
+      else
+	{
+	  wide_int p_divisor = divisor.rshiftu (1);
+	  if (remainder.gtu_p (p_divisor))
+	    return remainder - divisor;
+	}
+    }
+  return remainder;
+}
+
+/*
+ * Shifting, rotating and extraction.
+ */
+
+/* If SHIFT_COUNT_TRUNCATED is defined, truncate CNT.   
+
+   At first look, the shift truncation code does not look right.
+   Shifts (and rotates) are done according to the precision of the
+   mode but the shift count is truncated according to the bitsize
+   of the mode.   This is how real hardware works.
+
+   On an ideal machine, like Knuth's MIX machine, a shift count is a
+   word long and all of the bits of that word are examined to compute
+   the shift amount.  But on real hardware, especially on machines
+   with fast (single cycle) shifts, that takes too long.  On these
+   machines, the time needed to perform a shift dictates the cycle
+   time of the machine, so corners are cut to keep this fast.  A
+   comparison of an entire 64 bit word would take something like 6
+   gate delays before the shifting can even start.
+
+   So real hardware only looks at a small part of the shift amount.
+   On IBM machines, this tends to be 1 more than what is necessary to
+   encode the shift amount.  The rest of the world looks at only the
+   minimum number of bits.  This means that only 3 gate delays are
+   necessary to set up the shifter.
+
+   On the other hand, right shifts and rotates must be according to
+   the precision or the operation does not make any sense.   */
+static inline int
+trunc_shift (unsigned int bitsize, int cnt)
+{
+#ifdef SHIFT_COUNT_TRUNCATED
+  cnt = cnt & (bitsize - 1);
+#endif
+  return cnt;
+}
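+
+/* A small usage sketch (illustrative only): on a target that defines
+   SHIFT_COUNT_TRUNCATED,
+
+     int cnt = trunc_shift (32, 33);
+
+   yields 1 (33 & 31), matching what a single-cycle hardware shifter
+   does with an over-wide count.  */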
+
+/* This function is called in two contexts.  If Z == TRUNC, this
+   function provides a count that matches the semantics of the target
+   machine depending on the value of SHIFT_COUNT_TRUNCATED.  Note that
+   if SHIFT_COUNT_TRUNCATED is not defined, this function may produce
+   -1 as a value if the shift amount is greater than the bitsize of
+   the mode.  -1 is a surrogate for a very large amount.
+
+   If Z == NONE, then this function always truncates the shift value
+   to the bitsize because this shifting operation is a function that
+   is internal to GCC.  */
+
+static inline int
+trunc_shift (unsigned int bitsize, const wide_int *cnt, wide_int::ShiftOp z)
+{
+  if (z == wide_int::TRUNC)
+    {
+#ifdef SHIFT_COUNT_TRUNCATED
+      return cnt->elt (0) & (bitsize - 1);
+#else
+      if (cnt->ltu_p (bitsize))
+	return cnt->elt (0) & (bitsize - 1);
+      else 
+	return -1;
+#endif
+    }
+  else
+    return cnt->elt (0) & (bitsize - 1);
+}
+
+/* Extract WIDTH bits from THIS starting at OFFSET.  The result is
+   assumed to fit in a HOST_WIDE_INT.  This function is safe in that
+   it can properly access elements that may not be explicitly
+   represented.  */
+
+HOST_WIDE_INT
+wide_int::extract_to_hwi (int offset, int width) const
+{
+  int start_elt, end_elt, shift;
+  HOST_WIDE_INT x;
+
+  /* Get rid of the easy cases first.   */
+  if (offset >= len * HOST_BITS_PER_WIDE_INT)
+    return sign_mask ();
+  if (offset + width <= 0)
+    return 0;
+
+  shift = offset & (HOST_BITS_PER_WIDE_INT - 1);
+  if (offset < 0)
+    {
+      start_elt = -1;
+      end_elt = 0;
+      x = 0;
+    }
+  else
+    {
+      start_elt = offset / HOST_BITS_PER_WIDE_INT;
+      end_elt = (offset + width - 1) / HOST_BITS_PER_WIDE_INT;
+      x = (start_elt >= len
+	   ? sign_mask ()
+	   : (unsigned HOST_WIDE_INT)val[start_elt] >> shift);
+    }
+
+  if (start_elt != end_elt)
+    {
+      HOST_WIDE_INT y = end_elt == len
+	? sign_mask () : val[end_elt];
+
+      /* X already holds its bits shifted into place; merge in the
+	 bits from the following element.  */
+      x |= y << (HOST_BITS_PER_WIDE_INT - shift);
+    }
+
+  if (width != HOST_BITS_PER_WIDE_INT)
+    x &= ((HOST_WIDE_INT)1 << width) - 1;
+
+  return x;
+}
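+
+/* A worked example of an extraction that crosses an element boundary
+   (illustrative only, HOST_BITS_PER_WIDE_INT == 64): for a value
+   with val = {0x1, 0x2}, extract_to_hwi (60, 8) takes bits 60..63
+   from val[0] (all zero) and bits 64..67 from val[1] (0x2), giving
+   0x20 once the bits from val[1] are shifted into place.  */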
+
+
+/* Left shift by an integer Y.  See the definition of Op.TRUNC for how
+   to set Z.  */
+
+wide_int
+wide_int::lshift (unsigned int y, ShiftOp z) const
+{
+  return lshift (y, z, bitsize, precision);
+}
+
+/* Left shifting by an wide_int shift amount.  See the definition of
+   Op.TRUNC for how to set Z.  */
+
+wide_int
+wide_int::lshift (const wide_int &y, ShiftOp z) const
+{
+  if (z == TRUNC)
+    {
+      HOST_WIDE_INT shift = trunc_shift (bitsize, &y, TRUNC);
+      if (shift == -1)
+	return wide_int::zero (bitsize, precision);
+      return lshift (shift, NONE, bitsize, precision);
+    }
+  else
+    return lshift (trunc_shift (bitsize, &y, NONE), NONE, bitsize, precision);
+}
+
+/* Left shift THIS by CNT.  See the definition of Op.TRUNC for how to
+   set OP.  Since this is used internally, it has the ability to
+   specify the BITSIZE and PRECISION independently.  This is useful
+   when inserting a small value into a larger one.  */
+
+wide_int
+wide_int::lshift (unsigned int cnt, ShiftOp op, 
+		  unsigned int bs, unsigned int res_prec) const
+{
+  wide_int result;
+  unsigned int i;
+
+  result.bitsize = bs;
+  result.precision = res_prec;
+
+  if (op == TRUNC)
+    cnt = trunc_shift (bs, cnt);
+
+  /* Handle the simple case quickly.   */
+  if (res_prec <= HOST_BITS_PER_WIDE_INT)
+    {
+      result.val[0] = val[0] << cnt;
+      result.len = 1;
+
+#ifdef DEBUG_WIDE_INT
+      if (dump_file)
+	debug_wwv ("wide_int::lshift", result, *this, cnt);
+#endif
+
+      return result;
+    }
+
+  if (cnt >= res_prec)
+    {
+      result.val[0] = 0;
+      result.len = 1;
+#ifdef DEBUG_WIDE_INT
+      if (dump_file)
+	debug_wwv ("wide_int::lshift", result, *this, cnt);
+#endif
+      return result;
+    }
+
+  for (i = 0; i < res_prec; i += HOST_BITS_PER_WIDE_INT)
+    result.val[i / HOST_BITS_PER_WIDE_INT]
+      = extract_to_hwi (i - cnt, HOST_BITS_PER_WIDE_INT);
+
+  result.len = BLOCKS_NEEDED (res_prec);
+
+  result.canonize ();
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int::lshift", result, *this, cnt);
+#endif
+
+  return result;
+}
+
+/* Rotate THIS left by Y within its precision.  */
+
+wide_int
+wide_int::lrotate (const wide_int &y) const
+{
+  return lrotate (y.extract_to_hwi (0, HOST_BITS_PER_WIDE_INT));
+}
+
+/* Rotate THIS left by CNT within its precision.  */
+
+wide_int
+wide_int::lrotate (unsigned int cnt) const
+{
+  wide_int left, right, result;
+
+  left = lshift (cnt, NONE);
+  right = rshiftu (precision - cnt, NONE);
+  result = left | right;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int::lrotate", result, *this, cnt);
+#endif
+  return result;
+}
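+
+/* For instance (illustrative only): with an 8-bit precision, rotating
+   0x81 left by 1 computes (0x81 << 1) | (0x81 >> 7) within the
+   precision, i.e. 0x02 | 0x01 == 0x03.  */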
+
+/* Unsigned right shift by Y.  See the definition of Op.TRUNC for how
+   to set Z.  */
+
+wide_int
+wide_int::rshiftu (const wide_int &y, ShiftOp z) const
+{
+  if (z == TRUNC)
+    {
+      HOST_WIDE_INT shift = trunc_shift (bitsize, &y, TRUNC);
+      if (shift == -1)
+	return wide_int::zero (bitsize, precision);
+      return rshiftu (shift, NONE);
+    }
+  else
+    return rshiftu (trunc_shift (bitsize, &y, NONE), NONE);
+}
+
+/* Unsigned right shift THIS by CNT.  See the definition of Op.TRUNC
+   for how to set Z.  */
+
+wide_int
+wide_int::rshiftu (unsigned int cnt, ShiftOp trunc_op) const
+{
+  wide_int result;
+  int stop_block, offset, i;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (trunc_op == TRUNC)
+    cnt = trunc_shift (bitsize, cnt);
+
+  if (cnt == 0)
+    {
+      result = force_to_size (bitsize, precision);
+#ifdef DEBUG_WIDE_INT
+      if (dump_file)
+	debug_wwv ("wide_int::rshiftu", result, *this, cnt);
+#endif
+      return result;
+    }
+
+  /* Handle the simple case quickly.   */
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      unsigned HOST_WIDE_INT x = val[0];
+
+      if (precision < HOST_BITS_PER_WIDE_INT)
+	x = zext_hwi (x, precision);
+
+      result.val[0] = x >> cnt;
+      result.len = 1;
+
+#ifdef DEBUG_WIDE_INT
+      if (dump_file)
+	debug_wwv ("wide_int::rshiftu", result, *this, cnt);
+#endif
+      return result;
+    }
+
+  if (cnt >= precision)
+    {
+      result.val[0] = 0;
+      result.len = 1;
+
+#ifdef DEBUG_WIDE_INT
+      if (dump_file)
+	debug_wwv ("wide_int::rshiftu", result, *this, cnt);
+#endif
+      return result;
+    }
+
+  stop_block = BLOCKS_NEEDED (precision - cnt);
+  for (i = 0; i < stop_block; i++)
+    result.val[i]
+      = extract_to_hwi ((i * HOST_BITS_PER_WIDE_INT) + cnt,
+			HOST_BITS_PER_WIDE_INT);
+
+  result.len = stop_block;
+
+  offset = (precision - cnt) & (HOST_BITS_PER_WIDE_INT - 1);
+  if (offset)
+    result.val[stop_block - 1] = zext_hwi (result.val[stop_block - 1], offset);
+  else
+    /* The top block had a 1 in its top position, so it would
+       decompress incorrectly unless a zero block is added.  This
+       only works because we know the shift was greater than 0.  */
+    if (result.val[stop_block - 1] < 0)
+      result.val[result.len++] = 0;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int:rshiftu", result, *this, cnt);
+#endif
+  return result;
+}
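+
+/* An example of the zero-block fixup above (illustrative only,
+   HOST_BITS_PER_WIDE_INT == 64): shifting a 128-bit value with
+   val = {0x0, 0x8000000000000000} right by 64 leaves
+   val[0] == 0x8000000000000000, whose top bit is set; without the
+   appended zero block the compressed form would read back as a
+   sign-extended (negative) value rather than the intended unsigned
+   result.  */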
+
+/* Signed right shift by Y.  See the definition of Op.TRUNC for how to
+   set Z.  */
+wide_int
+wide_int::rshifts (const wide_int &y, ShiftOp z) const
+{
+  if (z == TRUNC)
+    {
+      HOST_WIDE_INT shift = trunc_shift (bitsize, &y, TRUNC);
+      if (shift == -1)
+	{
+	  /* The value of the shift was larger than the bitsize and this
+	     machine does not truncate the value, so the result is
+	     a smeared sign bit.  */
+	  if (neg_p ())
+	    return wide_int::minus_one (bitsize, precision);
+	  else
+	    return wide_int::zero (bitsize, precision);
+	}
+      return rshifts (shift, NONE);
+    }
+  else
+    return rshifts (trunc_shift (bitsize, &y, NONE), NONE);
+}
+
+/* Signed right shift THIS by CNT.  See the definition of Op.TRUNC for
+   how to set Z.  */
+
+wide_int
+wide_int::rshifts (unsigned int cnt, ShiftOp trunc_op) const
+{
+  wide_int result;
+  int stop_block, i;
+
+  result.bitsize = bitsize;
+  result.precision = precision;
+
+  if (trunc_op == TRUNC)
+    cnt = trunc_shift (bitsize, cnt);
+
+  if (cnt == 0)
+    {
+      result = force_to_size (bitsize, precision);
+#ifdef DEBUG_WIDE_INT
+      if (dump_file)
+	debug_wwv ("wide_int::rshifts", result, *this, cnt);
+#endif
+      return result;
+    }
+  /* Handle the simple case quickly.   */
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      HOST_WIDE_INT x = val[0];
+      result.val[0] = x >> cnt;
+      result.len = 1;
+
+#ifdef DEBUG_WIDE_INT
+      if (dump_file)
+	debug_wwv ("wide_int::rshifts", result, *this, cnt);
+#endif
+      return result;
+    }
+
+  if (cnt >= precision)
+    {
+      HOST_WIDE_INT m = sign_mask ();
+      result.val[0] = m;
+      result.len = 1;
+#ifdef DEBUG_WIDE_INT
+      if (dump_file)
+	debug_wwv ("wide_int::rshifts", result, *this, cnt);
+#endif
+      return result;
+    }
+
+  stop_block = BLOCKS_NEEDED (precision - cnt);
+  for (i = 0; i < stop_block; i++)
+    result.val[i]
+      = extract_to_hwi ((i * HOST_BITS_PER_WIDE_INT) + cnt,
+			HOST_BITS_PER_WIDE_INT);
+
+  result.len = stop_block;
+
+  /* No need to sign extend the last block, since it extract_to_hwi
+     already did that.  */
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int::rshifts", result, *this, cnt);
+#endif
+
+  return result;
+}
+
+/* Rotate THIS right by Y within its precision.  */
+
+wide_int
+wide_int::rrotate (const wide_int &y) const
+{
+  return rrotate (y.extract_to_hwi (0, HOST_BITS_PER_WIDE_INT));
+}
+
+/* Rotate THIS right by CNT within its precision.  */
+
+wide_int
+wide_int::rrotate (int cnt) const
+{
+  wide_int left, right, result;
+
+  left = lshift (precision - cnt, NONE);
+  right = rshiftu (cnt, NONE);
+  result = left | right;
+
+#ifdef DEBUG_WIDE_INT
+  if (dump_file)
+    debug_wwv ("wide_int::rrotate", result, *this, cnt);
+#endif
+  return result;
+}
+
+/*
+ * Private utilities.
+ */
+/* Decompress THIS for at least TARGET bits into a result with
+   bitsize BS and precision PREC.  */
+
+wide_int
+wide_int::decompress (unsigned int target, unsigned int bs, unsigned int prec) const
+{
+  wide_int result;
+  int blocks_needed = BLOCKS_NEEDED (target);
+  HOST_WIDE_INT mask;
+  int len, i;
+
+  result.bitsize = bs;
+  result.precision = prec;
+  result.len = blocks_needed;
+
+  for (i = 0; i < this->len; i++)
+    result.val[i] = val[i];
+
+  len = this->len;
+
+  /* One could argue that this should just ICE.  */
+  if (target > result.precision)
+    return result;
+
+  /* The extension that we are doing here is not sign extension, it is
+     decompression.  */
+  mask = sign_mask ();
+  while (len < blocks_needed)
+    result.val[len++] = mask;
+
+  return result;
+}
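+
+/* For example (illustrative only): a 128-bit -1 is stored compressed
+   as len == 1, val[0] == -1; decompressing it for 128 target bits
+   yields len == 2 with val == {-1, -1}, making every HWI of the
+   value explicitly available.  */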
+
+
+/*
+ * Private debug printing routines.
+ */
+
+/* The debugging routines print results of wide operations into the
+   dump files of the respective passes in which they were called.  */
+char *
+wide_int::dump (char* buf) const
+{
+  int i;
+  int l;
+  const char * sep = "";
+
+  l = sprintf (buf, "[%d,%d (", bitsize, precision);
+  for (i = len - 1; i >= 0; i--)
+    {
+      l += sprintf (&buf[l], "%s" HOST_WIDE_INT_PRINT_HEX, sep, val[i]);
+      sep = " ";
+    }
+
+  gcc_assert (len != 0);
+
+  l += sprintf (&buf[l], ")]");
+
+  gcc_assert (l < MAX);
+  return buf;
+}
+
+#ifdef DEBUG_WIDE_INT
+void
+debug_vw (const char* name, int r, const wide_int& o0)
+{
+  char buf0[MAX];
+  fprintf (dump_file, "%s: %d = %s\n", name, r, o0.dump (buf0));
+}
+
+void
+debug_vwh (const char* name, int r, const wide_int &o0,
+	   HOST_WIDE_INT o1)
+{
+  char buf0[MAX];
+  fprintf (dump_file, "%s: %d = %s 0x"HOST_WIDE_INT_PRINT_HEX" \n", name, r,
+	   o0.dump (buf0), o1);
+}
+
+void
+debug_vww (const char* name, int r, const wide_int &o0,
+	   const wide_int &o1)
+{
+  char buf0[MAX];
+  char buf1[MAX];
+  fprintf (dump_file, "%s: %d = %s OP %s\n", name, r,
+	   o0.dump (buf0), o1.dump (buf1));
+}
+
+void
+debug_wv (const char* name, const wide_int &r, int v0)
+{
+  char buf0[MAX];
+  fprintf (dump_file, "%s: %s = %d\n",
+	   name, r.dump (buf0), v0);
+}
+
+void
+debug_wvv (const char* name, const wide_int &r, int v0, int v1)
+{
+  char buf0[MAX];
+  fprintf (dump_file, "%s: %s = %d %d\n",
+	   name, r.dump (buf0), v0, v1);
+}
+
+void
+debug_wvvv (const char* name, const wide_int &r, int v0,
+	    int v1, int v2)
+{
+  char buf0[MAX];
+  fprintf (dump_file, "%s: %s = %d %d %d\n",
+	   name, r.dump (buf0), v0, v1, v2);
+}
+
+void
+debug_wwv (const char* name, const wide_int &r,
+	   const wide_int &o0, int v0)
+{
+  char buf0[MAX];
+  char buf1[MAX];
+  fprintf (dump_file, "%s: %s = %s %d\n",
+	   name, r.dump (buf0),
+	   o0.dump (buf1), v0);
+}
+
+void
+debug_wwwvv (const char* name, const wide_int &r,
+	     const wide_int &o0, const wide_int &o1, int v0, int v1)
+{
+  char buf0[MAX];
+  char buf1[MAX];
+  char buf2[MAX];
+  fprintf (dump_file, "%s: %s = %s OP %s %d %d\n",
+	   name, r.dump (buf0),
+	   o0.dump (buf1), o1.dump (buf2), v0, v1);
+}
+
+void
+debug_ww (const char* name, const wide_int &r, const wide_int &o0)
+{
+  char buf0[MAX];
+  char buf1[MAX];
+  fprintf (dump_file, "%s: %s = %s\n",
+	   name, r.dump (buf0),
+	   o0.dump (buf1));
+}
+
+void
+debug_www (const char* name, const wide_int &r,
+	   const wide_int &o0, const wide_int &o1)
+{
+  char buf0[MAX];
+  char buf1[MAX];
+  char buf2[MAX];
+  fprintf (dump_file, "%s: %s = %s OP %s\n",
+	   name, r.dump (buf0),
+	   o0.dump (buf1), o1.dump (buf2));
+}
+
+void
+debug_wwwv (const char* name, const wide_int &r,
+	    const wide_int &o0, const wide_int &o1, int v0)
+{
+  char buf0[MAX];
+  char buf1[MAX];
+  char buf2[MAX];
+  fprintf (dump_file, "%s: %s = %s OP %s %d\n",
+	   name, r.dump (buf0),
+	   o0.dump (buf1), o1.dump (buf2), v0);
+}
+
+void
+debug_wwww (const char* name, const wide_int &r,
+	    const wide_int &o0, const wide_int &o1, const wide_int &o2)
+{
+  char buf0[MAX];
+  char buf1[MAX];
+  char buf2[MAX];
+  char buf3[MAX];
+  fprintf (dump_file, "%s: %s = %s OP %s OP %s\n",
+	   name, r.dump (buf0),
+	   o0.dump (buf1), o1.dump (buf2), o2.dump (buf3));
+}
+#endif
+
diff --git a/gcc/wide-int.h b/gcc/wide-int.h
new file mode 100644
index 0000000..efd2c01
--- /dev/null
+++ b/gcc/wide-int.h
@@ -0,0 +1,1109 @@ 
+/* Operations with very long integers.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef WIDE_INT_H
+#define WIDE_INT_H
+
+/* A wide integer is currently represented as a vector of
+   HOST_WIDE_INTs.  The vector contains
+   MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT elements, a
+   bound derived for each host/target combination.  The values are
+   stored in the vector with the least significant
+   HOST_BITS_PER_WIDE_INT bits of the value stored in element 0.
+
+   A wide_int contains four fields: the vector (VAL), the bitsize,
+   the precision and a length (LEN).  The length is the number of
+   HWIs needed to represent the value.
+
+   Since most integers used in a compiler are small values, it is
+   generally profitable to use a representation of the value that is
+   shorter than the mode's precision.  LEN is used to indicate the
+   number of elements of the vector that are in use.  When LEN *
+   HOST_BITS_PER_WIDE_INT < the precision, the value has been
+   compressed.  The elements of the vector with index greater than
+   LEN - 1 are implicitly equal to the sign extension (the smear of
+   the most significant bit) of element LEN - 1.
+
+   The representation does not contain any information about
+   signedness of the represented value, so it can be used to represent
+   both signed and unsigned numbers.  For operations where the results
+   depend on signedness (division, comparisons), the signedness must
+   be specified separately.  For operations where the signedness
+   matters, one of the operands to the operation specifies either
+   wide_int::SIGNED or wide_int::UNSIGNED.
+
+   All constructors for wide_int take either a bitsize and precision,
+   an enum machine_mode, or a tree type.  */
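+
+/* Some illustrative encodings, assuming HOST_BITS_PER_WIDE_INT == 64
+   and a precision of 128: the value 1 is stored compressed as
+   LEN == 1, VAL[0] == 1; the value -1 as LEN == 1, VAL[0] == -1 (the
+   missing element is the smear of the top bit of VAL[0]); and the
+   unsigned value 2^64 - 1 needs LEN == 2 with VAL[0] == -1 and
+   VAL[1] == 0, since a one-element form would read back as -1.  */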
+
+
+#ifndef GENERATOR_FILE
+#include "tree.h"
+#include "hwint.h"
+#include "options.h"
+#include "tm.h"
+#include "insn-modes.h"
+#include "machmode.h"
+#include "double-int.h"
+#include <gmp.h>
+#include "insn-modes.h"
+
+
+class wide_int {
+  /* Internal representation.  */
+  
+  /* VAL is set to a size that is capable of computing a full
+     multiplication on the largest mode that is represented on the
+     target.  The full multiplication is use by tree-vrp.  If
+     operations are added that require larger buffers, then VAL needs
+     to be changed.  */
+  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT];
+  unsigned short len;
+  unsigned int bitsize;
+  unsigned int precision;
+
+ public:
+  enum ShiftOp {
+    NONE,
+    /* There are two uses for the wide-int shifting functions.  The
+       first use is as an emulation of the target hardware.  The
+       second use is as service routines for other optimizations.  The
+       first case needs to be identified by passing TRUNC as the value
+       of ShiftOp so that shift amount is properly handled according to the
+       SHIFT_COUNT_TRUNCATED flag.  For the second case, the shift
+       amount is always truncated by the bitsize of the mode of
+       THIS.  */
+    TRUNC
+  };
+
+  enum SignOp {
+    /* Many of the math functions produce different results depending
+       on whether they are SIGNED or UNSIGNED.  In general, there are
+       two different functions, whose names are prefixed with an 'S'
+       or a 'U'.  However, for some math functions there is also a
+       routine that does not have the prefix and takes a SignOp
+       parameter of SIGNED or UNSIGNED.  */
+    SIGNED,
+    UNSIGNED
+  };
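+
+  /* For instance (illustrative): a.div_trunc (b, wide_int::SIGNED)
+     and a.sdiv_trunc (b) name the same operation.  */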
+
+  /* Conversions.  */
+
+  static wide_int from_shwi (HOST_WIDE_INT op0, unsigned int bitsize, 
+			     unsigned int precision);
+  static wide_int from_shwi (HOST_WIDE_INT op0, unsigned int bitsize, 
+			     unsigned int precision, bool *overflow);
+  static wide_int from_uhwi (unsigned HOST_WIDE_INT op0, unsigned int bitsize, 
+			     unsigned int precision);
+  static wide_int from_uhwi (unsigned HOST_WIDE_INT op0, unsigned int bitsize, 
+			     unsigned int precision, bool *overflow);
+
+  inline static wide_int from_hwi (HOST_WIDE_INT op0, const_tree type);
+  inline static wide_int from_hwi (HOST_WIDE_INT op0, const_tree type, 
+				   bool *overflow);
+  inline static wide_int from_shwi (HOST_WIDE_INT op0, enum machine_mode mode);
+  inline static wide_int from_shwi (HOST_WIDE_INT op0, enum machine_mode mode, 
+				    bool *overflow);
+  inline static wide_int from_uhwi (unsigned HOST_WIDE_INT op0, 
+				    enum machine_mode mode);
+  inline static wide_int from_uhwi (unsigned HOST_WIDE_INT op0, 
+				    enum machine_mode mode, 
+				    bool *overflow);
+
+  static wide_int from_double_int (enum machine_mode, double_int);
+  static wide_int from_tree (const_tree);
+  static wide_int from_rtx (const_rtx, enum machine_mode);
+
+  HOST_WIDE_INT to_shwi () const;
+  HOST_WIDE_INT to_shwi (unsigned int prec) const;
+  unsigned HOST_WIDE_INT to_uhwi () const;
+  unsigned HOST_WIDE_INT to_uhwi (unsigned int prec) const;
+
+  /* Largest and smallest values representable in modes or precisions.  */
+
+  static wide_int max_value (unsigned int bitsize, unsigned int prec, SignOp sgn);
+  static wide_int max_value (const_tree type);
+  static wide_int max_value (enum machine_mode mode, SignOp sgn);
+  
+  static wide_int min_value (unsigned int bitsize, unsigned int prec, SignOp sgn);
+  static wide_int min_value (const_tree type);
+  static wide_int min_value (enum machine_mode mode, SignOp sgn);
+  
+  /* Small constants */
+
+  inline static wide_int minus_one (unsigned int bitsize, unsigned int prec);
+  inline static wide_int minus_one (const_tree type);
+  inline static wide_int minus_one (enum machine_mode mode);
+  inline static wide_int zero (unsigned int bitsize, unsigned int prec);
+  inline static wide_int zero (const_tree type);
+  inline static wide_int zero (enum machine_mode mode);
+  inline static wide_int one (unsigned int bitsize, unsigned int prec);
+  inline static wide_int one (const_tree type);
+  inline static wide_int one (enum machine_mode mode);
+  inline static wide_int two (unsigned int bitsize, unsigned int prec);
+  inline static wide_int two (const_tree type);
+  inline static wide_int two (enum machine_mode mode);
+  inline static wide_int ten (unsigned int bitsize, unsigned int prec);
+  inline static wide_int ten (const_tree type);
+  inline static wide_int ten (enum machine_mode mode);
+
+  /* Accessors.  */
+
+  inline unsigned short get_len () const;
+  inline unsigned int get_bitsize () const;
+  inline unsigned int get_precision () const;
+  inline unsigned int get_full_len () const;
+  inline HOST_WIDE_INT elt (unsigned int i) const;
+
+  /* The setters should rarely be used.   They are for the few places
+     where wide_ints are constructed inside some other class.  */
+  inline void set_len (unsigned int);
+  inline void set_bitsize (unsigned int);
+  inline void set_precision (unsigned int);
+  inline HOST_WIDE_INT& elt_ref (unsigned int i);
+  inline unsigned HOST_WIDE_INT& uelt_ref (unsigned int i);
+  inline const unsigned HOST_WIDE_INT& uelt_ref (unsigned int i) const;
+
+  /* Utility routines.  */
+
+  void canonize ();
+  wide_int force_to_size (unsigned int bitsize, 
+			  unsigned int precision) const;
+
+  /* Printing functions.  */
+
+  void print_dec (char *buf, SignOp sgn) const;
+  void print_dec (FILE *file, SignOp sgn) const;
+  void print_decs (char *buf) const;
+  void print_decs (FILE *file) const;
+  void print_decu (char *buf) const;
+  void print_decu (FILE *file) const;
+  void print_hex (char *buf) const;
+  void print_hex (FILE *file) const;
+
+  /* Comparative functions.  */
+
+  inline bool minus_one_p () const;
+  inline bool zero_p () const;
+  inline bool one_p () const;
+  inline bool neg_p () const;
+
+  bool operator == (const wide_int &y) const;
+  inline bool operator != (const wide_int &y) const;
+  inline bool gt_p (HOST_WIDE_INT x, SignOp sgn) const;
+  inline bool gt_p (const wide_int &x, SignOp sgn) const;
+  bool gts_p (HOST_WIDE_INT y) const;
+  inline bool gts_p (const wide_int &y) const;
+  bool gtu_p (unsigned HOST_WIDE_INT y) const;
+  inline bool gtu_p (const wide_int &y) const;
+
+  inline bool lt_p (const HOST_WIDE_INT x, SignOp sgn) const;
+  inline bool lt_p (const wide_int &x, SignOp sgn) const;
+  bool lts_p (HOST_WIDE_INT y) const;
+  bool lts_p (const wide_int &y) const;
+  bool ltu_p (unsigned HOST_WIDE_INT y) const;
+  bool ltu_p (const wide_int &y) const;
+
+  bool only_sign_bit_p (unsigned int prec) const;
+  bool only_sign_bit_p () const;
+  inline bool fits_uhwi_p () const;
+  inline bool fits_shwi_p () const;
+  bool fits_to_tree_p (const_tree type) const;
+  bool fits_u_p (unsigned int prec) const;
+  bool fits_s_p (unsigned int prec) const;
+
+  /* Min and max */
+
+  inline wide_int smin (const wide_int &op1) const;
+  inline wide_int smax (const wide_int &op1) const;
+  inline wide_int umin (const wide_int &op1) const;
+  inline wide_int umax (const wide_int &op1) const;
+
+  /* Extension  */
+
+  inline wide_int ext (unsigned int offset, SignOp sgn) const;
+  wide_int sext (unsigned int offset) const;
+  wide_int sext (enum machine_mode mode) const;
+  wide_int zext (unsigned int offset) const;
+  wide_int zext (enum machine_mode mode) const;
+
+  /* Masking, and Insertion  */
+
+  wide_int set_bit (unsigned int bitpos) const;
+  static wide_int set_bit_in_zero (unsigned int, 
+				   unsigned int bitsize, 
+				   unsigned int prec);
+  inline static wide_int set_bit_in_zero (unsigned int, 
+					  enum machine_mode mode);
+  inline static wide_int set_bit_in_zero (unsigned int, const_tree type);
+  wide_int insert (const wide_int &op0, unsigned int offset,
+		   unsigned int width) const;
+  static wide_int mask (unsigned int start, bool negate, 
+			unsigned int bitsize, unsigned int prec);
+  inline static wide_int mask (unsigned int start, bool negate, 
+			       enum machine_mode mode);
+  inline static wide_int mask (unsigned int start, bool negate,
+			       const_tree type);
+  wide_int bswap () const;
+  static wide_int shifted_mask (unsigned int start, unsigned int width,
+				bool negate,
+				unsigned int bitsize, unsigned int prec);
+  inline static wide_int shifted_mask (unsigned int start, unsigned int width, 
+				       bool negate, enum machine_mode mode);
+  inline static wide_int shifted_mask (unsigned int start, unsigned int width, 
+				       bool negate, const_tree type);
+  inline HOST_WIDE_INT sign_mask () const;
+
+  /* Logicals */
+
+  wide_int operator & (const wide_int &y) const;
+  wide_int and_not (const wide_int &y) const;
+  wide_int operator ~ () const;
+  wide_int or_not (const wide_int &y) const;
+  wide_int operator | (const wide_int &y) const;
+  wide_int operator ^ (const wide_int &y) const;
+
+  /* Arithmetic operation functions, alpha sorted.  */
+  wide_int abs () const;
+  wide_int operator + (const wide_int &y) const;
+  wide_int operator + (HOST_WIDE_INT y) const;
+  wide_int operator + (unsigned HOST_WIDE_INT y) const;
+  wide_int add (const wide_int &x, SignOp sgn, bool *overflow) const;
+  wide_int clz (unsigned int bitsize, unsigned int prec) const;
+  int clz () const;
+  wide_int clrsb (unsigned int bitsize, unsigned int prec) const;
+  int clrsb () const;
+  int cmp (const wide_int &y, SignOp sgn) const;
+  int cmps (const wide_int &y) const;
+  int cmpu (const wide_int &y) const;
+  wide_int ctz (unsigned int bitsize, unsigned int prec) const;
+  int ctz () const;
+  int exact_log2 () const;
+  wide_int ffs () const;
+  wide_int operator * (const wide_int &y) const;
+  wide_int mul (const wide_int &x, SignOp sgn, bool *overflow) const;
+  inline wide_int smul (const wide_int &x, bool *overflow) const;
+  inline wide_int umul (const wide_int &x, bool *overflow) const;
+  wide_int mul_full (const wide_int &x, SignOp sgn) const;
+  inline wide_int umul_full (const wide_int &x) const;
+  inline wide_int smul_full (const wide_int &x) const;
+  wide_int mul_high (const wide_int &x, SignOp sgn) const;
+  wide_int neg () const;
+  wide_int neg_overflow (bool *z) const;
+  wide_int parity (unsigned int bitsize, unsigned int prec) const;
+  int popcount () const;
+  wide_int popcount (unsigned int bitsize, unsigned int prec) const;
+  wide_int operator - (const wide_int &y) const;
+  wide_int operator - (HOST_WIDE_INT y) const;
+  wide_int operator - (unsigned HOST_WIDE_INT y) const;
+  wide_int sub (const wide_int &x, SignOp sgn, bool *overflow) const;
+
+  /* Division and mod.  These are the ones that are actually used, but
+     there are a lot of them.  */
+
+  wide_int div_trunc (const wide_int &divisor, SignOp sgn) const;
+  wide_int div_trunc (const wide_int &divisor, SignOp sgn, bool *overflow) const;
+  inline wide_int sdiv_trunc (const wide_int &divisor) const;
+  inline wide_int udiv_trunc (const wide_int &divisor) const;
+
+  wide_int div_floor (const wide_int &divisor, SignOp sgn, bool *overflow) const;
+  inline wide_int udiv_floor (const wide_int &divisor) const;
+  inline wide_int sdiv_floor (const wide_int &divisor) const;
+  wide_int div_ceil (const wide_int &divisor, SignOp sgn, bool *overflow) const;
+  wide_int div_round (const wide_int &divisor, SignOp sgn, bool *overflow) const;
+
+  wide_int divmod_trunc (const wide_int &divisor, wide_int *mod, SignOp sgn) const;
+  inline wide_int sdivmod_trunc (const wide_int &divisor, wide_int *mod) const;
+  inline wide_int udivmod_trunc (const wide_int &divisor, wide_int *mod) const;
+
+  wide_int divmod_floor (const wide_int &divisor, wide_int *mod, SignOp sgn) const;
+  inline wide_int sdivmod_floor (const wide_int &divisor, wide_int *mod) const;
+
+  wide_int mod_trunc (const wide_int &divisor, SignOp sgn) const;
+  wide_int mod_trunc (const wide_int &divisor, SignOp sgn, bool *overflow) const;
+  inline wide_int smod_trunc (const wide_int &divisor) const;
+  inline wide_int umod_trunc (const wide_int &divisor) const;
+
+  wide_int mod_floor (const wide_int &divisor, SignOp sgn, bool *overflow) const;
+  inline wide_int umod_floor (const wide_int &divisor) const;
+  wide_int mod_ceil (const wide_int &divisor, SignOp sgn, bool *overflow) const;
+  wide_int mod_round (const wide_int &divisor, SignOp sgn, bool *overflow) const;
+
+  /* Shifting rotating and extracting.  */
+  HOST_WIDE_INT extract_to_hwi (int offset, int width) const;
+
+  wide_int lshift (const wide_int &y, ShiftOp z = NONE) const;
+  wide_int lshift (unsigned int y, ShiftOp z, unsigned int bitsize, 
+		   unsigned int precision) const;
+  wide_int lshift (unsigned int y, ShiftOp z = NONE) const;
+
+  wide_int lrotate (const wide_int &y) const;
+  wide_int lrotate (unsigned int y) const;
+
+  wide_int rshift (int y, SignOp sgn) const;
+  inline wide_int rshift (const wide_int &y, SignOp sgn, ShiftOp z = NONE) const;
+  wide_int rshiftu (const wide_int &y, ShiftOp z = NONE) const;
+  wide_int rshiftu (unsigned int y, ShiftOp z = NONE) const;
+  wide_int rshifts (const wide_int &y, ShiftOp z = NONE) const;
+  wide_int rshifts (unsigned int y, ShiftOp z = NONE) const;
+
+  wide_int rrotate (const wide_int &y) const;
+  wide_int rrotate (int y) const;
+
+  static const int DUMP_MAX = (2 * (MAX_BITSIZE_MODE_ANY_INT / 4
+			       + MAX_BITSIZE_MODE_ANY_INT 
+				    / HOST_BITS_PER_WIDE_INT + 32));
+  char *dump (char* buf) const;
+ private:
+
+  /* Private utility routines.  */
+  wide_int decompress (unsigned int target, unsigned int bitsize, 
+		       unsigned int precision) const;
+  static wide_int add_overflow (const wide_int *op0, const wide_int *op1,
+				wide_int::SignOp sgn, bool *overflow);
+  static wide_int sub_overflow (const wide_int *op0, const wide_int *op1, 
+				wide_int::SignOp sgn, bool *overflow);
+};
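+
+/* A minimal usage sketch of the class (illustrative only; SImode
+   stands in for any 32-bit integer mode): arithmetic wraps at the
+   precision and comparisons take an explicit signedness:
+
+     wide_int a = wide_int::max_value (SImode, wide_int::UNSIGNED);
+     wide_int b = a + wide_int::one (SImode);
+
+   Here b.zero_p () holds, a.lt_p (b, wide_int::SIGNED) holds since A
+   reads as -1 when signed, and b.ltu_p (a) holds when the comparison
+   is unsigned.  */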
+
+/* Insert a 1 bit into 0 at BITPOS producing a number with bitsize
+   and precision taken from MODE.  */
+
+wide_int
+wide_int::set_bit_in_zero (unsigned int bitpos, enum machine_mode mode)
+{
+  return wide_int::set_bit_in_zero (bitpos, GET_MODE_BITSIZE (mode),
+				    GET_MODE_PRECISION (mode));
+}
+
+/* Insert a 1 bit into 0 at BITPOS producing a number with bitsize
+   and precision taken from TYPE.  */
+
+wide_int
+wide_int::set_bit_in_zero (unsigned int bitpos, const_tree type)
+{
+  return wide_int::set_bit_in_zero (bitpos, 
+				    GET_MODE_BITSIZE (TYPE_MODE (type)),
+				    TYPE_PRECISION (type));
+}
+
+/* Return a result mask where the lower WIDTH bits are ones and the
+   bits above that up to the precision are zeros.  The result is
+   inverted if NEGATE is true.   The result is made with bitsize
+   and precision taken from MODE.  */
+
+wide_int
+wide_int::mask (unsigned int width, bool negate, enum machine_mode mode)
+{
+  return wide_int::mask (width, negate, 
+			 GET_MODE_BITSIZE (mode),
+			 GET_MODE_PRECISION (mode));
+}
+
+/* Return a result mask where the lower WIDTH bits are ones and the
+   bits above that up to the precision are zeros.  The result is
+   inverted if NEGATE is true.  The result is made with bitsize
+   and precision taken from TYPE.  */
+
+wide_int
+wide_int::mask (unsigned int width, bool negate, const_tree type)
+{
+  return wide_int::mask (width, negate, 
+			 GET_MODE_BITSIZE (TYPE_MODE (type)),
+			 TYPE_PRECISION (type));
+}
+
+/* Return a result mask of WIDTH ones starting at START and the bits
+   above that up to the precision are zeros.  The result is inverted
+   if NEGATE is true.  The result is made with bitsize and precision
+   taken from MODE.  */
+
+wide_int
+wide_int::shifted_mask (unsigned int start, unsigned int width, 
+			bool negate, enum machine_mode mode)
+{
+  return wide_int::shifted_mask (start, width, negate, 
+				 GET_MODE_BITSIZE (mode),
+				 GET_MODE_PRECISION (mode));
+}
+
+/* Return a result mask of WIDTH ones starting at START and the
+   bits above that up to the precision are zeros.  The result is
+   inverted if NEGATE is true.  The result is made with bitsize
+   and precision taken from TYPE.  */
+
+wide_int
+wide_int::shifted_mask (unsigned int start, unsigned int width, 
+			bool negate, const_tree type)
+{
+  return wide_int::shifted_mask (start, width, negate, 
+				 GET_MODE_BITSIZE (TYPE_MODE (type)),
+				 TYPE_PRECISION (type));
+}
+
+/* Produce 0 or -1 that is the smear of the sign bit.  */
+
+HOST_WIDE_INT
+wide_int::sign_mask () const
+{
+  int i = len - 1;
+  if (precision < HOST_BITS_PER_WIDE_INT)
+    return ((val[0] << (HOST_BITS_PER_WIDE_INT - precision))
+	    >> (HOST_BITS_PER_WIDE_INT - 1));
+
+  /* VRP appears to be badly broken and this is a very ugly fix.  */
+  if (i >= 0)
+    return val[i] >> (HOST_BITS_PER_WIDE_INT - 1);
+
+  gcc_unreachable ();
+}
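+
+/* E.g. (illustrative): with precision 8 and a low byte of 0x80, the
+   value is shifted to the top of the HWI and arithmetically shifted
+   back, producing -1; with a low byte of 0x7f it produces 0.  */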
+
+/* Conversions */
+
+/* Convert OP0 into a wide_int with parameters taken from TYPE.  */
+
+wide_int
+wide_int::from_hwi (HOST_WIDE_INT op0, const_tree type)
+{
+  unsigned int bitsize = GET_MODE_BITSIZE (TYPE_MODE (type));
+  unsigned int prec = TYPE_PRECISION (type);
+
+  if (TYPE_UNSIGNED (type))
+    return wide_int::from_uhwi (op0, bitsize, prec);
+  else
+    return wide_int::from_shwi (op0, bitsize, prec);
+}
+
+/* Convert OP0 into a wide_int with parameters taken from TYPE.  If
+   the value does not fit, set OVERFLOW.  */
+
+wide_int
+wide_int::from_hwi (HOST_WIDE_INT op0, const_tree type, 
+			    bool *overflow)
+{
+  unsigned int bitsize = GET_MODE_BITSIZE (TYPE_MODE (type));
+  unsigned int prec = TYPE_PRECISION (type);
+
+  if (TYPE_UNSIGNED (type))
+    return wide_int::from_uhwi (op0, bitsize, prec, overflow);
+  else
+    return wide_int::from_shwi (op0, bitsize, prec, overflow);
+}
+
+/* Convert signed OP0 into a wide_int with parameters taken from
+   MODE.  */
+
+wide_int
+wide_int::from_shwi (HOST_WIDE_INT op0, enum machine_mode mode)
+{
+  unsigned int bitsize = GET_MODE_BITSIZE (mode);
+  unsigned int prec = GET_MODE_PRECISION (mode);
+
+  return wide_int::from_shwi (op0, bitsize, prec);
+}
+
+/* Convert signed OP0 into a wide_int with parameters taken from
+   MODE. If the value does not fit, set OVERFLOW. */
+
+wide_int
+wide_int::from_shwi (HOST_WIDE_INT op0, enum machine_mode mode, 
+		     bool *overflow)
+{
+  unsigned int bitsize = GET_MODE_BITSIZE (mode);
+  unsigned int prec = GET_MODE_PRECISION (mode);
+
+  return wide_int::from_shwi (op0, bitsize, prec, overflow);
+}
+
+/* Convert unsigned OP0 into a wide_int with parameters taken from
+   MODE.  */
+
+wide_int
+wide_int::from_uhwi (unsigned HOST_WIDE_INT op0, enum machine_mode mode)
+{
+  unsigned int bitsize = GET_MODE_BITSIZE (mode);
+  unsigned int prec = GET_MODE_PRECISION (mode);
+
+  return wide_int::from_uhwi (op0, bitsize, prec);
+}
+
+/* Convert unsigned OP0 into a wide_int with parameters taken from
+   MODE. If the value does not fit, set OVERFLOW. */
+
+wide_int
+wide_int::from_uhwi (unsigned HOST_WIDE_INT op0, enum machine_mode mode, 
+			     bool *overflow)
+{
+  unsigned int bitsize = GET_MODE_BITSIZE (mode);
+  unsigned int prec = GET_MODE_PRECISION (mode);
+
+  return wide_int::from_uhwi (op0, bitsize, prec, overflow);
+}
+
+/* Small constants.  */
+
+/* Return a wide int of -1 with bitsize BS and precision PREC.  */
+
+wide_int
+wide_int::minus_one (unsigned int bs, unsigned int prec)
+{
+  return wide_int::from_shwi (-1, bs, prec);
+}
+
+/* Return a wide int of -1 with TYPE.  */
+
+wide_int
+wide_int::minus_one (const_tree type)
+{
+  return wide_int::from_shwi (-1, TYPE_MODE (type));
+}
+
+/* Return a wide int of -1 with MODE.  */
+
+wide_int
+wide_int::minus_one (enum machine_mode mode)
+{
+  return wide_int::from_shwi (-1, mode);
+}
+
+
+/* Return a wide int of 0 with bitsize BS and precision PREC.  */
+
+wide_int
+wide_int::zero (unsigned int bs, unsigned int prec)
+{
+  return wide_int::from_shwi (0, bs, prec);
+}
+
+/* Return a wide int of 0 with TYPE.  */
+
+wide_int
+wide_int::zero (const_tree type)
+{
+  return wide_int::from_shwi (0, TYPE_MODE (type));
+}
+
+/* Return a wide int of 0 with MODE.  */
+
+wide_int
+wide_int::zero (enum machine_mode mode)
+{
+  return wide_int::from_shwi (0, mode);
+}
+
+
+/* Return a wide int of 1 with bitsize BS and precision PREC.  */
+
+wide_int
+wide_int::one (unsigned int bs, unsigned int prec)
+{
+  return wide_int::from_shwi (1, bs, prec);
+}
+
+/* Return a wide int of 1 with TYPE.  */
+
+wide_int
+wide_int::one (const_tree type)
+{
+  return wide_int::from_shwi (1, TYPE_MODE (type));
+}
+
+/* Return a wide int of 1 with MODE.  */
+
+wide_int
+wide_int::one (enum machine_mode mode)
+{
+  return wide_int::from_shwi (1, mode);
+}
+
+
+/* Return a wide int of 2 with bitsize BS and precision PREC.  */
+
+wide_int
+wide_int::two (unsigned int bs, unsigned int prec)
+{
+  return wide_int::from_shwi (2, bs, prec);
+}
+
+/* Return a wide int of 2 with TYPE.  */
+
+wide_int
+wide_int::two (const_tree type)
+{
+  return wide_int::from_shwi (2, TYPE_MODE (type));
+}
+
+/* Return a wide int of 2 with MODE.  */
+
+wide_int
+wide_int::two (enum machine_mode mode)
+{
+  return wide_int::from_shwi (2, mode);
+}
+
+
+/* Return a wide int of 10 with bitsize BS and precision PREC.  */
+
+wide_int
+wide_int::ten (unsigned int bs, unsigned int prec)
+{
+  return wide_int::from_shwi (10, bs, prec);
+}
+
+/* Return a wide int of 10 with TYPE.  */
+
+wide_int
+wide_int::ten (const_tree type)
+{
+  return wide_int::from_shwi (10, TYPE_MODE (type));
+}
+
+/* Return a wide int of 10 with MODE.  */
+
+wide_int
+wide_int::ten (enum machine_mode mode)
+{
+  return wide_int::from_shwi (10, mode);
+}
+
+
+/* Public accessors for the interior of a wide int.  */
+
+/* Get the number of host wide ints actually represented within the
+   wide int.  */
+
+unsigned short
+wide_int::get_len () const
+{
+  return len;
+}
+
+/* Get bitsize of the value represented within the wide int.  */
+
+unsigned int
+wide_int::get_bitsize () const
+{
+  return bitsize;
+}
+
+/* Get precision of the value represented within the wide int.  */
+
+unsigned int
+wide_int::get_precision () const
+{
+  return precision;
+}
+
+/* Get the number of host wide ints needed to represent the precision
+   of the number.  NOTE that this should rarely be used.  The only
+   clients of this are places like dwarf2out where you need to
+   explicitly write all of the HWIs that are needed to represent the
+   value. */
+
+unsigned int
+wide_int::get_full_len () const
+{
+  return ((precision + HOST_BITS_PER_WIDE_INT - 1)
+	  / HOST_BITS_PER_WIDE_INT);
+}
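+
+/* A sketch of the intended client pattern (illustrative only;
+   emit_hwi is a hypothetical consumer):
+
+     for (unsigned int i = 0; i < x.get_full_len (); i++)
+       emit_hwi (x.elt (i));
+
+   where elt () re-expands the compressed tail of the vector.  */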
+
+/* Get a particular element of the wide int.  */
+
+HOST_WIDE_INT
+wide_int::elt (unsigned int i) const
+{
+  return i >= len ? sign_mask () : val[i];
+}
+
+/* Set the number of host wide ints actually represented within the
+   wide int.  */
+
+void
+wide_int::set_len (unsigned int l)
+{
+  gcc_assert (l < MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT);
+  len = l;
+}
+
+/* Set the bitsize of the wide int.  */
+
+void
+wide_int::set_bitsize (unsigned int bs)
+{
+  bitsize = bs;
+}
+
+/* Set the precision of the wide int.  */
+
+void
+wide_int::set_precision (unsigned int prec)
+{
+  precision = prec;
+}
+
+/* Get a reference to a particular element of the wide int.  Does not
+   check I against len as during construction we might want to set len
+   after creating the value.  */
+
+HOST_WIDE_INT&
+wide_int::elt_ref (unsigned int i)
+{
+  /* We check maximal size, not len.  */
+  gcc_assert (i < MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT); 
+
+  return val[i];
+}
+
+/* Get a reference to a particular element of the wide int as an
+   unsigned quantity.  Does not check I against len as during
+   construction we might want to set len after creating the value.  */
+
+unsigned HOST_WIDE_INT&
+wide_int::uelt_ref (unsigned int i)
+{
+  return *(unsigned HOST_WIDE_INT *)&elt_ref (i);
+}
+
+/* Get a reference to a particular element of the wide int as a
+   constant unsigned quantity.  Does not check I against len as during
+   construction we might want to set len after creating the value.  */
+
+const unsigned HOST_WIDE_INT&
+wide_int::uelt_ref (unsigned int i) const
+{
+  /* We check maximal size, not len.  */
+  gcc_assert (i < MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT); 
+
+  return *(const unsigned HOST_WIDE_INT *)&val[i];
+}
+
+/* Return true if THIS is -1.  */
+
+bool
+wide_int::minus_one_p () const
+{
+  return len == 1 && val[0] == (HOST_WIDE_INT)-1;
+}
+
+/* Return true if THIS is 0.  */
+
+bool
+wide_int::zero_p () const
+{
+  return len == 1 && val[0] == 0;
+}
+
+/* Return true if THIS is 1.  */
+
+bool
+wide_int::one_p () const
+{
+  return len == 1 && val[0] == 1;
+}
+
+/* Return true if THIS is negative.  */
+
+bool
+wide_int::neg_p () const
+{
+  return sign_mask () != 0;
+}
+
+/* Return true if THIS is not equal to OP1. */ 
+
+bool
+wide_int::operator != (const wide_int &op1) const
+{
+  return !(*this == op1);
+}  
+
+/* Return true if THIS is greater than OP1.  Signedness is indicated by
+   OP.  */
+
+bool
+wide_int::gt_p (HOST_WIDE_INT op1, SignOp op) const
+{
+  if (op == SIGNED)
+    return gts_p (op1);
+  else
+    return gtu_p (op1);
+}  
+
+/* Return true if THIS is greater than OP1.  Signedness is indicated by
+   OP.  */
+
+bool
+wide_int::gt_p (const wide_int &op1, SignOp op) const
+{
+  if (op == SIGNED)
+    return op1.lts_p (*this);
+  else
+    return op1.ltu_p (*this);
+}  
+
+/* Return true if THIS is signed greater than OP1.  */
+
+bool
+wide_int::gts_p (const wide_int &op1) const
+{
+  return op1.lts_p (*this);
+}  
+
+/* Return true if THIS is unsigned greater than OP1.  */
+
+bool
+wide_int::gtu_p (const wide_int &op1) const
+{
+  return op1.ltu_p (*this);
+}  
+
+/* Return true if THIS is less than OP1.  Signedness is indicated by
+   OP.  */
+
+bool
+wide_int::lt_p (HOST_WIDE_INT op1, SignOp op) const
+{
+  if (op == SIGNED)
+    return lts_p (op1);
+  else
+    return ltu_p (op1);
+}  
+
+/* Return true if THIS is less than OP1.  Signedness is indicated by
+   OP.  */
+
+bool
+wide_int::lt_p (const wide_int &op1, SignOp op) const
+{
+  if (op == SIGNED)
+    return lts_p (op1);
+  else
+    return ltu_p (op1);
+}  
+
+/* Return the signed min of THIS and OP1. */
+
+wide_int
+wide_int::smin (const wide_int &op1) const
+{
+  return lts_p (op1) ? (*this) : op1;
+}  
+
+/* Return the signed max of THIS and OP1. */
+
+wide_int
+wide_int::smax (const wide_int &op1) const
+{
+  return gts_p (op1) ? (*this) : op1;
+}  
+
+/* Return the unsigned min of THIS and OP1. */
+
+wide_int
+wide_int::umin (const wide_int &op1) const
+{
+  return ltu_p (op1) ? (*this) : op1;
+}  
+
+/* Return the unsigned max of THIS and OP1. */
+
+wide_int
+wide_int::umax (const wide_int &op1) const
+{
+  return gtu_p (op1) ? (*this) : op1;
+}  
+
+
+/* Return true if THIS fits in a HOST_WIDE_INT with no loss of
+   precision.  */
+
+bool
+wide_int::fits_shwi_p () const
+{
+  return len == 1;
+}
+
+/* Return true if THIS fits in an unsigned HOST_WIDE_INT with no loss
+   of precision.  */
+
+bool
+wide_int::fits_uhwi_p () const
+{
+  return precision <= HOST_BITS_PER_WIDE_INT
+    || (len == 1 && val[0] >= 0)
+    || (len == 2 && val[1] == 0);
+}
+
+/* Return THIS extended to PREC.  The signedness of the extension is
+   specified by Z.  */
+
+wide_int
+wide_int::ext (unsigned int prec, SignOp z) const
+{
+  if (z == UNSIGNED)
+    return zext (prec);
+  else
+    return sext (prec);
+}
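+
+/* For example (illustrative only): extending the 8-bit value 0x80 to
+   16 bits gives 0xff80 under SIGNED (sext) and 0x0080 under UNSIGNED
+   (zext).  */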
+
+/* Signed multiply THIS and OP1.  The result is the same precision as
+   the operands.  OVERFLOW is set true if the result overflows.  */
+
+wide_int
+wide_int::smul (const wide_int &x, bool *overflow) const
+{
+  return mul (x, SIGNED, overflow);
+}
+
+/* Unsigned multiply THIS and OP1.  The result is the same precision
+   as the operands.  OVERFLOW is set true if the result overflows.  */
+
+wide_int
+wide_int::umul (const wide_int &x, bool *overflow) const
+{
+  return mul (x, UNSIGNED, overflow);
+}
+
+/* Signed multiply THIS and OP1.  The result is twice the precision
+   of the operands.  */
+
+wide_int
+wide_int::smul_full (const wide_int &x) const
+{
+  return mul_full (x, SIGNED);
+}
+
+/* Unsigned multiply THIS and OP1.  The result is twice the precision
+   of the operands.  */
+
+wide_int
+wide_int::umul_full (const wide_int &x) const
+{
+  return mul_full (x, UNSIGNED);
+}
+
+/* Signed divide with truncation of result.  */
+
+wide_int
+wide_int::sdiv_trunc (const wide_int &divisor) const
+{
+  return div_trunc (divisor, SIGNED);
+}
+
+/* Unsigned divide with truncation of result.  */
+
+wide_int
+wide_int::udiv_trunc (const wide_int &divisor) const
+{
+  return div_trunc (divisor, UNSIGNED);
+}
+
+/* Unsigned divide with floor truncation of result.  */
+
+wide_int
+wide_int::udiv_floor (const wide_int &divisor) const
+{
+  bool overflow;
+
+  return div_floor (divisor, UNSIGNED, &overflow);
+}
+
+/* Signed divide with floor truncation of result.  */
+
+wide_int
+wide_int::sdiv_floor (const wide_int &divisor) const
+{
+  bool overflow;
+
+  return div_floor (divisor, SIGNED, &overflow);
+}
+
+/* Signed divide/mod with truncation of result.  */
+
+wide_int
+wide_int::sdivmod_trunc (const wide_int &divisor, wide_int *mod) const
+{
+  return divmod_trunc (divisor, mod, SIGNED);
+}
+
+/* Unsigned divide/mod with truncation of result.  */
+
+wide_int
+wide_int::udivmod_trunc (const wide_int &divisor, wide_int *mod) const
+{
+  return divmod_trunc (divisor, mod, UNSIGNED);
+}
+
+/* Signed divide/mod with floor truncation of result.  */
+
+wide_int
+wide_int::sdivmod_floor (const wide_int &divisor, wide_int *mod) const
+{
+  return divmod_floor (divisor, mod, SIGNED);
+}
+
+/* Signed mod with truncation of result.  */
+
+wide_int
+wide_int::smod_trunc (const wide_int &divisor) const
+{
+  return mod_trunc (divisor, SIGNED);
+}
+
+/* Unsigned mod with truncation of result.  */
+
+wide_int
+wide_int::umod_trunc (const wide_int &divisor) const
+{
+  return mod_trunc (divisor, UNSIGNED);
+}
+
+/* Unsigned mod with floor truncation of result.  */
+
+wide_int
+wide_int::umod_floor (const wide_int &divisor) const
+{
+  bool overflow;
+
+  return mod_floor (divisor, UNSIGNED, &overflow);
+}
+
+/* Right shift THIS by Y.  SGN indicates the sign.  Z indicates the
+   truncation option.  */
+
+wide_int
+wide_int::rshift (const wide_int &y, SignOp sgn, ShiftOp z) const
+{
+  if (sgn == UNSIGNED)
+    return rshiftu (y, z);
+  else
+    return rshifts (y, z);
+}
+
+/* tree related routines.  */
+
+extern tree wide_int_to_tree (tree type, const wide_int &cst);
+
+
+/* Conversion to and from GMP integer representations.  */
+
+void mpz_set_wide_int (mpz_t, wide_int, bool);
+wide_int mpz_get_wide_int (const_tree, mpz_t, bool);
+#endif /* GENERATOR_FILE */
+
+#endif /* WIDE_INT_H */