diff mbox

wide-int branch now up for public comment and review

Message ID 520A9DCC.6080609@naturalbridge.com
State New
Headers show

Commit Message

Kenneth Zadeck Aug. 13, 2013, 8:57 p.m. UTC
Richi and everyone else who may be interested,

Congrats on your first child.  They are a lot of fun, but are very
high maintenence.

Today we put up the wide-int branch for all to see and play with. See

svn+ssh://gcc.gnu.org/svn/gcc/branches/wide-int

At this point, we have completed testing it on x86-64.  Not only is it
regression free, but for code that uses only 64 bit or smaller data
types, it produces identical machine language (if a couple of changes
are made to the truck - see the patch below).  We are currently
working on the PPC and expect to get this platform to the same
position very soon.

 From a high level view, the branch looks somewhat closer to what you
asked for than I would have expected.  There are now three
implementations of wide-int as a template.  The default is the one you
saw before and takes its precision from the input mode or type. There
are two other template instances which have fixed precisions that are
defined to be large enough to be assumed to be infinite (just like
your favorite use of double-int).  Both are used in places where there
is not the notion of precision correctness of the operands. One is
used for all addressing arithmetic and the other is used mostly in the
vectorizer and loop optimizations.  The bottom line is that both a
finite and infinite precision model are really necessary in the
current structure of GCC.

The two infinite precision classes are not exactly the storage classes
that you proposed because they are implemented using the same storage
model as the default template but they do provide a different view of
the math which I assume was your primary concern.  You may also decide
that there is not reason to have a separate class for the addressing
arithmetic since they work substantially the same way.  We did it so
that we have the option in the future to allow the two reps to
diverge.

The one place where I can see changing which template is used is in
tree-ssa-ccp.  This is the only one of the many GCC constant
propagator that does not use the default template.  I did not convert
this pass to use the default template because, for testing purposes
(at your suggestion), we did tried to minimize the improvements so
that we get the same code out with wide-int.  When I convert it to use
the default template, the pass will run slightly faster and will find
slightly more constants: both very desirable features, but not in the
context of getting this large patch into GCC.

As I said earlier, we get the same code as long as the program uses
only 64 bit or smaller types.  For code that uses larger types, we do
not.  The problem actually stems from one of the assertions that you
made when we were arguing about fixed vs infinite precision.  You had
said that a lot of the code depended on double ints behaving like
infinite precision.  You were right!!!  However, what this really
meant is that when that code was subjected to at 128 bit type, it just
produced bogus results!!!!  All of this has been fixed now on the
branch.  The code that uses the default template works within it's
precision.  The code that uses one of the infinite precision templates
can be guaranteed that there is always enough head room because we
sniff out the largest mode on the target and multiply that by 4.
However, the net result is that programs that use 128 bit types get
better code out that is more likely to be correct.

The vast majority of the patch falls into two types of code:

1) The 4 files that hold the wide-int code itself.  You have seen a
    lot of this code before except for the infinite precision
    templates.  Also the classes are more C++ than C in their flavor.
    In particular, the integration with trees is very tight in that an
    int-cst or regular integers can be the operands of any wide-int
    operation.

2) The code that encapsulates the representation of a TREE_INT_CST.
    For the latter, I introduced a series of abstractions to hide the
    access so that I could change the representation of TREE_INT_CST
    away from having exactly two HWIs.  I do not really like these
    abstractions, but the good news is that most of them can/will go
    away after this branch is integrated into the trunk.  These
    abstractions allow the code to do the same function, without
    exposing the change in the data structures.  However, they preserve
    the fact that for the most part, the middle end of the compiler
    tries to do no optimization on anything larger than a single HWI.
    But this preserves the basic behavior of the compiler which is what
    you asked us to do.

    The abstractions that I have put in to hide the rep of TREE_INT_CST 
are:

    host_integerp (x, 1) -> tree_fits_uhwi_p (x)
    host_integerp (x, 0) -> tree_fits_shwi_p (x)
    host_integerp (x, TYPE_UNSIGNED (y)) -> tree_fits_hwi_p (x, 
TYPE_SIGN (y))
    host_integerp (x, TYPE_UNSIGNED (x)) -> tree_fits_hwi_p (x)


    TREE_INT_CST_HIGH (x) == 0 || TREE_INT_CST_HIGH (value) == -1 -> 
cst_fits_shwi_p (x)
    TREE_INT_CST_HIGH (x) + (tree_int_cst_sgn (x) < 0) -> 
cst_fits_shwi_p (x)
    cst_and_fits_in_hwi (x) -> cst_fits_shwi_p (x)

    TREE_INT_CST_HIGH (x) == 0) -> cst_fits_uhwi_p (x)

    tree_low_cst (x, 1) ->  tree_to_uhwi (x)
    tree_low_cst (x, 0) ->  tree_to_shwi (x)
    TREE_INT_CST_LOW (x) -> to either tree_to_uhwi (x), tree_to_shwi (x) 
or tree_to_hwi (x)

    Code that used the TREE_INT_CST_HIGH in ways beyond checking to see
    if contained 0 or -1 was converted directly to wide-int.


You had proposed that one of the ways that we should/could test the
non single HWI paths in wide-int was to change the size of the element
of the array used to represent value in wide-int.   I believe that
there are better ways to do this testing.   For one, the infinite
precision templates do not use the fast pathway anyway because
currently those pathways are only triggered for precisions that fit in
a single HWI.   (There is the possibility that some of the infinite
precision functions could use this fast path, but they currently do
not.)   However, what we are planning to do when the ppc gets stable
is to build a 64 bit compiler for the x86 that uses a 32 bit HWI.
This is no longer a supported path, but fixing the bugs on it would
shake out the remaining places where the compiler (as well as the
wide-int code) gets the wrong answer for larger types.

The code still has our tracing in it.   We will remove it before the
branch is committed, but for large scale debugging, we find this
very useful.

I am not going to close with the typical "ok to commit?" closing
because I know you will have a lot to say.   But I do think that you
will find that this is a lot closer to what you envisioned than what
you saw before.

kenny

Comments

Richard Sandiford Aug. 22, 2013, 7:55 a.m. UTC | #1
Thanks for doing this.  I haven't had chance to look at the branch yet
(hope to soon), but the results on mips64-linux-gnu look pretty good:

  http://gcc.gnu.org/ml/gcc-testresults/2013-08/msg02033.html

The only failures that stand out as being possibly wide-int-related are:

  FAIL: gcc.dg/fixed-point/int-warning.c  (test for warnings, line ...)

which isn't something that x86_64-linux-gnu or powerpc64-linux-gnu
would test AFAIK.  Haven't had chance to look into why it's failing yet,
but FWIW it's a compile-only test so should be reproducable with a cc1 cross.

I'll run a test with the corresponding trunk version too.

Thanks,
Richard
Richard Sandiford Aug. 23, 2013, 3:02 p.m. UTC | #2
Hi Kenny,

This is the first time I've looked at the implementation of wide-int.h
(rather than just looking at the rtl changes, which as you know I like
in general), so FWIW here are some comments on wide-int.h.  I expect
a lot of them overlap with Richard B.'s comments.

I also expect many of them are going to be annoying, sorry, but this
first one definitely will.  The coding conventions say that functions
should be defined outside the class:

    http://gcc.gnu.org/codingconventions.html

and that opening braces should be on their own line, so most of the file
needs to be reformatted.  I went through and made that change with the
patch below, in the process of reading through.  I also removed "SGN
must be SIGNED or UNSIGNED." because it seemed redundant when those are
the only two values available.  The patch fixes a few other coding standard
problems and typos, but I've not made any actual code changes (or at least,
I didn't mean to).

Does it look OK to install?

I'm still unsure about these "infinite" precision types, but I understand
the motivation and I have no objections.  However:

>     * Code that does widening conversions.  The canonical way that
>       this is performed is to sign or zero extend the input value to
>       the max width based on the sign of the type of the source and
>       then to truncate that value to the target type.  This is in
>       preference to using the sign of the target type to extend the
>       value directly (which gets the wrong value for the conversion
>       of large unsigned numbers to larger signed types).

I don't understand this particular reason.  Using the sign of the source
type is obviously right, but why does that mean we need "infinite" precision,
rather than just doubling the precision of the source?

>   * When a constant that has an integer type is converted to a
>     wide-int it comes in with precision 0.  For these constants the
>     top bit does accurately reflect the sign of that constant; this
>     is an exception to the normal rule that the signedness is not
>     represented.  When used in a binary operation, the wide-int
>     implementation properly extends these constants so that they
>     properly match the other operand of the computation.  This allows
>     you write:
>
>                tree t = ...
>                wide_int x = t + 6;
>
>     assuming t is a int_cst.

This seems dangerous.  Not all code that uses "unsigned HOST_WIDE_INT"
actually wants it to be an unsigned value.  Some code uses it to avoid
the undefinedness of signed overflow.  So these overloads could lead
to us accidentally zero-extending what's conceptually a signed value
without any obvious indication that that's happening.  Also, hex constants
are unsigned int, but it doesn't seem safe to assume that 0x80000000 was
meant to be zero-extended.

I realise the same thing can happen if you mix "unsigned int" with
HOST_WIDE_INT, but the point is that you shouldn't really do that
in general, whereas we're defining these overloads precisely so that
a mixture can be used.

I'd prefer some explicit indication of the sign, at least for anything
other than plain "int" (so that the compiler will complain about uses
of "unsigned int" and above).

>   Note that the bits above the precision are not defined and the
>   algorithms used here are careful not to depend on their value.  In
>   particular, values that come in from rtx constants may have random
>   bits.

I have a feeling I'm rehashing a past debate, sorry, but rtx constants can't
have random bits.  The upper bits must be a sign extension of the value.
There's exactly one valid rtx for each (value, mode) pair.  If you saw
something different then that sounds like a bug.  The rules should already
be fairly well enforced though, since something like (const_int 128) --
or (const_int 256) -- will not match a QImode operand.

This is probably the part of the representation that I disagree most with.
There seem to be two main ways we could hande the extension to whole HWIs:

(1) leave the stored upper bits undefined and extend them on read
(2) keep the stored upper bits in extended form

The patch goes for (1) but (2) seems better to me, for a few reasons:

* As above, constants coming from rtl are already in the right form,
  so if you create a wide_int from an rtx and only query it, no explicit
  extension is needed.

* Things like logical operations and right shifts naturally preserve
  the sign-extended form, so only a subset of write operations need
  to take special measures.

* You have a public interface that exposes the underlying HWIs
  (which is fine with me FWIW), so it seems better to expose a fully-defined
  HWI rather than only a partially-defined HWI.

E.g. zero_p is:

  HOST_WIDE_INT x;

  if (precision && precision < HOST_BITS_PER_WIDE_INT)
    x = sext_hwi (val[0], precision);
  else if (len == 0)
    {
      gcc_assert (precision == 0);
      return true;
    }
  else
    x = val[0];

  return len == 1 && x == 0;

but I think it really ought to be just:

  return len == 1 && val[0] == 0;

>   When the precision is 0, all the bits in the LEN elements of
>   VEC are significant with no undefined bits.  Precisionless
>   constants are limited to being one or two HOST_WIDE_INTs.  When two
>   are used the upper value is 0, and the high order bit of the first
>   value is set.  (Note that this may need to be generalized if it is
>   ever necessary to support 32bit HWIs again).

I didn't understand this.  When are two HOST_WIDE_INTs needed for
"precision 0"?

> #define addr_max_bitsize (64)
> #define addr_max_precision \

These should either be lower-case C++ constants or upper-case macros.

>  /* VAL is set to a size that is capable of computing a full
>     multiplication on the largest mode that is represented on the
>     target.  Currently there is a part of tree-vrp that requires 2x +
>     2 bits of precision where x is the precision of the variables
>     being optimized.  */
>  HOST_WIDE_INT val[WIDE_INT_MAX_ELTS];

This comment seems redundant with the one above WIDE_INT_MAX_ELTS
and likely to get out of date.

>     So real hardware only looks at a small part of the shift amount.
>     On IBM machines, this tends to be 1 more than what is necessary
>     to encode the shift amount.  The rest of the world looks at only
>     the minimum number of bits.  This means that only 3 gate delays
>     are necessary to set up the shifter.

I agree that it makes sense for wide_int to provide a form of shift
in which the shift amount is truncated.  However, I strongly believe
wide-int.h should not test SHIFT_COUNT_TRUNCATED directly.  It should
be up to the callers to decide when truncation is needed (and to what width).

We really need to get rid of the #include "tm.h" in wide-int.h.
MAX_BITSIZE_MODE_ANY_INT should be the only partially-target-dependent
thing in there.  If that comes from tm.h then perhaps we should put it
into a new header file instead.

> /* Return THIS as a signed HOST_WIDE_INT.  If THIS does not fit in
>    PREC, the information is lost. */
> HOST_WIDE_INT
> to_shwi (unsigned int prec = 0) const

Just dropping the excess bits seems dangerous.  I think we should assert
instead, at least when prec is 0.

> /* Return true if THIS is negative based on the interpretation of SGN.
>    For UNSIGNED, this is always false.  This is correct even if
>    precision is 0.  */
> inline bool
> wide_int::neg_p (signop sgn) const

It seems odd that you have to pass SIGNED here.  I assume you were doing
it so that the caller is forced to confirm signedness in the cases where
a tree type is involved, but:

* neg_p kind of implies signedness anyway
* you don't require this for minus_one_p, so the interface isn't consistent
* at the rtl level signedness isn't a property of the "type" (mode),
  so it seems strange to add an extra hoop there

> /* Return true if THIS fits in an unsigned HOST_WIDE_INT with no
>    loss of precision.  */
> inline bool
> wide_int_ro::fits_uhwi_p () const
> {
>   return len == 1 || (len == 2 && val[1] == 0);
> }

This doesn't look right, since len == 1 could mean that you have a
gazillion-bit all-ones number.  Also, the val[1] == 0 check seems
to assume the upper bits are defined when the precision isn't a multiple
of the HWI size (although as above I that's a good thing and should be
enforced).

sign_mask has:

>   gcc_unreachable ();
> #if 0
>   return val[len - 1] >> (HOST_BITS_PER_WIDE_INT - 1);
> #endif

Maybe remove this?

The inline div routines do:

>   if (overflow)
>     *overflow = false;

but then just pass overflow to divmod_internal.  Seems better to initialise
*overflow there instead.

div_floor has:

>     return divmod_internal (true, val, len, p1, s, cl, p2, sgn,
>			    &remainder, false, overflow);
>
>     if (quotient.neg_p (sgn) && !remainder.zero_p ())
>       return quotient - 1;
>     return quotient;
>   }

where the last bit is unreachable.

> /* Divide DIVISOR into THIS producing the remainder.  The result is
>    the same size as the operands.  The sign is specified in SGN.
>    The output is floor truncated.  OVERFLOW is set to true if the
>    result overflows, false otherwise.  */
> template <typename T>
> inline wide_int_ro
> wide_int_ro::mod_floor (const T &c, signop sgn, bool *overflow = 0) const

It's really the quotient that's floor-truncated, not the output
(the remainder).  I was a bit confused at first why you'd need to
truncate an integer remainder.  Same for the other functions.

> #ifdef DEBUG_WIDE_INT
>   debug_vwa ("wide_int_ro:: %d = (%s == %s)\n", result, *this, s, cl, p2);
> #endif

I think these are going to bitrot quickly if we #if 0 then out.
I think we should either use:

  if (DEBUG_WIDE_INT)
    debug_vwa ("wide_int_ro:: %d = (%s == %s)\n", result, *this, s, cl, p2);

or drop them.

The implementations of the general to_shwi1 and to_shwi2 functions look
identical.  I think the comment should explain why two functions are needed.

> /* Negate THIS.  */
> inline wide_int_ro
> wide_int_ro::operator - () const
> {
>   wide_int_ro r;
>   r = wide_int_ro (0) - *this;
>   return r;
> }
>
> /* Negate THIS.  */
> inline wide_int_ro
> wide_int_ro::neg () const
> {
>   wide_int_ro z = wide_int_ro::from_shwi (0, precision);
>
>   gcc_checking_assert (precision);
>   return z - *this;
> }

Why do we need both of these, and why are they implemented slightly
differently?

> template <int bitsize>
> inline bool
> fixed_wide_int <bitsize>::multiple_of_p (const wide_int_ro &factor,
>					 signop sgn,
>					 fixed_wide_int *multiple) const
> {
>   return wide_int_ro::multiple_of_p (factor, sgn,
>				     reinterpret_cast <wide_int *> (multiple));
> }

The patch has several instances of this kind of reinterpret_cast.
It looks like an aliasing violation.

The main thing that's changed since the early patches is that we now
have a mixture of wide-int types.  This seems to have led to a lot of
boiler-plate forwarding functions (or at least it felt like that while
moving them all out the class).  And that in turn seems to be because
you're trying to keep everything as member functions.  E.g. a lot of the
forwarders are from a member function to a static function.

Wouldn't it be better to have the actual classes be light-weight,
with little more than accessors, and do the actual work with non-member
template functions?  There seems to be 3 grades of wide-int:

  (1) read-only, constant precision  (from int, etc.)
  (2) read-write, constant precision  (fixed_wide_int)
  (3) read-write, variable precision  (wide_int proper)

but we should be able to hide that behind templates, with compiler errors
if you try to write to (1), etc.

To take one example, the reason we can't simply use things like
std::min on wide ints is because signedness needs to be specified
explicitly, but there's a good reason why the standard defined
std::min (x, y) rather than x.min (y).  It seems like we ought
to have smin and umin functions alongside std::min, rather than
make them member functions.  We could put them in a separate namespace
if necessary.

I might have a go at trying this last part next week, unless Richard is
already in the process of rewriting things :-)

Thanks,
Richard
Kenneth Zadeck Aug. 23, 2013, 8:50 p.m. UTC | #3
On 08/23/2013 11:02 AM, Richard Sandiford wrote:
> Hi Kenny,
>
> This is the first time I've looked at the implementation of wide-int.h
> (rather than just looking at the rtl changes, which as you know I like
> in general), so FWIW here are some comments on wide-int.h.  I expect
> a lot of them overlap with Richard B.'s comments.
>
> I also expect many of them are going to be annoying, sorry, but this
> first one definitely will.  The coding conventions say that functions
> should be defined outside the class:
>
>      http://gcc.gnu.org/codingconventions.html
>
> and that opening braces should be on their own line, so most of the file
> needs to be reformatted.  I went through and made that change with the
> patch below, in the process of reading through.  I also removed "SGN
> must be SIGNED or UNSIGNED." because it seemed redundant when those are
> the only two values available.  The patch fixes a few other coding standard
> problems and typos, but I've not made any actual code changes (or at least,
> I didn't mean to).
I had started the file with the functions outside of the class and mike 
had stated putting them in the class, and so i went with putting them in 
the class since many of them were one liners and so having them out of 
line just doubled the size of everything. however, we did not look at 
the coding conventions and that really settles the argument.  Thanks for 
doing this.
> Does it look OK to install?
you can check it in.
> I'm still unsure about these "infinite" precision types, but I understand
> the motivation and I have no objections.  However:
>
>>      * Code that does widening conversions.  The canonical way that
>>        this is performed is to sign or zero extend the input value to
>>        the max width based on the sign of the type of the source and
>>        then to truncate that value to the target type.  This is in
>>        preference to using the sign of the target type to extend the
>>        value directly (which gets the wrong value for the conversion
>>        of large unsigned numbers to larger signed types).


> I don't understand this particular reason.  Using the sign of the source
> type is obviously right, but why does that mean we need "infinite" precision,
> rather than just doubling the precision of the source?
in a sense, "infinite" does not really mean infinite, it really just 
means large enough so that you never loose any information from the 
top.    For widening all that you really need to be "infinite" is one 
more bit larger than the destination type.   We could have had an abi 
where you specified the precision of every operation based on the 
precisions of the inputs.   Instead, for these kinds of operations, we 
decided to sniff the port and determine a fixed width that was large 
enough for everything that was needed for the port. We call that number 
infinite.  This sort of follows the convention that double-int was used 
with were infinite was 128 bits, but with our design/implementation, we 
(hopefully) have no bugs where the size of the datatypes needed never 
runs into implementation limits.

>>    * When a constant that has an integer type is converted to a
>>      wide-int it comes in with precision 0.  For these constants the
>>      top bit does accurately reflect the sign of that constant; this
>>      is an exception to the normal rule that the signedness is not
>>      represented.  When used in a binary operation, the wide-int
>>      implementation properly extends these constants so that they
>>      properly match the other operand of the computation.  This allows
>>      you write:
>>
>>                 tree t = ...
>>                 wide_int x = t + 6;
>>
>>      assuming t is a int_cst.
> This seems dangerous.  Not all code that uses "unsigned HOST_WIDE_INT"
> actually wants it to be an unsigned value.  Some code uses it to avoid
> the undefinedness of signed overflow.  So these overloads could lead
> to us accidentally zero-extending what's conceptually a signed value
> without any obvious indication that that's happening.  Also, hex constants
> are unsigned int, but it doesn't seem safe to assume that 0x80000000 was
> meant to be zero-extended.
>
> I realise the same thing can happen if you mix "unsigned int" with
> HOST_WIDE_INT, but the point is that you shouldn't really do that
> in general, whereas we're defining these overloads precisely so that
> a mixture can be used.
>
> I'd prefer some explicit indication of the sign, at least for anything
> other than plain "int" (so that the compiler will complain about uses
> of "unsigned int" and above).

There is a part of me that finds this scary and a part of me that feels 
that the concern is largely theoretical.    It does make it much easier 
to read and understand the code to be able to write "t + 6" rather than 
"wide_int (t) + wide_int::from_uhwi" (6) but of course you loose some 
control over how 6 is converted.

>>    Note that the bits above the precision are not defined and the
>>    algorithms used here are careful not to depend on their value.  In
>>    particular, values that come in from rtx constants may have random
>>    bits.
> I have a feeling I'm rehashing a past debate, sorry, but rtx constants can't
> have random bits.  The upper bits must be a sign extension of the value.
> There's exactly one valid rtx for each (value, mode) pair.  If you saw
> something different then that sounds like a bug.  The rules should already
> be fairly well enforced though, since something like (const_int 128) --
> or (const_int 256) -- will not match a QImode operand.
>
> This is probably the part of the representation that I disagree most with.
> There seem to be two main ways we could hande the extension to whole HWIs:
>
> (1) leave the stored upper bits undefined and extend them on read
> (2) keep the stored upper bits in extended form
It is not a matter of opening old wounds.   I had run some tests on 
x86-64 and was never able to assume that the bits above the precision 
had always been canonized.   I will admit that i did not try to run down 
the bugs because i thought that since the rtx constructors did not have 
a mode associated with them now was one required to in the constructors, 
that this was not an easily solvable problem.   So i have no idea if i 
hit the one and only bug or was about to start drinking from a fire 
hose.   But the back ends are full of GEN_INT (a) where a came from god 
knows where and you almost never see it properly canonized.   I think 
that until GEN_INT takes a mandatory non VOIDmode mode parameter, and 
that constructor canonizes it, you are doomed chasing this bug 
forever.    My/our experience on the dataflow branch was that unless you 
go clean things up AND put in a bunch of traps to keep people honest, 
you are never going to be able to make this assumption.

Having said that, we actually do neither of (1) or (2) inside of 
wide-int.  For rtl to wide-int, we leave the upper bits undefined and 
never allow you to look at them because the constructor has a mode and 
that mode allows you to draw a line in the sand.   There is no 
constructor for the "infinite" wide ints from rtl so you have no way to 
look.

Doing this allows us to do something that richi really wanted: avoiding 
copying.   We do not get to do as much richi would like and when he 
comes back from leave, he may have other places where can apply it, but 
right now if you say w = t + 6 as above, it "borrows" the rep from t to 
do the add, it does not really build a wide-int. We also do this if t is 
an rtx const.   But if we had to canonize the number, then we could not 
borrow the rep.
> The patch goes for (1) but (2) seems better to me, for a few reasons:
>
> * As above, constants coming from rtl are already in the right form,
>    so if you create a wide_int from an rtx and only query it, no explicit
>    extension is needed.
>
> * Things like logical operations and right shifts naturally preserve
>    the sign-extended form, so only a subset of write operations need
>    to take special measures.
right now the internals of wide-int do not keep the bits above the 
precision clean.   as you point out this could be fixed by changing 
lshift, add, sub, mul, div (and anything else i have forgotten about) 
and removing the code that cleans this up on exit.   I actually do not 
really care which way we go here but if we do go on keeping the bits 
clean above the precision inside wide-int, we are going to have to clean 
the bits in the constructors from rtl, or fix some/a lot of bugs.

But if you want to go with the stay clean plan you have to start clean, 
so at the rtl level this means copying. and it is the not copying trick 
that pushed me in the direction we went.

At the tree level, this is not an issue.   There are no constructors for 
tree-csts that do not have a type and before we copy the rep from the 
wide-int to the tree, we clean the top bits.   (BTW - If i had to guess 
what the bug is with the missing messages on the mips port, it would be 
because the front ends HAD a bad habit of creating constants that did 
not fit into a type and then later checking to see if there were any 
interesting bits above the precision in the int-cst.  This now does not 
work because we clean out those top bits on construction but it would 
not surprise me if we missed the fixed point constant path.)   So at the 
tree level, we could easily go either way here, but there is a cost at 
the rtl level with doing (2).


> * You have a public interface that exposes the underlying HWIs
>    (which is fine with me FWIW), so it seems better to expose a fully-defined
>    HWI rather than only a partially-defined HWI.
>
> E.g. zero_p is:
>
>    HOST_WIDE_INT x;
>
>    if (precision && precision < HOST_BITS_PER_WIDE_INT)
>      x = sext_hwi (val[0], precision);
>    else if (len == 0)
>      {
>        gcc_assert (precision == 0);
>        return true;
>      }
the above test for len==0 has been removed because it is rot.

>    else
>      x = val[0];
>
>    return len == 1 && x == 0;
>
> but I think it really ought to be just:
>
>    return len == 1 && val[0] == 0;
If we did your 2, it would be this way.
>>    When the precision is 0, all the bits in the LEN elements of
>>    VEC are significant with no undefined bits.  Precisionless
>>    constants are limited to being one or two HOST_WIDE_INTs.  When two
>>    are used the upper value is 0, and the high order bit of the first
>>    value is set.  (Note that this may need to be generalized if it is
>>    ever necessary to support 32bit HWIs again).
> I didn't understand this.  When are two HOST_WIDE_INTs needed for
> "precision 0"?
if a large unsigned constant comes in with the top bit set, the 
canonized value takes 2 hwis, the top hwi being 0.
>> #define addr_max_bitsize (64)
>> #define addr_max_precision \
> These should either be lower-case C++ constants or upper-case macros.
this will be fixed.
>>   /* VAL is set to a size that is capable of computing a full
>>      multiplication on the largest mode that is represented on the
>>      target.  Currently there is a part of tree-vrp that requires 2x +
>>      2 bits of precision where x is the precision of the variables
>>      being optimized.  */
>>   HOST_WIDE_INT val[WIDE_INT_MAX_ELTS];
> This comment seems redundant with the one above WIDE_INT_MAX_ELTS
> and likely to get out of date.
this will be fixed
>>      So real hardware only looks at a small part of the shift amount.
>>      On IBM machines, this tends to be 1 more than what is necessary
>>      to encode the shift amount.  The rest of the world looks at only
>>      the minimum number of bits.  This means that only 3 gate delays
>>      are necessary to set up the shifter.
> I agree that it makes sense for wide_int to provide a form of shift
> in which the shift amount is truncated.  However, I strongly believe
> wide-int.h should not test SHIFT_COUNT_TRUNCATED directly.  It should
> be up to the callers to decide when truncation is needed (and to what width).
richi does not like this either so i will get rid of it.
>
> We really need to get rid of the #include "tm.h" in wide-int.h.
> MAX_BITSIZE_MODE_ANY_INT should be the only partially-target-dependent
> thing in there.  If that comes from tm.h then perhaps we should put it
> into a new header file instead.
I will talk to mike about fixing this.
>> /* Return THIS as a signed HOST_WIDE_INT.  If THIS does not fit in
>>     PREC, the information is lost. */
>> HOST_WIDE_INT
>> to_shwi (unsigned int prec = 0) const
> Just dropping the excess bits seems dangerous.  I think we should assert
> instead, at least when prec is 0.
there are times when this is useful.   there is a lot of code that just 
wants to look at the bottom bits to do some alignment stuff. I guess 
that code could just grab the bottom elt of the array, but has not 
generally been how these this has been done.
>> /* Return true if THIS is negative based on the interpretation of SGN.
>>     For UNSIGNED, this is always false.  This is correct even if
>>     precision is 0.  */
>> inline bool
>> wide_int::neg_p (signop sgn) const
> It seems odd that you have to pass SIGNED here.  I assume you were doing
> it so that the caller is forced to confirm signedness in the cases where
> a tree type is involved, but:
>
> * neg_p kind of implies signedness anyway
> * you don't require this for minus_one_p, so the interface isn't consistent
> * at the rtl level signedness isn't a property of the "type" (mode),
>    so it seems strange to add an extra hoop there
it was done this way so that you can pass in TYPE_SIGN (t) in as the 
second parameter.   We could default the parameter to SIGNED and that 
would solve both cases.   I will look into minus_one_p.


>> /* Return true if THIS fits in an unsigned HOST_WIDE_INT with no
>>     loss of precision.  */
>> inline bool
>> wide_int_ro::fits_uhwi_p () const
>> {
>>    return len == 1 || (len == 2 && val[1] == 0);
>> }
> This doesn't look right, since len == 1 could mean that you have a
> gazillion-bit all-ones number.  Also, the val[1] == 0 check seems
> to assume the upper bits are defined when the precision isn't a multiple
> of the HWI size (although as above I that's a good thing and should be
> enforced).
you are correct.
> sign_mask has:

>>    gcc_unreachable ();
>> #if 0
>>    return val[len - 1] >> (HOST_BITS_PER_WIDE_INT - 1);
>> #endif
> Maybe remove this?
>
> The inline div routines do:
i will work on this this weekend.   tree vrp has not been our friend and 
sometimes does not like to compile this function.

>>    if (overflow)
>>      *overflow = false;
> but then just pass overflow to divmod_internal.  Seems better to initialise
> *overflow there instead.
>
> div_floor has:
>
>>      return divmod_internal (true, val, len, p1, s, cl, p2, sgn,
>> 			    &remainder, false, overflow);
>>
>>      if (quotient.neg_p (sgn) && !remainder.zero_p ())
>>        return quotient - 1;
>>      return quotient;
>>    }
> where the last bit is unreachable.
not to mention that the compiler never complained.
>
>> /* Divide DIVISOR into THIS producing the remainder.  The result is
>>     the same size as the operands.  The sign is specified in SGN.
>>     The output is floor truncated.  OVERFLOW is set to true if the
>>     result overflows, false otherwise.  */
>> template <typename T>
>> inline wide_int_ro
>> wide_int_ro::mod_floor (const T &c, signop sgn, bool *overflow = 0) const
> It's really the quotient that's floor-truncated, not the output
> (the remainder).  I was a bit confused at first why you'd need to
> truncate an integer remainder.  Same for the other functions.
The comments needs work, not the code.   You do have to adjust the 
remainder in some cases, but it is not truncation.
>> #ifdef DEBUG_WIDE_INT
>>    debug_vwa ("wide_int_ro:: %d = (%s == %s)\n", result, *this, s, cl, p2);
>> #endif
> I think these are going to bitrot quickly if we #if 0 then out.
> I think we should either use:
>
>    if (DEBUG_WIDE_INT)
>      debug_vwa ("wide_int_ro:: %d = (%s == %s)\n", result, *this, s, cl, p2);
>
> or drop them.
my plan is to leave these in while the branch is still being developed 
and then get rid of them before it is merged.    My guess is that i am 
going to need them still when i try the 32bit hwi test.

> The implementations of the general to_shwi1 and to_shwi2 functions look
> identical.  I think the comment should explain why two functions are needed.
I will check this
>> /* Negate THIS.  */
>> inline wide_int_ro
>> wide_int_ro::operator - () const
>> {
>>    wide_int_ro r;
>>    r = wide_int_ro (0) - *this;
>>    return r;
>> }
>>
>> /* Negate THIS.  */
>> inline wide_int_ro
>> wide_int_ro::neg () const
>> {
>>    wide_int_ro z = wide_int_ro::from_shwi (0, precision);
>>
>>    gcc_checking_assert (precision);
>>    return z - *this;
>> }
> Why do we need both of these, and why are they implemented slightly
> differently?
neg should go away.
>> template <int bitsize>
>> inline bool
>> fixed_wide_int <bitsize>::multiple_of_p (const wide_int_ro &factor,
>> 					 signop sgn,
>> 					 fixed_wide_int *multiple) const
>> {
>>    return wide_int_ro::multiple_of_p (factor, sgn,
>> 				     reinterpret_cast <wide_int *> (multiple));
>> }
> The patch has several instances of this kind of reinterpret_cast.
> It looks like an aliasing violation.
>
> The main thing that's changed since the early patches is that we now
> have a mixture of wide-int types.  This seems to have led to a lot of
> boiler-plate forwarding functions (or at least it felt like that while
> moving them all out the class).  And that in turn seems to be because
> you're trying to keep everything as member functions.  E.g. a lot of the
> forwarders are from a member function to a static function.
>
> Wouldn't it be better to have the actual classes be light-weight,
> with little more than accessors, and do the actual work with non-member
> template functions?  There seems to be 3 grades of wide-int:
>
>    (1) read-only, constant precision  (from int, etc.)
>    (2) read-write, constant precision  (fixed_wide_int)
>    (3) read-write, variable precision  (wide_int proper)
>
> but we should be able to hide that behind templates, with compiler errors
> if you try to write to (1), etc.
>
> To take one example, the reason we can't simply use things like
> std::min on wide ints is because signedness needs to be specified
> explicitly, but there's a good reason why the standard defined
> std::min (x, y) rather than x.min (y).  It seems like we ought
> to have smin and umin functions alongside std::min, rather than
> make them member functions.  We could put them in a separate namespace
> if necessary.
>
> I might have a go at trying this last part next week, unless Richard is
> already in the process of rewriting things :-)
mike will answer this.
> Thanks,
> Richard
>
>
Mike Stump Aug. 24, 2013, 1:59 a.m. UTC | #4
On Aug 23, 2013, at 8:02 AM, Richard Sandiford <rdsandiford@googlemail.com> wrote:
>>  * When a constant that has an integer type is converted to a
>>    wide-int it comes in with precision 0.  For these constants the
>>    top bit does accurately reflect the sign of that constant; this
>>    is an exception to the normal rule that the signedness is not
>>    represented.  When used in a binary operation, the wide-int
>>    implementation properly extends these constants so that they
>>    properly match the other operand of the computation.  This allows
>>    you write:
>> 
>>               tree t = ...
>>               wide_int x = t + 6;
>> 
>>    assuming t is a int_cst.
> 
> This seems dangerous.  Not all code that uses "unsigned HOST_WIDE_INT"
> actually wants it to be an unsigned value.  Some code uses it to avoid
> the undefinedness of signed overflow.  So these overloads could lead
> to us accidentally zero-extending what's conceptually a signed value
> without any obvious indication that that's happening.  Also, hex constants
> are unsigned int, but it doesn't seem safe to assume that 0x80000000 was
> meant to be zero-extended.
> 
> I realise the same thing can happen if you mix "unsigned int" with
> HOST_WIDE_INT, but the point is that you shouldn't really do that
> in general, whereas we're defining these overloads precisely so that
> a mixture can be used.

So, I don't like penalizing all users, because one user might write incorrect code.  We have the simple operators so that users can retain some of the simplicity and beauty that is the underlying language.  Those semantics are well known and reasonable.  We reasonably match those semantics.  At the end of the day, we have to be able to trust what the user writes.  Now, a user that doesn't trust themselves, can elect to not use these functions; they aren't required to use them.
Richard Sandiford Aug. 24, 2013, 8:30 a.m. UTC | #5
Mike Stump <mikestump@comcast.net> writes:
> On Aug 23, 2013, at 8:02 AM, Richard Sandiford
> <rdsandiford@googlemail.com> wrote:
>>>  * When a constant that has an integer type is converted to a
>>>    wide-int it comes in with precision 0.  For these constants the
>>>    top bit does accurately reflect the sign of that constant; this
>>>    is an exception to the normal rule that the signedness is not
>>>    represented.  When used in a binary operation, the wide-int
>>>    implementation properly extends these constants so that they
>>>    properly match the other operand of the computation.  This allows
>>>    you write:
>>> 
>>>               tree t = ...
>>>               wide_int x = t + 6;
>>> 
>>>    assuming t is a int_cst.
>> 
>> This seems dangerous.  Not all code that uses "unsigned HOST_WIDE_INT"
>> actually wants it to be an unsigned value.  Some code uses it to avoid
>> the undefinedness of signed overflow.  So these overloads could lead
>> to us accidentally zero-extending what's conceptually a signed value
>> without any obvious indication that that's happening.  Also, hex constants
>> are unsigned int, but it doesn't seem safe to assume that 0x80000000 was
>> meant to be zero-extended.
>> 
>> I realise the same thing can happen if you mix "unsigned int" with
>> HOST_WIDE_INT, but the point is that you shouldn't really do that
>> in general, whereas we're defining these overloads precisely so that
>> a mixture can be used.
>
> So, I don't like penalizing all users, because one user might write
> incorrect code.  We have the simple operators so that users can retain
> some of the simplicity and beauty that is the underlying language.
> Those semantics are well known and reasonable.  We reasonably match
> those semantics.  At the end of the day, we have to be able to trust
> what the user writes.  Now, a user that doesn't trust themselves, can
> elect to not use these functions; they aren't required to use them.

But the point of my "unsigned HOST_WIDE_INT" example is that the
semantics of the language can also get in the way.  If you're adding
or multiplying two smallish constants, you should generally use
"unsigned HOST_WIDE_INT" in order to avoid the undefinedness of signed
overflow.  In that kind of case the "unsigned" in no way implies that
you want the value to be extended to the wider types that we're
adding with this patch.  I'm worried about making that assumption
just to provide a bit of syntactic sugar, especially when any
wrong-code bug would be highly value-dependent.  When this kind of
thing is done at the rtl level, the original constant (in a CONST_INT)
is also HOST_WIDE_INT-sized, so the kind of code that I'm talking about
did not do any promotion before the wide-int conversion.  That changes
if we generalise one of the values to a wide_int.

It isn't just a case of being confident when writing new code.
It's a case of us changing a large body of existing code in one go,
some of which uses unsigned values for the reason above.

Like I say, all this objection is for "unsigned int and above".
If we provide the syntactic sugar for plain "int" and use templates
to force a compiler error for "higher" types then that would still allow
things like "x + 2" but force a bit of analysis for "x + y" in cases
where "x" and "y" are not the same type and one is wide_int.

Thanks,
Richard
Richard Sandiford Aug. 24, 2013, 9:47 a.m. UTC | #6
Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>>>      * Code that does widening conversions.  The canonical way that
>>>        this is performed is to sign or zero extend the input value to
>>>        the max width based on the sign of the type of the source and
>>>        then to truncate that value to the target type.  This is in
>>>        preference to using the sign of the target type to extend the
>>>        value directly (which gets the wrong value for the conversion
>>>        of large unsigned numbers to larger signed types).
>
>
>> I don't understand this particular reason.  Using the sign of the source
>> type is obviously right, but why does that mean we need "infinite" precision,
>> rather than just doubling the precision of the source?
> in a sense, "infinite" does not really mean infinite, it really just 
> means large enough so that you never loose any information from the 
> top.    For widening all that you really need to be "infinite" is one 
> more bit larger than the destination type.

I'm still being clueless, sorry, but why does the intermediate int need
to be wider than the destination type?  If you're going from an N1-bit
value to an N2>N1-bit value, I don't see why you ever need more than
N2 bits.  Maybe I'm misunderstanding what the comment means by
"widening conversions".

>> This seems dangerous.  Not all code that uses "unsigned HOST_WIDE_INT"
>> actually wants it to be an unsigned value.  Some code uses it to avoid
>> the undefinedness of signed overflow.  So these overloads could lead
>> to us accidentally zero-extending what's conceptually a signed value
>> without any obvious indication that that's happening.  Also, hex constants
>> are unsigned int, but it doesn't seem safe to assume that 0x80000000 was
>> meant to be zero-extended.
>>
>> I realise the same thing can happen if you mix "unsigned int" with
>> HOST_WIDE_INT, but the point is that you shouldn't really do that
>> in general, whereas we're defining these overloads precisely so that
>> a mixture can be used.
>>
>> I'd prefer some explicit indication of the sign, at least for anything
>> other than plain "int" (so that the compiler will complain about uses
>> of "unsigned int" and above).
>
> There is a part of me that finds this scary and a part of me that feels 
> that the concern is largely theoretical.    It does make it much easier 
> to read and understand the code to be able to write "t + 6" rather than 
> "wide_int (t) + wide_int::from_uhwi" (6) but of course you loose some 
> control over how 6 is converted.

I replied in more detail to Mike's comment, but the reason I suggested
only allowing this for plain "int" (and using templates to forbid
"unsigned int" and wider types) was that you could still write "t + 6".
You just couldn't write "t + 0x80000000" or "t + x", where "x" is a
HOST_WIDE_INT or similar.

Code can store values in "HOST_WIDE_INT" or "unsigned HOST_WIDE_INT"
if we know at GCC compile time that HOST_WIDE_INT is big enough.
But code doesn't necessarily store values in a "HOST_WIDE_INT"
because we know at GCC compile time that the value is signed,
or in a "unsigned HOST_WIDE_INT" because we know at GCC compile
time that the value is unsigned.  The signedness often depends
on the input.  The language still forces us to pick one or the
other though.  I don't feel comfortable with having syntactic
sugar that reads too much into the choice.

>>>    Note that the bits above the precision are not defined and the
>>>    algorithms used here are careful not to depend on their value.  In
>>>    particular, values that come in from rtx constants may have random
>>>    bits.
>> I have a feeling I'm rehashing a past debate, sorry, but rtx constants can't
>> have random bits.  The upper bits must be a sign extension of the value.
>> There's exactly one valid rtx for each (value, mode) pair.  If you saw
>> something different then that sounds like a bug.  The rules should already
>> be fairly well enforced though, since something like (const_int 128) --
>> or (const_int 256) -- will not match a QImode operand.
>>
>> This is probably the part of the representation that I disagree most with.
>> There seem to be two main ways we could hande the extension to whole HWIs:
>>
>> (1) leave the stored upper bits undefined and extend them on read
>> (2) keep the stored upper bits in extended form
> It is not a matter of opening old wounds.   I had run some tests on 
> x86-64 and was never able to assume that the bits above the precision 
> had always been canonized.   I will admit that i did not try to run down 
> the bugs because i thought that since the rtx constructors did not have 
> a mode associated with them now was one required to in the constructors, 
> that this was not an easily solvable problem.   So i have no idea if i 
> hit the one and only bug or was about to start drinking from a fire 
> hose.

Hopefully the first one. :-)   Would you mind going back and trying again,
so that we at least have some idea what kind of bugs they were?

> But the back ends are full of GEN_INT (a) where a came from god 
> knows where and you almost never see it properly canonized.   I think 
> that until GEN_INT takes a mandatory non VOIDmode mode parameter, and 
> that constructor canonizes it, you are doomed chasing this bug 
> forever.

Well, sure, the bug crops up now and then (I fixed another instance
a week or so ago).  But generally this bug shows up as an ICE --
usually an unrecognisable instruction ICE -- precisely because this
is something that the code is already quite picky about.

Most of the GEN_INTs in the backend will be correct.  They're in
a position to make asumptions about instructions and target modes.

> Having said that, we actually do neither of (1) or (2) inside of 
> wide-int.  For rtl to wide-int, we leave the upper bits undefined and 
> never allow you to look at them because the constructor has a mode and 
> that mode allows you to draw a line in the sand.   There is no 
> constructor for the "infinite" wide ints from rtl so you have no way to 
> look.

Sorry, I don't understand the distinction you're making between this
and (1).  The precision is taken from the mode, and you flesh out the
upper bits on read if you care.

> Doing this allows us to do something that richi really wanted: avoiding 
> copying.   We do not get to do as much richi would like and when he 
> comes back from leave, he may have other places where can apply it, but 
> right now if you say w = t + 6 as above, it "borrows" the rep from t to 
> do the add, it does not really build a wide-int. We also do this if t is 
> an rtx const.   But if we had to canonize the number, then we could not 
> borrow the rep.

But the point is that the number is already canonical for rtx, so you
should do nothing.  I.e. (2) will be doing less work.  If we have other
input types besides rtx that don't define the upper bits, then under (2)
their wide_int accessor functions (to_shwi[12] in the current scheme)
should do the extension.

>> The patch goes for (1) but (2) seems better to me, for a few reasons:
>>
>> * As above, constants coming from rtl are already in the right form,
>>    so if you create a wide_int from an rtx and only query it, no explicit
>>    extension is needed.
>>
>> * Things like logical operations and right shifts naturally preserve
>>    the sign-extended form, so only a subset of write operations need
>>    to take special measures.
> right now the internals of wide-int do not keep the bits above the 
> precision clean.   as you point out this could be fixed by changing 
> lshift, add, sub, mul, div (and anything else i have forgotten about) 
> and removing the code that cleans this up on exit.   I actually do not 
> really care which way we go here but if we do go on keeping the bits 
> clean above the precision inside wide-int, we are going to have to clean 
> the bits in the constructors from rtl, or fix some/a lot of bugs.
>
> But if you want to go with the stay clean plan you have to start clean, 
> so at the rtl level this means copying. and it is the not copying trick 
> that pushed me in the direction we went.
>
> At the tree level, this is not an issue.   There are no constructors for 
> tree-csts that do not have a type and before we copy the rep from the 
> wide-int to the tree, we clean the top bits.   (BTW - If i had to guess 
> what the bug is with the missing messages on the mips port, it would be 
> because the front ends HAD a bad habit of creating constants that did 
> not fit into a type and then later checking to see if there were any 
> interesting bits above the precision in the int-cst.  This now does not 
> work because we clean out those top bits on construction but it would 
> not surprise me if we missed the fixed point constant path.)   So at the 
> tree level, we could easily go either way here, but there is a cost at 
> the rtl level with doing (2).

TBH, I think we should do (2) and fix whatever bugs you saw with invalid
rtx constants.

>>>    When the precision is 0, all the bits in the LEN elements of
>>>    VEC are significant with no undefined bits.  Precisionless
>>>    constants are limited to being one or two HOST_WIDE_INTs.  When two
>>>    are used the upper value is 0, and the high order bit of the first
>>>    value is set.  (Note that this may need to be generalized if it is
>>>    ever necessary to support 32bit HWIs again).
>> I didn't understand this.  When are two HOST_WIDE_INTs needed for
>> "precision 0"?
> if a large unsigned constant comes in with the top bit set, the 
> canonized value takes 2 hwis, the top hwi being 0.

Ah, yeah.  But then that sounds like another reason to only allow
"int" to have zero precision.

>>> /* Return THIS as a signed HOST_WIDE_INT.  If THIS does not fit in
>>>     PREC, the information is lost. */
>>> HOST_WIDE_INT
>>> to_shwi (unsigned int prec = 0) const
>> Just dropping the excess bits seems dangerous.  I think we should assert
>> instead, at least when prec is 0.
> there are times when this is useful.   there is a lot of code that just 
> wants to look at the bottom bits to do some alignment stuff. I guess 
> that code could just grab the bottom elt of the array, but has not 
> generally been how these this has been done.

Yeah, using elt to mean "get the low HOST_WIDE_INT" sounds better to me FWIW.
"to_shwi" sounds like a type conversion whereas "elt" is more obviously
accessing only part of the value.

>>>    gcc_unreachable ();
>>> #if 0
>>>    return val[len - 1] >> (HOST_BITS_PER_WIDE_INT - 1);
>>> #endif
>> Maybe remove this?
>>
>> The inline div routines do:
> i will work on this this weekend.   tree vrp has not been our friend and 
> sometimes does not like to compile this function.

OK.  If it's just temporary then never mind, but a comment might help.
I remember you mentioned tree-vrp in a reply to one of Richard's reviews,
so this is probably something you've been asked before.  And will probably
be asked again if there's no comment :-)

>> I think these are going to bitrot quickly if we #if 0 then out.
>> I think we should either use:
>>
>>    if (DEBUG_WIDE_INT)
>>      debug_vwa ("wide_int_ro:: %d = (%s == %s)\n", result, *this, s, cl, p2);
>>
>> or drop them.
> my plan is to leave these in while the branch is still being developed 
> and then get rid of them before it is merged.    My guess is that i am 
> going to need them still when i try the 32bit hwi test.

Ah, OK, sounds good.

One thing I hit yesterday while trying out the "lightweight classes"
idea from the end of the mail is that, as things stand, most operations
end up with machine instructions to do:

  if (*p1 == 0)
    *p1 = *p2;

  if (*p2 == 0)
    *p2 = *p1;

E.g. for wide_int + wide_int, wide_int + tree and wide_int + rtx,
where the compiler has no way of statically proving that the precisions
are nonzero.  Considering all the template goo we're adding to avoid
a copy (even though that copy is going to be 24 bytes in the worst case
on most targets, and could be easily SRAed), this seems like a relatively
high overhead for syntactic sugar, especially on hosts that need branches
rather than conditional moves.

Either all this is premature optimisation (including avoiding the
copying) or we should probably do something to avoid these checks too.
I wonder how easy it would be to restrict this use of "zero precision"
(i.e. flexible precision) to those where primitive types like "int" are
used as template arguments to operators, and require a precision when
constructing a wide_int.  I wouldn't have expected "real" precision 0
(from zero-width bitfields or whatever) to need any special handling
compared to precision 1 or 2.

Thanks,
Richard
Florian Weimer Aug. 24, 2013, 6:16 p.m. UTC | #7
On 08/13/2013 10:57 PM, Kenneth Zadeck wrote:
> 1) The 4 files that hold the wide-int code itself.  You have seen a
>     lot of this code before except for the infinite precision
>     templates.  Also the classes are more C++ than C in their flavor.
>     In particular, the integration with trees is very tight in that an
>     int-cst or regular integers can be the operands of any wide-int
>     operation.

Are any of these conversions lossy?  Maybe some of these constructors 
should be made explicit?
Kenneth Zadeck Aug. 24, 2013, 6:42 p.m. UTC | #8
On 08/24/2013 02:16 PM, Florian Weimer wrote:
> On 08/13/2013 10:57 PM, Kenneth Zadeck wrote:
>> 1) The 4 files that hold the wide-int code itself.  You have seen a
>>     lot of this code before except for the infinite precision
>>     templates.  Also the classes are more C++ than C in their flavor.
>>     In particular, the integration with trees is very tight in that an
>>     int-cst or regular integers can be the operands of any wide-int
>>     operation.
>
> Are any of these conversions lossy?  Maybe some of these constructors 
> should be made explicit?
>
It depends, there is nothing wrong with lossy conversions as long as you 
know what you are doing.
Mike Stump Aug. 25, 2013, 5:39 p.m. UTC | #9
On Aug 23, 2013, at 8:02 AM, Richard Sandiford <rdsandiford@googlemail.com> wrote:
> We really need to get rid of the #include "tm.h" in wide-int.h.
> MAX_BITSIZE_MODE_ANY_INT should be the only partially-target-dependent
> thing in there.  If that comes from tm.h then perhaps we should put it
> into a new header file instead.

BITS_PER_UNIT comes from there as well, and I'd need both.  Grabbing the #defines we generate is easy enough, but BITS_PER_UNIT would be more annoying.  No port in the tree makes use of it yet (other than 8).  So, do we just assume BITS_PER_UNIT is 8?
Richard Sandiford Aug. 25, 2013, 6:29 p.m. UTC | #10
Mike Stump <mikestump@comcast.net> writes:
> On Aug 23, 2013, at 8:02 AM, Richard Sandiford
> <rdsandiford@googlemail.com> wrote:
>> We really need to get rid of the #include "tm.h" in wide-int.h.
>> MAX_BITSIZE_MODE_ANY_INT should be the only partially-target-dependent
>> thing in there.  If that comes from tm.h then perhaps we should put it
>> into a new header file instead.
>
> BITS_PER_UNIT comes from there as well, and I'd need both.  Grabbing the
> #defines we generate is easy enough, but BITS_PER_UNIT would be more
> annoying.  No port in the tree makes use of it yet (other than 8).  So,
> do we just assume BITS_PER_UNIT is 8?

Looks like wide-int is just using BITS_PER_UNIT to get the number of
bits in "char".  That's a host thing, so it should be CHAR_BIT instead.

Thanks,
Richard
Mike Stump Aug. 25, 2013, 7:49 p.m. UTC | #11
On Aug 25, 2013, at 11:29 AM, Richard Sandiford <rdsandiford@googlemail.com> wrote:
> Mike Stump <mikestump@comcast.net> writes:
>> On Aug 23, 2013, at 8:02 AM, Richard Sandiford
>> <rdsandiford@googlemail.com> wrote:
>>> We really need to get rid of the #include "tm.h" in wide-int.h.
>>> MAX_BITSIZE_MODE_ANY_INT should be the only partially-target-dependent
>>> thing in there.  If that comes from tm.h then perhaps we should put it
>>> into a new header file instead.
>> 
>> BITS_PER_UNIT comes from there as well, and I'd need both.  Grabbing the
>> #defines we generate is easy enough, but BITS_PER_UNIT would be more
>> annoying.  No port in the tree makes use of it yet (other than 8).  So,
>> do we just assume BITS_PER_UNIT is 8?
> 
> Looks like wide-int is just using BITS_PER_UNIT to get the number of
> bits in "char".  That's a host thing, so it should be CHAR_BIT instead.

?  What?  No.  BITS_PER_UNIT is a feature of the target machine, so, it is absolutely wrong to use a property of the host machine or the build machine.  We don't use sizeof(int) to set the size of int on the target for the example same reason.
Joseph Myers Aug. 25, 2013, 8:11 p.m. UTC | #12
On Sun, 25 Aug 2013, Mike Stump wrote:

> On Aug 23, 2013, at 8:02 AM, Richard Sandiford <rdsandiford@googlemail.com> wrote:
> > We really need to get rid of the #include "tm.h" in wide-int.h.
> > MAX_BITSIZE_MODE_ANY_INT should be the only partially-target-dependent
> > thing in there.  If that comes from tm.h then perhaps we should put it
> > into a new header file instead.
> 
> BITS_PER_UNIT comes from there as well, and I'd need both.  Grabbing the 
> #defines we generate is easy enough, but BITS_PER_UNIT would be more 
> annoying.  No port in the tree makes use of it yet (other than 8).  So, 
> do we just assume BITS_PER_UNIT is 8?

Regarding avoiding tm.h dependence through BITS_PER_UNIT (without actually 
converting it from a target macro to a target hook), see my suggestions at 
<http://gcc.gnu.org/ml/gcc-patches/2010-11/msg02617.html>.  It would seem 
fairly reasonable, if in future other macros are converted to hooks and 
it's possible to build multiple back ends into a single compiler binary, 
to require that all such back ends share a value of BITS_PER_UNIT.

BITS_PER_UNIT describes the number of bits in QImode - the RTL-level byte.  
I don't think wide-int should care about that at all.  As I've previously 
noted, many front-end uses of BITS_PER_UNIT really care about the C-level 
char and so should be TYPE_PRECISION (char_type_node).  Generally, before 
thinking about how to get BITS_PER_UNIT somewhere, consider whether the 
code is actually correct to be using BITS_PER_UNIT at all - whether it's 
the RTL-level QImode that is really what's relevant to the code.
Mike Stump Aug. 25, 2013, 9:38 p.m. UTC | #13
On Aug 25, 2013, at 1:11 PM, "Joseph S. Myers" <joseph@codesourcery.com> wrote:
> On Sun, 25 Aug 2013, Mike Stump wrote:
>> On Aug 23, 2013, at 8:02 AM, Richard Sandiford <rdsandiford@googlemail.com> wrote:
>>> We really need to get rid of the #include "tm.h" in wide-int.h.
>>> MAX_BITSIZE_MODE_ANY_INT should be the only partially-target-dependent
>>> thing in there.  If that comes from tm.h then perhaps we should put it
>>> into a new header file instead.
>> 
>> BITS_PER_UNIT comes from there as well, and I'd need both.  Grabbing the 
>> #defines we generate is easy enough, but BITS_PER_UNIT would be more 
>> annoying.  No port in the tree makes use of it yet (other than 8).  So, 
>> do we just assume BITS_PER_UNIT is 8?
> 
> Regarding avoiding tm.h dependence through BITS_PER_UNIT (without actually 
> converting it from a target macro to a target hook), see my suggestions at 
> <http://gcc.gnu.org/ml/gcc-patches/2010-11/msg02617.html>.  It would seem 
> fairly reasonable, if in future other macros are converted to hooks and 
> it's possible to build multiple back ends into a single compiler binary, 
> to require that all such back ends share a value of BITS_PER_UNIT.

Ick.  I don't see the beauty of this direction.  If one wants to move in a, we can generate code for any target, fine, let's do that.  If someone wants to make a target, just a dynamically loaded shared library, let's do that.  There are all sorts of directions to move it, but the intermediate, let's design and implement a system, where for some targets it works, and for some targets it doesn't work, well, I think that is short sighted and we ought not target that.

Having a separate tm-blabla.h for some of the selections of the target is fine, but mandating that as the form for doing a target is, well, bad.

I'd love to see the entire hook and target selection mechanism brought up to 1990's level of sophistication, the 1970s are over.  Sigh.  Anyway, all this is well beyond the scope of the work at hand.

> As I've previously 
> noted, many front-end uses of BITS_PER_UNIT really care about the C-level 
> char and so should be TYPE_PRECISION (char_type_node).  Generally, before 
> thinking about how to get BITS_PER_UNIT somewhere, consider whether the 
> code is actually correct to be using BITS_PER_UNIT at all - whether it's 
> the RTL-level QImode that is really what's relevant to the code.

I think we got all the uses correct.  Let us know if any are wrong.
Richard Biener Aug. 28, 2013, 8:57 a.m. UTC | #14
On Fri, 23 Aug 2013, Richard Sandiford wrote:

> Hi Kenny,
> 
> This is the first time I've looked at the implementation of wide-int.h
> (rather than just looking at the rtl changes, which as you know I like
> in general), so FWIW here are some comments on wide-int.h.  I expect
> a lot of them overlap with Richard B.'s comments.
> 
> I also expect many of them are going to be annoying, sorry, but this
> first one definitely will.  The coding conventions say that functions
> should be defined outside the class:
> 
>     http://gcc.gnu.org/codingconventions.html
> 
> and that opening braces should be on their own line, so most of the file
> needs to be reformatted.  I went through and made that change with the
> patch below, in the process of reading through.  I also removed "SGN
> must be SIGNED or UNSIGNED." because it seemed redundant when those are
> the only two values available.  The patch fixes a few other coding standard
> problems and typos, but I've not made any actual code changes (or at least,
> I didn't mean to).
> 
> Does it look OK to install?
> 
> I'm still unsure about these "infinite" precision types, but I understand
> the motivation and I have no objections.  However:
> 
> >     * Code that does widening conversions.  The canonical way that
> >       this is performed is to sign or zero extend the input value to
> >       the max width based on the sign of the type of the source and
> >       then to truncate that value to the target type.  This is in
> >       preference to using the sign of the target type to extend the
> >       value directly (which gets the wrong value for the conversion
> >       of large unsigned numbers to larger signed types).
> 
> I don't understand this particular reason.  Using the sign of the source
> type is obviously right, but why does that mean we need "infinite" precision,
> rather than just doubling the precision of the source?

The comment indeed looks redundant - of course it is not correct
to extend the value "directly".

> >   * When a constant that has an integer type is converted to a
> >     wide-int it comes in with precision 0.  For these constants the
> >     top bit does accurately reflect the sign of that constant; this
> >     is an exception to the normal rule that the signedness is not
> >     represented.  When used in a binary operation, the wide-int
> >     implementation properly extends these constants so that they
> >     properly match the other operand of the computation.  This allows
> >     you write:
> >
> >                tree t = ...
> >                wide_int x = t + 6;
> >
> >     assuming t is a int_cst.
> 
> This seems dangerous.  Not all code that uses "unsigned HOST_WIDE_INT"
> actually wants it to be an unsigned value.  Some code uses it to avoid
> the undefinedness of signed overflow.  So these overloads could lead
> to us accidentally zero-extending what's conceptually a signed value
> without any obvious indication that that's happening.  Also, hex constants
> are unsigned int, but it doesn't seem safe to assume that 0x80000000 was
> meant to be zero-extended.
>
> I realise the same thing can happen if you mix "unsigned int" with
> HOST_WIDE_INT, but the point is that you shouldn't really do that
> in general, whereas we're defining these overloads precisely so that
> a mixture can be used.
> 
> I'd prefer some explicit indication of the sign, at least for anything
> other than plain "int" (so that the compiler will complain about uses
> of "unsigned int" and above).

I prefer the automatic promotion - it is exactly what regular C types
do.  Now, consider

  wide_int x = ... construct 5 with precision 16 ...
  wide_int y = x + 6;

now, '6' is 'int' (precision 32), but with wide-int we treat it
as precision '0' ('infinite').  For x + 6 we the _truncate_ its
precision to that of 'x'(?) not exactly matching C behavior
(where we'd promote 'x' to 'int', perform the add and then truncate
to the precision of 'y' - which for wide-int gets its precision
from the result of x + 6).

Mimicing C would support dropping those 'require equal precision'
asserts but also would require us to properly track a sign to be
able to promote properly (or as I argued all the time always
properly sign-extend values so we effectively have infinite precision
anyway).

The fits_uhwi_p implementation changes scares me off that
"upper bits are undefined" thing a lot again... (I hate introducing
'undefinedness' into the compiler by 'design')

> >   Note that the bits above the precision are not defined and the
> >   algorithms used here are careful not to depend on their value.  In
> >   particular, values that come in from rtx constants may have random
> >   bits.

Which is a red herring.  It should be fixed.  I cannot even believe
that sentence given the uses of CONST_DOUBLE_LOW/CONST_DOUBLE_HIGH
or INTVAL/UINTVAL.  I don't see accesses masking out 'undefined' bits
anywhere.

> I have a feeling I'm rehashing a past debate, sorry, but rtx constants can't
> have random bits.  The upper bits must be a sign extension of the value.
> There's exactly one valid rtx for each (value, mode) pair.  If you saw
> something different then that sounds like a bug.  The rules should already
> be fairly well enforced though, since something like (const_int 128) --
> or (const_int 256) -- will not match a QImode operand.

See.  We're saved ;)

> This is probably the part of the representation that I disagree most with.
> There seem to be two main ways we could hande the extension to whole HWIs:
> 
> (1) leave the stored upper bits undefined and extend them on read
> (2) keep the stored upper bits in extended form
> 
> The patch goes for (1) but (2) seems better to me, for a few reasons:

I agree whole-heartedly.

> * As above, constants coming from rtl are already in the right form,
>   so if you create a wide_int from an rtx and only query it, no explicit
>   extension is needed.
> 
> * Things like logical operations and right shifts naturally preserve
>   the sign-extended form, so only a subset of write operations need
>   to take special measures.
> 
> * You have a public interface that exposes the underlying HWIs
>   (which is fine with me FWIW), so it seems better to expose a fully-defined
>   HWI rather than only a partially-defined HWI.
> 
> E.g. zero_p is:
> 
>   HOST_WIDE_INT x;
> 
>   if (precision && precision < HOST_BITS_PER_WIDE_INT)
>     x = sext_hwi (val[0], precision);
>   else if (len == 0)
>     {
>       gcc_assert (precision == 0);
>       return true;
>     }
>   else
>     x = val[0];
> 
>   return len == 1 && x == 0;
> 
> but I think it really ought to be just:
> 
>   return len == 1 && val[0] == 0;

Yes!

But then - what value does keeping track of a 'precision' have
in this case?  It seems to me it's only a "convenient carrier"
for

  wide_int x = wide-int-from-RTX (y);
  machine_mode saved_mode = mode-available? GET_MODE (y) : magic-mode;
  ... process x ...
  RTX = RTX-from-wide_int (x, saved_mode);

that is, wide-int doesn't do anything with 'precision' but you
can extract it later to not need to remember a mode you were
interested in?

Oh, and of course some operations require a 'precision', like rotate.

> >   When the precision is 0, all the bits in the LEN elements of
> >   VEC are significant with no undefined bits.  Precisionless
> >   constants are limited to being one or two HOST_WIDE_INTs.  When two
> >   are used the upper value is 0, and the high order bit of the first
> >   value is set.  (Note that this may need to be generalized if it is
> >   ever necessary to support 32bit HWIs again).
> 
> I didn't understand this.  When are two HOST_WIDE_INTs needed for
> "precision 0"?

For the wide_int containing unsigned HOST_WIDE_INT ~0.  As we
sign-extend the representation (heh, yes, we do or should!) we
require an extra HWI to store the fact that ~0 is unsigned.

> The main thing that's changed since the early patches is that we now
> have a mixture of wide-int types.  This seems to have led to a lot of
> boiler-plate forwarding functions (or at least it felt like that while
> moving them all out the class).  And that in turn seems to be because
> you're trying to keep everything as member functions.  E.g. a lot of the
> forwarders are from a member function to a static function.
> 
> Wouldn't it be better to have the actual classes be light-weight,
> with little more than accessors, and do the actual work with non-member
> template functions?  There seems to be 3 grades of wide-int:
> 
>   (1) read-only, constant precision  (from int, etc.)
>   (2) read-write, constant precision  (fixed_wide_int)
>   (3) read-write, variable precision  (wide_int proper)
> 
> but we should be able to hide that behind templates, with compiler errors
> if you try to write to (1), etc.

Yeah, I'm probably trying to clean up the implementation once I
got past recovering from two month without GCC ...

Richard.
Richard Sandiford Aug. 28, 2013, 9:49 a.m. UTC | #15
Richard Biener <rguenther@suse.de> writes:
>> * As above, constants coming from rtl are already in the right form,
>>   so if you create a wide_int from an rtx and only query it, no explicit
>>   extension is needed.
>> 
>> * Things like logical operations and right shifts naturally preserve
>>   the sign-extended form, so only a subset of write operations need
>>   to take special measures.
>> 
>> * You have a public interface that exposes the underlying HWIs
>>   (which is fine with me FWIW), so it seems better to expose a fully-defined
>>   HWI rather than only a partially-defined HWI.
>> 
>> E.g. zero_p is:
>> 
>>   HOST_WIDE_INT x;
>> 
>>   if (precision && precision < HOST_BITS_PER_WIDE_INT)
>>     x = sext_hwi (val[0], precision);
>>   else if (len == 0)
>>     {
>>       gcc_assert (precision == 0);
>>       return true;
>>     }
>>   else
>>     x = val[0];
>> 
>>   return len == 1 && x == 0;
>> 
>> but I think it really ought to be just:
>> 
>>   return len == 1 && val[0] == 0;
>
> Yes!
>
> But then - what value does keeping track of a 'precision' have
> in this case?  It seems to me it's only a "convenient carrier"
> for
>
>   wide_int x = wide-int-from-RTX (y);
>   machine_mode saved_mode = mode-available? GET_MODE (y) : magic-mode;
>   ... process x ...
>   RTX = RTX-from-wide_int (x, saved_mode);
>
> that is, wide-int doesn't do anything with 'precision' but you
> can extract it later to not need to remember a mode you were
> interested in?

I can see why you like the constant-precision, very wide integers for trees,
where the constants have an inherent sign.  But (and I think this might be
old ground too :-)), that isn't the case with rtl.  At the tree level,
using constant-precision, very wide integers allows you to add a 32-bit
signed INTEGER_CST to a 16-unsigned INTEGER_CST.  And that has an
obvious meaning, both as a 32-bit result or as a wider result, depending
on how you choose to use it.  But in rtl there is no meaning to adding
an SImode and an HImode value together, since we don't know how to
extend the HImode value beyond its precision.  You must explicitly sign-
or zero-extend the value first.  (The fact that we choose to sign-extend
rtl constants when storing them in HWIs is just a representation detail,
to avoid having undefined bits in the HWIs.  It doesn't mean that rtx
values themselves are signed.  We could have used a zero-extending
representation instead without changing the semantics.)

So the precision variable is good for the rtl level in several ways:

- As you say, it avoids adding the explicit truncations that (in practice)
  every rtl operation would need

- It's more efficient in that case, since we don't calculate high values
  and then discard them immediately.  The common GET_MODE_PRECISION (mode)
  <= HOST_BITS_PER_WIDE_INT case stays a pure HWI operation, despite all
  the wide-int trappings.

- It's a good way of checking type safety and making sure that excess
  bits aren't accidentally given a semantic meaning.  This is the most
  important reason IMO.

The branch has both the constant-precision, very wide integers that we
want for trees and the variable-precision integers we want for rtl,
so it's not an "either or".  With the accessor-based implementation,
there should be very little cost to having both.

>> The main thing that's changed since the early patches is that we now
>> have a mixture of wide-int types.  This seems to have led to a lot of
>> boiler-plate forwarding functions (or at least it felt like that while
>> moving them all out the class).  And that in turn seems to be because
>> you're trying to keep everything as member functions.  E.g. a lot of the
>> forwarders are from a member function to a static function.
>> 
>> Wouldn't it be better to have the actual classes be light-weight,
>> with little more than accessors, and do the actual work with non-member
>> template functions?  There seems to be 3 grades of wide-int:
>> 
>>   (1) read-only, constant precision  (from int, etc.)
>>   (2) read-write, constant precision  (fixed_wide_int)
>>   (3) read-write, variable precision  (wide_int proper)
>> 
>> but we should be able to hide that behind templates, with compiler errors
>> if you try to write to (1), etc.
>
> Yeah, I'm probably trying to clean up the implementation once I
> got past recovering from two month without GCC ...

FWIW, I've been plugging away at a version that uses accessors.
I hope to have it vaguely presentable by the middle of next week,
in case your recovery takes that long...

Thanks,
Richard
Richard Biener Aug. 28, 2013, 10:22 a.m. UTC | #16
On Wed, 28 Aug 2013, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> >> * As above, constants coming from rtl are already in the right form,
> >>   so if you create a wide_int from an rtx and only query it, no explicit
> >>   extension is needed.
> >> 
> >> * Things like logical operations and right shifts naturally preserve
> >>   the sign-extended form, so only a subset of write operations need
> >>   to take special measures.
> >> 
> >> * You have a public interface that exposes the underlying HWIs
> >>   (which is fine with me FWIW), so it seems better to expose a fully-defined
> >>   HWI rather than only a partially-defined HWI.
> >> 
> >> E.g. zero_p is:
> >> 
> >>   HOST_WIDE_INT x;
> >> 
> >>   if (precision && precision < HOST_BITS_PER_WIDE_INT)
> >>     x = sext_hwi (val[0], precision);
> >>   else if (len == 0)
> >>     {
> >>       gcc_assert (precision == 0);
> >>       return true;
> >>     }
> >>   else
> >>     x = val[0];
> >> 
> >>   return len == 1 && x == 0;
> >> 
> >> but I think it really ought to be just:
> >> 
> >>   return len == 1 && val[0] == 0;
> >
> > Yes!
> >
> > But then - what value does keeping track of a 'precision' have
> > in this case?  It seems to me it's only a "convenient carrier"
> > for
> >
> >   wide_int x = wide-int-from-RTX (y);
> >   machine_mode saved_mode = mode-available? GET_MODE (y) : magic-mode;
> >   ... process x ...
> >   RTX = RTX-from-wide_int (x, saved_mode);
> >
> > that is, wide-int doesn't do anything with 'precision' but you
> > can extract it later to not need to remember a mode you were
> > interested in?
> 
> I can see why you like the constant-precision, very wide integers for trees,
> where the constants have an inherent sign.  But (and I think this might be
> old ground too :-)), that isn't the case with rtl.  At the tree level,
> using constant-precision, very wide integers allows you to add a 32-bit
> signed INTEGER_CST to a 16-unsigned INTEGER_CST.  And that has an
> obvious meaning, both as a 32-bit result or as a wider result, depending
> on how you choose to use it.  But in rtl there is no meaning to adding
> an SImode and an HImode value together, since we don't know how to
> extend the HImode value beyond its precision.  You must explicitly sign-
> or zero-extend the value first.  (The fact that we choose to sign-extend
> rtl constants when storing them in HWIs is just a representation detail,
> to avoid having undefined bits in the HWIs.  It doesn't mean that rtx
> values themselves are signed.  We could have used a zero-extending
> representation instead without changing the semantics.)

Yeah, that was my understanding.

> So the precision variable is good for the rtl level in several ways:
> 
> - As you say, it avoids adding the explicit truncations that (in practice)
>   every rtl operation would need
> 
> - It's more efficient in that case, since we don't calculate high values
>   and then discard them immediately.  The common GET_MODE_PRECISION (mode)
>   <= HOST_BITS_PER_WIDE_INT case stays a pure HWI operation, despite all
>   the wide-int trappings.
> 
> - It's a good way of checking type safety and making sure that excess
>   bits aren't accidentally given a semantic meaning.  This is the most
>   important reason IMO.
> 
> The branch has both the constant-precision, very wide integers that we
> want for trees and the variable-precision integers we want for rtl,
> so it's not an "either or".  With the accessor-based implementation,
> there should be very little cost to having both.

So what I wonder (and where we maybe disagree) is how much code
wants to inspect "intermediate" results.  Say originally you have

rtx foo (rtx x, rtx y)
{
  rtx tem = simplify_const_binary_operation (PLUS, GET_MODE (x), x, 
GEN_INT (1));
  rtx res = simplify_const_binary_operation (MINUS, GET_MODE (tem), tem, 
y);
  return res;
}

and with wide-int you want to change that to

rtx foo (rtx x, rtx y)
{
  wide_int tem = wide_int (x) + 1;
  wide_int res = tem - y;
  return res.to_rtx ();
}

how much code ever wants to inspect 'tem' or 'res'?
That is, does it matter
if 'tem' and 'res' would have been calculated in "infinite precision"
and only to_rtx () would do the truncation to the desired mode?

I think not.  The amount of code performing multiple operations on
_constants_ in sequence is extremely low (if it even exists).

So I'd rather have to_rtx get a mode argument (or a precision) and
perform the required truncation / sign-extension at RTX construction
time (which is an expensive operation anyway).

So where does this "break"?  It only breaks where previous code
broke (or where previous code had measures to not break that we
don't carry over to the wide-int case).  Obvious case is unsigned
division on the sign-extended rep of RTL constants for example.

> >> The main thing that's changed since the early patches is that we now
> >> have a mixture of wide-int types.  This seems to have led to a lot of
> >> boiler-plate forwarding functions (or at least it felt like that while
> >> moving them all out the class).  And that in turn seems to be because
> >> you're trying to keep everything as member functions.  E.g. a lot of the
> >> forwarders are from a member function to a static function.
> >> 
> >> Wouldn't it be better to have the actual classes be light-weight,
> >> with little more than accessors, and do the actual work with non-member
> >> template functions?  There seems to be 3 grades of wide-int:
> >> 
> >>   (1) read-only, constant precision  (from int, etc.)
> >>   (2) read-write, constant precision  (fixed_wide_int)
> >>   (3) read-write, variable precision  (wide_int proper)
> >> 
> >> but we should be able to hide that behind templates, with compiler errors
> >> if you try to write to (1), etc.
> >
> > Yeah, I'm probably trying to clean up the implementation once I
> > got past recovering from two month without GCC ...
> 
> FWIW, I've been plugging away at a version that uses accessors.
> I hope to have it vaguely presentable by the middle of next week,
> in case your recovery takes that long...

Depends on my priorities ;)

Btw, rtl.h still wastes space with

struct GTY((variable_size)) hwivec_def {
  int num_elem;         /* number of elements */
  HOST_WIDE_INT elem[1];
};

struct GTY((chain_next ("RTX_NEXT (&%h)"),
            chain_prev ("RTX_PREV (&%h)"), variable_size)) rtx_def {
...
  /* The first element of the operands of this rtx.
     The number of operands and their types are controlled
     by the `code' field, according to rtl.def.  */
  union u {
    rtunion fld[1];
    HOST_WIDE_INT hwint[1];
    struct block_symbol block_sym;
    struct real_value rv;
    struct fixed_value fv;
    struct hwivec_def hwiv;
  } GTY ((special ("rtx_def"), desc ("GET_CODE (&%0)"))) u;
};

there are 32bits available before the union.  If you don't use
those for num_elem then all wide-ints will at least take as
much space as DOUBLE_INTs originally took - and large ints
that would have required DOUBLE_INTs in the past will now
require more space than before.  Which means your math
motivating the 'num_elem' encoding stuff is wrong.  With
moving 'num_elem' before u you can even re-use the hwint
field in the union as the existing double-int code does
(which in fact could simply do the encoding trick in the
old CONST_DOUBLE scheme, similar for the tree INTEGER_CST
container).

Richard.
Richard Sandiford Aug. 28, 2013, 11:26 a.m. UTC | #17
Richard Biener <rguenther@suse.de> writes:
>> So the precision variable is good for the rtl level in several ways:
>> 
>> - As you say, it avoids adding the explicit truncations that (in practice)
>>   every rtl operation would need
>> 
>> - It's more efficient in that case, since we don't calculate high values
>>   and then discard them immediately.  The common GET_MODE_PRECISION (mode)
>>   <= HOST_BITS_PER_WIDE_INT case stays a pure HWI operation, despite all
>>   the wide-int trappings.
>> 
>> - It's a good way of checking type safety and making sure that excess
>>   bits aren't accidentally given a semantic meaning.  This is the most
>>   important reason IMO.
>> 
>> The branch has both the constant-precision, very wide integers that we
>> want for trees and the variable-precision integers we want for rtl,
>> so it's not an "either or".  With the accessor-based implementation,
>> there should be very little cost to having both.
>
> So what I wonder (and where we maybe disagree) is how much code
> wants to inspect "intermediate" results.  Say originally you have
>
> rtx foo (rtx x, rtx y)
> {
>   rtx tem = simplify_const_binary_operation (PLUS, GET_MODE (x), x, 
> GEN_INT (1));
>   rtx res = simplify_const_binary_operation (MINUS, GET_MODE (tem), tem, 
> y);
>   return res;
> }
>
> and with wide-int you want to change that to
>
> rtx foo (rtx x, rtx y)
> {
>   wide_int tem = wide_int (x) + 1;
>   wide_int res = tem - y;
>   return res.to_rtx ();
> }
>
> how much code ever wants to inspect 'tem' or 'res'?
> That is, does it matter
> if 'tem' and 'res' would have been calculated in "infinite precision"
> and only to_rtx () would do the truncation to the desired mode?
>
> I think not.  The amount of code performing multiple operations on
> _constants_ in sequence is extremely low (if it even exists).
>
> So I'd rather have to_rtx get a mode argument (or a precision) and
> perform the required truncation / sign-extension at RTX construction
> time (which is an expensive operation anyway).

I agree this is where we disagree.  I don't understand why you think
the above is better.  Why do we want to do "infinite precision"
addition of two values when only the lowest N bits of those values
have a (semantically) defined meaning?  Earlier in the thread it sounded
like we both agreed that having undefined bits in the _representation_
was bad.  So why do we want to do calculations on parts of values that
are undefined in the (rtx) semantics?

E.g. say we're adding two rtx values whose mode just happens to be
HOST_BITS_PER_WIDE_INT in size.  Why does it make sense to calculate
the carry from adding the two HWIs, only to add it to an upper HWI
that has no semantically-defined value?  It's garbage in, garbage out.

Providing this for rtl doesn't affect the tree-level operations in any
way.  Although things like addition require both arguments to have the
same precision after promotion, that's trivially true for trees, since
(a) the inputs used there -- C integers and tree constants -- can be
promoted and (b) tree-level code uses fixed_wide_int <N>, where every
fixed_wide_int <N> has the same precision.

And you can still do "infinite-precision" arithmetic on rtx constants
if you want.  You just have to say how you want the constant to be
extended (sign or zero), so that the value of all bits is meaningful.

Thanks,
Richard
Richard Biener Aug. 28, 2013, 11:57 a.m. UTC | #18
On Wed, 28 Aug 2013, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> >> So the precision variable is good for the rtl level in several ways:
> >> 
> >> - As you say, it avoids adding the explicit truncations that (in practice)
> >>   every rtl operation would need
> >> 
> >> - It's more efficient in that case, since we don't calculate high values
> >>   and then discard them immediately.  The common GET_MODE_PRECISION (mode)
> >>   <= HOST_BITS_PER_WIDE_INT case stays a pure HWI operation, despite all
> >>   the wide-int trappings.
> >> 
> >> - It's a good way of checking type safety and making sure that excess
> >>   bits aren't accidentally given a semantic meaning.  This is the most
> >>   important reason IMO.
> >> 
> >> The branch has both the constant-precision, very wide integers that we
> >> want for trees and the variable-precision integers we want for rtl,
> >> so it's not an "either or".  With the accessor-based implementation,
> >> there should be very little cost to having both.
> >
> > So what I wonder (and where we maybe disagree) is how much code
> > wants to inspect "intermediate" results.  Say originally you have
> >
> > rtx foo (rtx x, rtx y)
> > {
> >   rtx tem = simplify_const_binary_operation (PLUS, GET_MODE (x), x, 
> > GEN_INT (1));
> >   rtx res = simplify_const_binary_operation (MINUS, GET_MODE (tem), tem, 
> > y);
> >   return res;
> > }
> >
> > and with wide-int you want to change that to
> >
> > rtx foo (rtx x, rtx y)
> > {
> >   wide_int tem = wide_int (x) + 1;
> >   wide_int res = tem - y;
> >   return res.to_rtx ();
> > }
> >
> > how much code ever wants to inspect 'tem' or 'res'?
> > That is, does it matter
> > if 'tem' and 'res' would have been calculated in "infinite precision"
> > and only to_rtx () would do the truncation to the desired mode?
> >
> > I think not.  The amount of code performing multiple operations on
> > _constants_ in sequence is extremely low (if it even exists).
> >
> > So I'd rather have to_rtx get a mode argument (or a precision) and
> > perform the required truncation / sign-extension at RTX construction
> > time (which is an expensive operation anyway).
> 
> I agree this is where we disagree.  I don't understand why you think
> the above is better.  Why do we want to do "infinite precision"
> addition of two values when only the lowest N bits of those values
> have a (semantically) defined meaning?  Earlier in the thread it sounded
> like we both agreed that having undefined bits in the _representation_
> was bad.  So why do we want to do calculations on parts of values that
> are undefined in the (rtx) semantics?
> 
> E.g. say we're adding two rtx values whose mode just happens to be
> HOST_BITS_PER_WIDE_INT in size.  Why does it make sense to calculate
> the carry from adding the two HWIs, only to add it to an upper HWI
> that has no semantically-defined value?  It's garbage in, garbage out.

Not garbage in, and not garbage out (just wasted work).  That's
the possible downside - the upside is to get rid of the notion of
a 'precision'.

But yes, it's good that we agree on the fact that undefined bits
in the _representation_ are wrong.

OTOH they still will be in some ways "undefined" if you consider

  wide_int xw = from_rtx (xr, mode);
  tree xt = to_tree (xw, type);
  wide_int xw2 = from_tree (xt);

with an unsigned type xw and xw2 will not be equal (in the
'extension' bits) for a value with MSB set.  That is, RTL
chooses to always sign-extend, tree chooses to extend according
to sign information.  wide-int chooses to ... ?  (it seems the
wide-int overall comment lost the part that defined its encoding,
but it seems that we still sign-extend val[len-1], so
(unsigned HOST_WIDE_INT)-1 is { -1U, 0 } with len == 2 and
(HOST_WIDE_INT)-1 is { -1 } with len == 1.  In RTL both
would be encoded with len == 1 (no distinction between a signed
and unsigned number with all bits set), on the current
tree representation the encoding would be with len == 1, too,
as we have TYPE_UNSIGNED to tell us the sign.

So we still need to somehow "map" between those representations.

Coming from the tree side (as opposed to from the RTL side) I'd
have argued you need a 'signed' flag ;)  We side-stepped that
by doing the extension trick in a way that preserves sign information.

Looking at the RTL representation from that wide-int representation
makes RTL look as if all constants are signed.  That's fine if
code that wants to do unsigned stuff properly extends.  So - do
we need a from_rtx_unsigned () constructor that does this?
I'm worried about all the extensions done in operations like add ():

  if (p1 <= HOST_BITS_PER_WIDE_INT)
    {
      result.len = 1;
      result.precision = p1;
      result.val[0] = val[0] + s[0];
      if (p1 < HOST_BITS_PER_WIDE_INT)
        result.val[0] = sext_hwi (result.val[0], p1);
      if (sgn == SIGNED)
        {
          HOST_WIDE_INT x
            = (((result.val[0] ^ val[0]) & (result.val[0] ^ s[0]))
               >> (p1 - 1)) & 1;
          *overflow = (x != 0);
        }
      else
        *overflow = ((unsigned HOST_WIDE_INT) result.val[0]
                     < (unsigned HOST_WIDE_INT) val[0]);
      }

that's supposed to be a cheap operation ... and as far as I can see
it even computes wrong outcome :/  Given precision 32 and the
unsigned value 0xfffffffe, thus { 0x00000000fffffffe }, and
adding 0 it will produce { 0xfffffffffffffffe }, the signed value -2.

So that

      if (p1 < HOST_BITS_PER_WIDE_INT)
        result.val[0] = sext_hwi (result.val[0], p1);

is clearly wrong.  Yes, it may be needed when constructing a
RTX with that value.

Richard.
Richard Sandiford Aug. 28, 2013, 12:29 p.m. UTC | #19
Richard Biener <rguenther@suse.de> writes:

> On Wed, 28 Aug 2013, Richard Sandiford wrote:
>
>> Richard Biener <rguenther@suse.de> writes:
>> >> So the precision variable is good for the rtl level in several ways:
>> >> 
>> >> - As you say, it avoids adding the explicit truncations that (in practice)
>> >>   every rtl operation would need
>> >> 
>> >> - It's more efficient in that case, since we don't calculate high values
>> >>   and then discard them immediately.  The common GET_MODE_PRECISION (mode)
>> >>   <= HOST_BITS_PER_WIDE_INT case stays a pure HWI operation, despite all
>> >>   the wide-int trappings.
>> >> 
>> >> - It's a good way of checking type safety and making sure that excess
>> >>   bits aren't accidentally given a semantic meaning.  This is the most
>> >>   important reason IMO.
>> >> 
>> >> The branch has both the constant-precision, very wide integers that we
>> >> want for trees and the variable-precision integers we want for rtl,
>> >> so it's not an "either or".  With the accessor-based implementation,
>> >> there should be very little cost to having both.
>> >
>> > So what I wonder (and where we maybe disagree) is how much code
>> > wants to inspect "intermediate" results.  Say originally you have
>> >
>> > rtx foo (rtx x, rtx y)
>> > {
>> >   rtx tem = simplify_const_binary_operation (PLUS, GET_MODE (x), x, 
>> > GEN_INT (1));
>> >   rtx res = simplify_const_binary_operation (MINUS, GET_MODE (tem), tem, 
>> > y);
>> >   return res;
>> > }
>> >
>> > and with wide-int you want to change that to
>> >
>> > rtx foo (rtx x, rtx y)
>> > {
>> >   wide_int tem = wide_int (x) + 1;
>> >   wide_int res = tem - y;
>> >   return res.to_rtx ();
>> > }
>> >
>> > how much code ever wants to inspect 'tem' or 'res'?
>> > That is, does it matter
>> > if 'tem' and 'res' would have been calculated in "infinite precision"
>> > and only to_rtx () would do the truncation to the desired mode?
>> >
>> > I think not.  The amount of code performing multiple operations on
>> > _constants_ in sequence is extremely low (if it even exists).
>> >
>> > So I'd rather have to_rtx get a mode argument (or a precision) and
>> > perform the required truncation / sign-extension at RTX construction
>> > time (which is an expensive operation anyway).
>> 
>> I agree this is where we disagree.  I don't understand why you think
>> the above is better.  Why do we want to do "infinite precision"
>> addition of two values when only the lowest N bits of those values
>> have a (semantically) defined meaning?  Earlier in the thread it sounded
>> like we both agreed that having undefined bits in the _representation_
>> was bad.  So why do we want to do calculations on parts of values that
>> are undefined in the (rtx) semantics?
>> 
>> E.g. say we're adding two rtx values whose mode just happens to be
>> HOST_BITS_PER_WIDE_INT in size.  Why does it make sense to calculate
>> the carry from adding the two HWIs, only to add it to an upper HWI
>> that has no semantically-defined value?  It's garbage in, garbage out.
>
> Not garbage in, and not garbage out (just wasted work).

Well, it's not garbage in the sense of an uninitialised HWI detected
by valgrind (say).  But it's semantic garbage.

> That's the possible downside - the upside is to get rid of the notion
> of a 'precision'.

No, it's still there, just in a different place.

> OTOH they still will be in some ways "undefined" if you consider
>
>   wide_int xw = from_rtx (xr, mode);
>   tree xt = to_tree (xw, type);
>   wide_int xw2 = from_tree (xt);
>
> with an unsigned type xw and xw2 will not be equal (in the
> 'extension' bits) for a value with MSB set.

Do you mean it's undefined as things stand, or when using "infinite
precision" for rtl?  It shouldn't lead to anything undefined at
the moment.  Only the low GET_MODE_BITSIZE (mode) bits of xw are
meaningful, but those are also the only bits that would be used.

> That is, RTL chooses to always sign-extend, tree chooses to extend
> according to sign information.  wide-int chooses to ... ?  (it seems
> the wide-int overall comment lost the part that defined its encoding,
> but it seems that we still sign-extend val[len-1], so (unsigned
> HOST_WIDE_INT)-1 is { -1U, 0 } with len == 2 and (HOST_WIDE_INT)-1 is
> { -1 } with len == 1.

Only if the precision is > HOST_BITS_PER_WIDE_INT.  If the precision
is HOST_BITS_PER_WIDE_INT then both are { -1U }.  "len" is never
greater than precision * HOST_BITS_PER_WIDE_INT.

> In RTL both would be encoded with len == 1 (no
> distinction between a signed and unsigned number with all bits set),

Same again: both are -1 if the mode is HOST_BITS_PER_WIDE_INT or smaller.
If the mode is wider then RTL too uses { -1, 0 }.  So the current wide_int
representation matches the RTL representation pretty closely, except for
the part about wide_int leaving excess bits undefined.  But that's just
a convenience, it isn't important in terms of what operators to.

> on the current tree representation the encoding would be with len ==
> 1, too, as we have TYPE_UNSIGNED to tell us the sign.

OK.

> So we still need to somehow "map" between those representations.

Right, that's what the constructors, from_* and to_* routines do.

> Looking at the RTL representation from that wide-int representation
> makes RTL look as if all constants are signed.

Well, except for direct accessors like elt(), the wide-int representation
is shielded from the interface (as it should be).  It doesn't affect the
result of arithmetic.  The representation should have no visible effect
for rtl or tree users who avoid elt().

Thanks,
Richard
Richard Biener Aug. 28, 2013, 12:48 p.m. UTC | #20
On Wed, 28 Aug 2013, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> 
> > On Wed, 28 Aug 2013, Richard Sandiford wrote:
> >
> >> Richard Biener <rguenther@suse.de> writes:
> >> >> So the precision variable is good for the rtl level in several ways:
> >> >> 
> >> >> - As you say, it avoids adding the explicit truncations that (in practice)
> >> >>   every rtl operation would need
> >> >> 
> >> >> - It's more efficient in that case, since we don't calculate high values
> >> >>   and then discard them immediately.  The common GET_MODE_PRECISION (mode)
> >> >>   <= HOST_BITS_PER_WIDE_INT case stays a pure HWI operation, despite all
> >> >>   the wide-int trappings.
> >> >> 
> >> >> - It's a good way of checking type safety and making sure that excess
> >> >>   bits aren't accidentally given a semantic meaning.  This is the most
> >> >>   important reason IMO.
> >> >> 
> >> >> The branch has both the constant-precision, very wide integers that we
> >> >> want for trees and the variable-precision integers we want for rtl,
> >> >> so it's not an "either or".  With the accessor-based implementation,
> >> >> there should be very little cost to having both.
> >> >
> >> > So what I wonder (and where we maybe disagree) is how much code
> >> > wants to inspect "intermediate" results.  Say originally you have
> >> >
> >> > rtx foo (rtx x, rtx y)
> >> > {
> >> >   rtx tem = simplify_const_binary_operation (PLUS, GET_MODE (x), x, 
> >> > GEN_INT (1));
> >> >   rtx res = simplify_const_binary_operation (MINUS, GET_MODE (tem), tem, 
> >> > y);
> >> >   return res;
> >> > }
> >> >
> >> > and with wide-int you want to change that to
> >> >
> >> > rtx foo (rtx x, rtx y)
> >> > {
> >> >   wide_int tem = wide_int (x) + 1;
> >> >   wide_int res = tem - y;
> >> >   return res.to_rtx ();
> >> > }
> >> >
> >> > how much code ever wants to inspect 'tem' or 'res'?
> >> > That is, does it matter
> >> > if 'tem' and 'res' would have been calculated in "infinite precision"
> >> > and only to_rtx () would do the truncation to the desired mode?
> >> >
> >> > I think not.  The amount of code performing multiple operations on
> >> > _constants_ in sequence is extremely low (if it even exists).
> >> >
> >> > So I'd rather have to_rtx get a mode argument (or a precision) and
> >> > perform the required truncation / sign-extension at RTX construction
> >> > time (which is an expensive operation anyway).
> >> 
> >> I agree this is where we disagree.  I don't understand why you think
> >> the above is better.  Why do we want to do "infinite precision"
> >> addition of two values when only the lowest N bits of those values
> >> have a (semantically) defined meaning?  Earlier in the thread it sounded
> >> like we both agreed that having undefined bits in the _representation_
> >> was bad.  So why do we want to do calculations on parts of values that
> >> are undefined in the (rtx) semantics?
> >> 
> >> E.g. say we're adding two rtx values whose mode just happens to be
> >> HOST_BITS_PER_WIDE_INT in size.  Why does it make sense to calculate
> >> the carry from adding the two HWIs, only to add it to an upper HWI
> >> that has no semantically-defined value?  It's garbage in, garbage out.
> >
> > Not garbage in, and not garbage out (just wasted work).
> 
> Well, it's not garbage in the sense of an uninitialised HWI detected
> by valgrind (say).  But it's semantic garbage.
> 
> > That's the possible downside - the upside is to get rid of the notion
> > of a 'precision'.
> 
> No, it's still there, just in a different place.
> 
> > OTOH they still will be in some ways "undefined" if you consider
> >
> >   wide_int xw = from_rtx (xr, mode);
> >   tree xt = to_tree (xw, type);
> >   wide_int xw2 = from_tree (xt);
> >
> > with an unsigned type xw and xw2 will not be equal (in the
> > 'extension' bits) for a value with MSB set.
> 
> Do you mean it's undefined as things stand, or when using "infinite
> precision" for rtl?  It shouldn't lead to anything undefined at
> the moment.  Only the low GET_MODE_BITSIZE (mode) bits of xw are
> meaningful, but those are also the only bits that would be used.
> 
> > That is, RTL chooses to always sign-extend, tree chooses to extend
> > according to sign information.  wide-int chooses to ... ?  (it seems
> > the wide-int overall comment lost the part that defined its encoding,
> > but it seems that we still sign-extend val[len-1], so (unsigned
> > HOST_WIDE_INT)-1 is { -1U, 0 } with len == 2 and (HOST_WIDE_INT)-1 is
> > { -1 } with len == 1.
> 
> Only if the precision is > HOST_BITS_PER_WIDE_INT.  If the precision
> is HOST_BITS_PER_WIDE_INT then both are { -1U }.

That wasn't my understanding on how things work.

> "len" is never
> greater than precision * HOST_BITS_PER_WIDE_INT.

"len" can be one larger than precision * HOST_BITS_PER_WIDE_INT as
I originally designed the encoding scheme.  It was supposed to
be able to capture the difference between a positive and a negative
number (unlike the RTL rep).

I see canonize() truncates to blocks_needed.

That said, I'm still missing one of my most important requests:

 - all references to HOST_WIDE_INT (and HOST_BITS_PER_WIDE_INT and
   firends) need to go and be replaced with a private typedef
   and proper constants (apart from in the _hwi interface API of course)
 - wide_int needs to work with the storage not being HOST_WIDE_INT,
   in the end it should be HOST_WIDEST_FAST_INT, but for testing coverage
   it ideally should work for 'signed char' as well (at _least_ it needs
   to work for plain 'int')

> > In RTL both would be encoded with len == 1 (no
> > distinction between a signed and unsigned number with all bits set),
> 
> Same again: both are -1 if the mode is HOST_BITS_PER_WIDE_INT or smaller.
> If the mode is wider then RTL too uses { -1, 0 }.  So the current wide_int
> representation matches the RTL representation pretty closely, except for
> the part about wide_int leaving excess bits undefined.  But that's just
> a convenience, it isn't important in terms of what operators to.
> 
> > on the current tree representation the encoding would be with len ==
> > 1, too, as we have TYPE_UNSIGNED to tell us the sign.
> 
> OK.
> 
> > So we still need to somehow "map" between those representations.
> 
> Right, that's what the constructors, from_* and to_* routines do.

I wonder where the from_tree and to_tree ones are?  Are they
from_double_int / wide_int_to_tree (what's wide_int_to_infinite_tree?)

> > Looking at the RTL representation from that wide-int representation
> > makes RTL look as if all constants are signed.
> 
> Well, except for direct accessors like elt(), the wide-int representation
> is shielded from the interface (as it should be).  It doesn't affect the
> result of arithmetic.  The representation should have no visible effect
> for rtl or tree users who avoid elt().

True.  Though it should be one that allows an efficient implementation.

Richard.
Kenneth Zadeck Aug. 28, 2013, 1:09 p.m. UTC | #21
>>>    Note that the bits above the precision are not defined and the
>>>    algorithms used here are careful not to depend on their value.  In
>>>    particular, values that come in from rtx constants may have random
>>>    bits.
> Which is a red herring.  It should be fixed.  I cannot even believe
> that sentence given the uses of CONST_DOUBLE_LOW/CONST_DOUBLE_HIGH
> or INTVAL/UINTVAL.  I don't see accesses masking out 'undefined' bits
> anywhere.

Richi,

you asked for the entire patch as a branch so that you could look at the 
whole picture.
It is now time for you to do that.    I understand it is very large and 
it will take some time for you to get your head around the whole 
thing.   But remember this:

The vast majority of the clients are dealing with intermediate code that 
has explicit promotions.    Not only the rtl level, but the majority of 
the tree level takes inputs that have explicit precisions that have been 
matched and wants to see an explicit precision result.   For this code, 
doing the fixed precision thing where you do not ever ask about what is 
behind the curtain is a very good match.

However, there are parts of the compiler, all within the tree or gimple 
level, that do not have this view.   For this, there are two templates 
export an interface that behaves in a manner very similar to what 
double-int does when the precision was smaller than 128 bits.  (we 
discovered a large number of bugs when using double int for timode 
because they did make an infinite precision assumption, but that is 
another story.)  All numbers are converted to signed numbers that are 
extended based on their input type and the math is performed in a large 
enough field so that they never push near the end.   We know what the 
end is because we sniff the port.

At this point we looked at the pass we were converting and we used the 
appropriate implementation that match the style of coding and the 
algorithm. i.e. we made no substantive changes.  As mentioned in my 
earlier mail, i plan to change in the future tree-ssa-ccp to use the 
fixed precision form, but this is a change that is motivated by being 
able to find more constants with the proper representation than what is 
beautiful.   But this is the only case where i think the rep should be 
substantially changed.

I know you find the fixed precision stuff unappealing.   But the truth 
is that you wanted us to make this patch so that it did as little damage 
to the way the compiler worked as possible and given that so much of the 
compiler actually does fixed precision math, this is the path of least 
resistance.

If it is reasonable to change the rtl, we may change that, but the truth 
is that the clients never see this so it is not as much of an issue as 
you are making it.   Now you can see all of the clients, see this for 
yourself.

Kenny
Mike Stump Aug. 28, 2013, 3:53 p.m. UTC | #22
On Aug 28, 2013, at 3:22 AM, Richard Biener <rguenther@suse.de> wrote:
> Btw, rtl.h still wastes space with
> 
> struct GTY((variable_size)) hwivec_def {
>  int num_elem;         /* number of elements */
>  HOST_WIDE_INT elem[1];
> };
> 
> struct GTY((chain_next ("RTX_NEXT (&%h)"),
>            chain_prev ("RTX_PREV (&%h)"), variable_size)) rtx_def {
> ...
>  /* The first element of the operands of this rtx.
>     The number of operands and their types are controlled
>     by the `code' field, according to rtl.def.  */
>  union u {
>    rtunion fld[1];
>    HOST_WIDE_INT hwint[1];
>    struct block_symbol block_sym;
>    struct real_value rv;
>    struct fixed_value fv;
>    struct hwivec_def hwiv;
>  } GTY ((special ("rtx_def"), desc ("GET_CODE (&%0)"))) u;
> };
> 
> there are 32bits available before the union.  If you don't use
> those for num_elem then all wide-ints will at least take as
> much space as DOUBLE_INTs originally took - and large ints
> that would have required DOUBLE_INTs in the past will now
> require more space than before.  Which means your math
> motivating the 'num_elem' encoding stuff is wrong.  With
> moving 'num_elem' before u you can even re-use the hwint
> field in the union as the existing double-int code does
> (which in fact could simply do the encoding trick in the
> old CONST_DOUBLE scheme, similar for the tree INTEGER_CST
> container).

So, HOST_WIDE_INT is likely 64 bits, and likely is 64 bit aligned.  The base (stuff before the union) is 32 bits.  There is a 32 bit gap, even if not used before the HOST_WIDE_INT elem.  We place the num_elem is this gap.  Even if the field were removed, the size would not change, nor the placement of elem.  So, short of packing, a 32-bit HWI host or going with a 32-bit type instead of a HOST_WIDE_INT, I'm not sure I follow you?  I tend to discount 32-bit hosted compilers as a thing of the past.
Kenneth Zadeck Aug. 28, 2013, 10:55 p.m. UTC | #23
>>>    Note that the bits above the precision are not defined and the
>>>    algorithms used here are careful not to depend on their value.  In
>>>    particular, values that come in from rtx constants may have random
>>>    bits.
> Which is a red herring.  It should be fixed.  I cannot even believe
> that sentence given the uses of CONST_DOUBLE_LOW/CONST_DOUBLE_HIGH
> or INTVAL/UINTVAL.  I don't see accesses masking out 'undefined' bits
> anywhere.
>
I can agree with you that this could be fixed.   But it is not necessary 
to fix it.   The rtl level and most of the tree level has existed for a 
long time by doing math within the precision.

you do not see the masking at the rtl level because the masking is not 
necessary.    if you never look beyond the precision you just do not 
care.    There is the issue with divide and mod and quite frankly the 
code on the trunk scares me to death.   my code at the rtl level makes 
sure that everything is clean since i can see if it is a udiv or a div 
that is enough info to clean the value up.

>> I have a feeling I'm rehashing a past debate, sorry, but rtx constants can't
>> have random bits.  The upper bits must be a sign extension of the value.
>> There's exactly one valid rtx for each (value, mode) pair.  If you saw
>> something different then that sounds like a bug.  The rules should already
>> be fairly well enforced though, since something like (const_int 128) --
>> or (const_int 256) -- will not match a QImode operand.
> See.  We're saved ;)
this is richard's theory.   There is a block of code at wide-int.cc:175 
that is ifdefed out that checks to see if the rtl consts are 
canonical.   if you bring it in, building the libraries fails very 
quickly.   The million dollar question is this the only bug or is this 
the first bug of a 1000 bugs.   Your comment about not seeing any 
masking cuts both ways.   There is very little code on the way in if you 
create ints with GEN_INT that makes sure someone has done the right 
thing on that side.   So my guess is that there are a lot of failures 
and a large number of them will be in the ports.

If richard is right then there will be changes.  The internals of 
wide-int will be changed so that everything is maintained in canonical 
form rather than just doing the canonization on the outside.   This will 
clean up things like fits_uhwi_p and a lot more of the wide-int internals.

But i think that if you want to change the compiler to use infinite 
precision arithmetic, you really ought to have a better reason that you 
think that it is cleaner.   Because, it buys you nothing for most of the 
compiler and it is slower AND it is a much bigger change to the compiler 
than the one we want to make.


>> This is probably the part of the representation that I disagree most with.
>> There seem to be two main ways we could hande the extension to whole HWIs:
>>
>> (1) leave the stored upper bits undefined and extend them on read
>> (2) keep the stored upper bits in extended form
>>
>> The patch goes for (1) but (2) seems better to me, for a few reasons:
> I agree whole-heartedly.
my statement is above.   if we can do 2 then we should.   But I do not 
think that it is worth several people-years to do this.
>
>> * As above, constants coming from rtl are already in the right form,
>>    so if you create a wide_int from an rtx and only query it, no explicit
>>    extension is needed.
>>
>> * Things like logical operations and right shifts naturally preserve
>>    the sign-extended form, so only a subset of write operations need
>>    to take special measures.
>>
>> * You have a public interface that exposes the underlying HWIs
>>    (which is fine with me FWIW), so it seems better to expose a fully-defined
>>    HWI rather than only a partially-defined HWI.
>>
>> E.g. zero_p is:
>>
>>    HOST_WIDE_INT x;
>>
>>    if (precision && precision < HOST_BITS_PER_WIDE_INT)
>>      x = sext_hwi (val[0], precision);
>>    else if (len == 0)
>>      {
>>        gcc_assert (precision == 0);
>>        return true;
>>      }
>>    else
>>      x = val[0];
>>
>>    return len == 1 && x == 0;
>>
>> but I think it really ought to be just:
>>
>>    return len == 1 && val[0] == 0;
> Yes!
>
> But then - what value does keeping track of a 'precision' have
> in this case?  It seems to me it's only a "convenient carrier"
> for
>
>    wide_int x = wide-int-from-RTX (y);
>    machine_mode saved_mode = mode-available? GET_MODE (y) : magic-mode;
>    ... process x ...
>    RTX = RTX-from-wide_int (x, saved_mode);
>
> that is, wide-int doesn't do anything with 'precision' but you
> can extract it later to not need to remember a mode you were
> interested in?
>
> Oh, and of course some operations require a 'precision', like rotate.
As i have said, when the operands and result are all precision correct, 
as they are generally in both rtl and tree, then the default fixed 
precision interface is really very clean.    I know you object to some 
implementation issues, but this is why we put the branch up, so you can 
see what you get from the other side.


>>>    When the precision is 0, all the bits in the LEN elements of
>>>    VEC are significant with no undefined bits.  Precisionless
>>>    constants are limited to being one or two HOST_WIDE_INTs.  When two
>>>    are used the upper value is 0, and the high order bit of the first
>>>    value is set.  (Note that this may need to be generalized if it is
>>>    ever necessary to support 32bit HWIs again).
>> I didn't understand this.  When are two HOST_WIDE_INTs needed for
>> "precision 0"?
> For the wide_int containing unsigned HOST_WIDE_INT ~0.  As we
> sign-extend the representation (heh, yes, we do or should!) we
> require an extra HWI to store the fact that ~0 is unsigned.
>
>> The main thing that's changed since the early patches is that we now
>> have a mixture of wide-int types.  This seems to have led to a lot of
>> boiler-plate forwarding functions (or at least it felt like that while
>> moving them all out the class).  And that in turn seems to be because
>> you're trying to keep everything as member functions.  E.g. a lot of the
>> forwarders are from a member function to a static function.
>>
>> Wouldn't it be better to have the actual classes be light-weight,
>> with little more than accessors, and do the actual work with non-member
>> template functions?  There seems to be 3 grades of wide-int:
>>
>>    (1) read-only, constant precision  (from int, etc.)
>>    (2) read-write, constant precision  (fixed_wide_int)
>>    (3) read-write, variable precision  (wide_int proper)
>>
>> but we should be able to hide that behind templates, with compiler errors
>> if you try to write to (1), etc.
> Yeah, I'm probably trying to clean up the implementation once I
> got past recovering from two month without GCC ...

> Richard.
Richard Biener Aug. 29, 2013, 7:36 a.m. UTC | #24
On Wed, 28 Aug 2013, Mike Stump wrote:

> On Aug 28, 2013, at 3:22 AM, Richard Biener <rguenther@suse.de> wrote:
> > Btw, rtl.h still wastes space with
> > 
> > struct GTY((variable_size)) hwivec_def {
> >  int num_elem;         /* number of elements */
> >  HOST_WIDE_INT elem[1];
> > };
> > 
> > struct GTY((chain_next ("RTX_NEXT (&%h)"),
> >            chain_prev ("RTX_PREV (&%h)"), variable_size)) rtx_def {
> > ...
> >  /* The first element of the operands of this rtx.
> >     The number of operands and their types are controlled
> >     by the `code' field, according to rtl.def.  */
> >  union u {
> >    rtunion fld[1];
> >    HOST_WIDE_INT hwint[1];
> >    struct block_symbol block_sym;
> >    struct real_value rv;
> >    struct fixed_value fv;
> >    struct hwivec_def hwiv;
> >  } GTY ((special ("rtx_def"), desc ("GET_CODE (&%0)"))) u;
> > };
> > 
> > there are 32bits available before the union.  If you don't use
> > those for num_elem then all wide-ints will at least take as
> > much space as DOUBLE_INTs originally took - and large ints
> > that would have required DOUBLE_INTs in the past will now
> > require more space than before.  Which means your math
> > motivating the 'num_elem' encoding stuff is wrong.  With
> > moving 'num_elem' before u you can even re-use the hwint
> > field in the union as the existing double-int code does
> > (which in fact could simply do the encoding trick in the
> > old CONST_DOUBLE scheme, similar for the tree INTEGER_CST
> > container).
> 
> So, HOST_WIDE_INT is likely 64 bits, and likely is 64 bit aligned.  The 
> base (stuff before the union) is 32 bits.  There is a 32 bit gap, even 
> if not used before the HOST_WIDE_INT elem.  We place the num_elem is 
> this gap.

No, you don't.  You place num_elem 64bit aligned _after_ the gap.
And you have another 32bit gap, as you say, before elem.

> Even if the field were removed, the size would not change, 
> nor the placement of elem.  So, short of packing, a 32-bit HWI host or 
> going with a 32-bit type instead of a HOST_WIDE_INT, I'm not sure I 
> follow you?  I tend to discount 32-bit hosted compilers as a thing of 
> the past.

Me, too.  On 32bit hosts nothing would change as 'u' is 32bit aligned
there (ok, on 32bit hosts putting num_elem before 'u' would actually
increase memory usage - but as you say, 32bit hosted compilers are
a thing of the past ;)).

Richard.
Richard Biener Aug. 29, 2013, 8:42 a.m. UTC | #25
On Wed, 28 Aug 2013, Kenneth Zadeck wrote:

> 
> > > >    Note that the bits above the precision are not defined and the
> > > >    algorithms used here are careful not to depend on their value.  In
> > > >    particular, values that come in from rtx constants may have random
> > > >    bits.
> > Which is a red herring.  It should be fixed.  I cannot even believe
> > that sentence given the uses of CONST_DOUBLE_LOW/CONST_DOUBLE_HIGH
> > or INTVAL/UINTVAL.  I don't see accesses masking out 'undefined' bits
> > anywhere.
> > 
> I can agree with you that this could be fixed.   But it is not necessary to
> fix it.   The rtl level and most of the tree level has existed for a long time
> by doing math within the precision.
> 
> you do not see the masking at the rtl level because the masking is not
> necessary.    if you never look beyond the precision you just do not care.
> There is the issue with divide and mod and quite frankly the code on the trunk
> scares me to death.   my code at the rtl level makes sure that everything is
> clean since i can see if it is a udiv or a div that is enough info to clean
> the value up.
> 
> > > I have a feeling I'm rehashing a past debate, sorry, but rtx constants
> > > can't
> > > have random bits.  The upper bits must be a sign extension of the value.
> > > There's exactly one valid rtx for each (value, mode) pair.  If you saw
> > > something different then that sounds like a bug.  The rules should already
> > > be fairly well enforced though, since something like (const_int 128) --
> > > or (const_int 256) -- will not match a QImode operand.
> > See.  We're saved ;)
> this is richard's theory.   There is a block of code at wide-int.cc:175 that
> is ifdefed out that checks to see if the rtl consts are canonical.   if you
> bring it in, building the libraries fails very quickly.   The million dollar
> question is this the only bug or is this the first bug of a 1000 bugs.   Your
> comment about not seeing any masking cuts both ways.   There is very little
> code on the way in if you create ints with GEN_INT that makes sure someone has
> done the right thing on that side.   So my guess is that there are a lot of
> failures and a large number of them will be in the ports.

Well, clearly RTL code _expects_ constants to be properly extended.  I
can reproduce the bug and I simply guess that's a matter of mismatching
modes here (or performing an implicit truncation without properly
extending the constant).

    at /space/rguenther/src/svn/wide-int/gcc/combine.c:10086
10086                                                      GEN_INT 
(count));
(gdb) l
10081
10082                 mask_rtx = GEN_INT (nonzero_bits (varop, GET_MODE 
(varop)));
10083
10084                 mask_rtx
10085                   = simplify_const_binary_operation (code, 
result_mode, mask_rtx,
10086                                                      GEN_INT 
(count));

uses of GEN_INT are frowned upon ... for exactly that reason - the
mask_rtx is not a proper RTL constant for SImode.

Btw, all this isn't a reason to not have a well-defined wide-int
representation.  You just have (as you have for trees) to properly
canonize the representation at the time you convert from RTL
constants to wide-int constants.

In the long run we want a uniform representation of constants
so we can do zero-copying - but it looks like we now have
three different representations - the tree one (sign or zero
extended dependent on sign), RTL (garbage as you show) and
wide-int (always sign-extended).

That's why I was looking at at least matching what tree does
(because tree constants _are_ properly canonized).

Richard.
Kenneth Zadeck Aug. 29, 2013, 12:23 p.m. UTC | #26
On 08/29/2013 04:42 AM, Richard Biener wrote:
> On Wed, 28 Aug 2013, Kenneth Zadeck wrote:
>
>>>>>     Note that the bits above the precision are not defined and the
>>>>>     algorithms used here are careful not to depend on their value.  In
>>>>>     particular, values that come in from rtx constants may have random
>>>>>     bits.
>>> Which is a red herring.  It should be fixed.  I cannot even believe
>>> that sentence given the uses of CONST_DOUBLE_LOW/CONST_DOUBLE_HIGH
>>> or INTVAL/UINTVAL.  I don't see accesses masking out 'undefined' bits
>>> anywhere.
>>>
>> I can agree with you that this could be fixed.   But it is not necessary to
>> fix it.   The rtl level and most of the tree level has existed for a long time
>> by doing math within the precision.
>>
>> you do not see the masking at the rtl level because the masking is not
>> necessary.    if you never look beyond the precision you just do not care.
>> There is the issue with divide and mod and quite frankly the code on the trunk
>> scares me to death.   my code at the rtl level makes sure that everything is
>> clean since i can see if it is a udiv or a div that is enough info to clean
>> the value up.
>>
>>>> I have a feeling I'm rehashing a past debate, sorry, but rtx constants
>>>> can't
>>>> have random bits.  The upper bits must be a sign extension of the value.
>>>> There's exactly one valid rtx for each (value, mode) pair.  If you saw
>>>> something different then that sounds like a bug.  The rules should already
>>>> be fairly well enforced though, since something like (const_int 128) --
>>>> or (const_int 256) -- will not match a QImode operand.
>>> See.  We're saved ;)
>> this is richard's theory.   There is a block of code at wide-int.cc:175 that
>> is ifdefed out that checks to see if the rtl consts are canonical.   if you
>> bring it in, building the libraries fails very quickly.   The million dollar
>> question is this the only bug or is this the first bug of a 1000 bugs.   Your
>> comment about not seeing any masking cuts both ways.   There is very little
>> code on the way in if you create ints with GEN_INT that makes sure someone has
>> done the right thing on that side.   So my guess is that there are a lot of
>> failures and a large number of them will be in the ports.
> Well, clearly RTL code _expects_ constants to be properly extended.  I
> can reproduce the bug and I simply guess that's a matter of mismatching
> modes here (or performing an implicit truncation without properly
> extending the constant).
>
>      at /space/rguenther/src/svn/wide-int/gcc/combine.c:10086
> 10086                                                      GEN_INT
> (count));
> (gdb) l
> 10081
> 10082                 mask_rtx = GEN_INT (nonzero_bits (varop, GET_MODE
> (varop)));
> 10083
> 10084                 mask_rtx
> 10085                   = simplify_const_binary_operation (code,
> result_mode, mask_rtx,
> 10086                                                      GEN_INT
> (count));
>
> uses of GEN_INT are frowned upon ... for exactly that reason - the
> mask_rtx is not a proper RTL constant for SImode.
over time, the GEN_INTs will go away at the portable rtl level  as more 
of the code is transitioned to use wide-int.    The port story is not so 
good.   Any port that uses TI or beyond will likely evolve to using 
wide-int for the math and unifying the cases between CONST_INT and 
CONST_WIDE_INT.  (the wide-int constructor from rtl takes either and the 
constructor to rtl looks at the constant.) But a port like the mips that 
has no TI will likely never change.
>
> Btw, all this isn't a reason to not have a well-defined wide-int
> representation.  You just have (as you have for trees) to properly
> canonize the representation at the time you convert from RTL
> constants to wide-int constants.
The wide-int representation is completely well defined.    This is what 
i cannot see why you do not understand.   You do not need to look above 
the precision so there is nothing undefined!!!!!!   I know this bothers 
you very badly and i will agree that if it is easy to clean up rtl to 
match this, then it would simplify the code somewhat.    But it is not 
undefined or random.   Furthermore, cleaning it will not change the abi 
so if rtl gets cleaner, we can change this.

> In the long run we want a uniform representation of constants
> so we can do zero-copying - but it looks like we now have
> three different representations - the tree one (sign or zero
> extended dependent on sign), RTL (garbage as you show) and
> wide-int (always sign-extended).
You wanted this to look more like double-int, now that it does, you now 
to see the flaws in it.    Do i ever get to win?   I do not think anyone 
would have been served well by trying to make trees have only signed 
constants in the way that double-int did.  I will admit that the 
implementation would be cleaner if we could just change everything else 
in the compiler to match a super clean wide-int design.   I took the 
other approach and bottled as much of the ugliness behind the a api and 
tried to keep the rest of the compiler looking mostly the way that it does.

To a first approximation, ugliness is a conserved force in the universe.

>
> That's why I was looking at at least matching what tree does
> (because tree constants _are_ properly canonized).
well, they are very close.    i am likely working on the "last" bug now 
for mips.
>
> Richard.
diff mbox

Patch

=====================================

The two patches for the truck below are necessary to get identical
code between the wide-int branch and the truck.   The first patch has
been submitted for review and fixes a bug.   The second patch will not
be submitted as it is just for compatibility.   The second patch
slightly changes the hash function that the rtl gcse passes use. Code
is modified based on the traversal of a hash function, so if the hash
functions are not identical, the code is slightly different between
the two branches.


=====================================
diff --git a/gcc/expr.c b/gcc/expr.c
index 923f59b..f5744b0 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -4815,7 +4815,8 @@  expand_assignment (tree to, tree from, bool 
nontemporal)
                    bitregion_start, bitregion_end,
                    mode1, from,
                    get_alias_set (to), nontemporal);
-      else if (bitpos >= mode_bitsize / 2)
+      else if (bitpos >= mode_bitsize / 2
+           && bitpos+bitsize <= mode_bitsize)
          result = store_field (XEXP (to_rtx, 1), bitsize,
                    bitpos - mode_bitsize / 2,
                    bitregion_start, bitregion_end,
@@ -4834,8 +4835,12 @@  expand_assignment (tree to, tree from, bool 
nontemporal)
          }
        else
          {
+          HOST_WIDE_INT extra = 0;
+          if (bitpos+bitsize > mode_bitsize)
+        extra = bitpos+bitsize - mode_bitsize;
            rtx temp = assign_stack_temp (GET_MODE (to_rtx),
-                        GET_MODE_SIZE (GET_MODE (to_rtx)));
+                        GET_MODE_SIZE (GET_MODE (to_rtx))
+                        + extra);
            write_complex_part (temp, XEXP (to_rtx, 0), false);
            write_complex_part (temp, XEXP (to_rtx, 1), true);
            result = store_field (temp, bitsize, bitpos,
diff --git a/gcc/rtl.def b/gcc/rtl.def
index b4ce1b9..5ed015c 100644
--- a/gcc/rtl.def
+++ b/gcc/rtl.def
@@ -342,6 +342,8 @@  DEF_RTL_EXPR(TRAP_IF, "trap_if", "ee", RTX_EXTRA)
  /* numeric integer constant */
  DEF_RTL_EXPR(CONST_INT, "const_int", "w", RTX_CONST_OBJ)

+DEF_RTL_EXPR(CONST_WIDE_INT, "const_wide_int", "", RTX_CONST_OBJ)
+
  /* fixed-point constant */
  DEF_RTL_EXPR(CONST_FIXED, "const_fixed", "www", RTX_CONST_OBJ)