Message ID: 520A9DCC.6080609@naturalbridge.com
Thanks for doing this. I haven't had a chance to look at the branch yet (hope to soon), but the results on mips64-linux-gnu look pretty good: http://gcc.gnu.org/ml/gcc-testresults/2013-08/msg02033.html The only failures that stand out as being possibly wide-int-related are: FAIL: gcc.dg/fixed-point/int-warning.c (test for warnings, line ...) which isn't something that x86_64-linux-gnu or powerpc64-linux-gnu would test AFAIK. I haven't had a chance to look into why it's failing yet, but FWIW it's a compile-only test, so it should be reproducible with a cc1 cross. I'll run a test with the corresponding trunk version too. Thanks, Richard
Hi Kenny, This is the first time I've looked at the implementation of wide-int.h (rather than just looking at the rtl changes, which as you know I like in general), so FWIW here are some comments on wide-int.h. I expect a lot of them overlap with Richard B.'s comments. I also expect many of them are going to be annoying, sorry, but this first one definitely will. The coding conventions say that functions should be defined outside the class: http://gcc.gnu.org/codingconventions.html and that opening braces should be on their own line, so most of the file needs to be reformatted. I went through and made that change with the patch below, in the process of reading through. I also removed "SGN must be SIGNED or UNSIGNED." because it seemed redundant when those are the only two values available. The patch fixes a few other coding standard problems and typos, but I've not made any actual code changes (or at least, I didn't mean to). Does it look OK to install? I'm still unsure about these "infinite" precision types, but I understand the motivation and I have no objections. However: > * Code that does widening conversions. The canonical way that > this is performed is to sign or zero extend the input value to > the max width based on the sign of the type of the source and > then to truncate that value to the target type. This is in > preference to using the sign of the target type to extend the > value directly (which gets the wrong value for the conversion > of large unsigned numbers to larger signed types). I don't understand this particular reason. Using the sign of the source type is obviously right, but why does that mean we need "infinite" precision, rather than just doubling the precision of the source? > * When a constant that has an integer type is converted to a > wide-int it comes in with precision 0. 
For these constants the > top bit does accurately reflect the sign of that constant; this > is an exception to the normal rule that the signedness is not > represented. When used in a binary operation, the wide-int > implementation properly extends these constants so that they > properly match the other operand of the computation. This allows > you write: > > tree t = ... > wide_int x = t + 6; > > assuming t is a int_cst. This seems dangerous. Not all code that uses "unsigned HOST_WIDE_INT" actually wants it to be an unsigned value. Some code uses it to avoid the undefinedness of signed overflow. So these overloads could lead to us accidentally zero-extending what's conceptually a signed value without any obvious indication that that's happening. Also, hex constants are unsigned int, but it doesn't seem safe to assume that 0x80000000 was meant to be zero-extended. I realise the same thing can happen if you mix "unsigned int" with HOST_WIDE_INT, but the point is that you shouldn't really do that in general, whereas we're defining these overloads precisely so that a mixture can be used. I'd prefer some explicit indication of the sign, at least for anything other than plain "int" (so that the compiler will complain about uses of "unsigned int" and above). > Note that the bits above the precision are not defined and the > algorithms used here are careful not to depend on their value. In > particular, values that come in from rtx constants may have random > bits. I have a feeling I'm rehashing a past debate, sorry, but rtx constants can't have random bits. The upper bits must be a sign extension of the value. There's exactly one valid rtx for each (value, mode) pair. If you saw something different then that sounds like a bug. The rules should already be fairly well enforced though, since something like (const_int 128) -- or (const_int 256) -- will not match a QImode operand. This is probably the part of the representation that I disagree most with. 
There seem to be two main ways we could handle the extension to whole HWIs: (1) leave the stored upper bits undefined and extend them on read (2) keep the stored upper bits in extended form The patch goes for (1) but (2) seems better to me, for a few reasons: * As above, constants coming from rtl are already in the right form, so if you create a wide_int from an rtx and only query it, no explicit extension is needed. * Things like logical operations and right shifts naturally preserve the sign-extended form, so only a subset of write operations need to take special measures. * You have a public interface that exposes the underlying HWIs (which is fine with me FWIW), so it seems better to expose a fully-defined HWI rather than only a partially-defined HWI. E.g. zero_p is: HOST_WIDE_INT x; if (precision && precision < HOST_BITS_PER_WIDE_INT) x = sext_hwi (val[0], precision); else if (len == 0) { gcc_assert (precision == 0); return true; } else x = val[0]; return len == 1 && x == 0; but I think it really ought to be just: return len == 1 && val[0] == 0; > When the precision is 0, all the bits in the LEN elements of > VEC are significant with no undefined bits. Precisionless > constants are limited to being one or two HOST_WIDE_INTs. When two > are used the upper value is 0, and the high order bit of the first > value is set. (Note that this may need to be generalized if it is > ever necessary to support 32bit HWIs again). I didn't understand this. When are two HOST_WIDE_INTs needed for "precision 0"? > #define addr_max_bitsize (64) > #define addr_max_precision \ These should either be lower-case C++ constants or upper-case macros. > /* VAL is set to a size that is capable of computing a full > multiplication on the largest mode that is represented on the > target. Currently there is a part of tree-vrp that requires 2x + > 2 bits of precision where x is the precision of the variables > being optimized. 
*/ > HOST_WIDE_INT val[WIDE_INT_MAX_ELTS]; This comment seems redundant with the one above WIDE_INT_MAX_ELTS and likely to get out of date. > So real hardware only looks at a small part of the shift amount. > On IBM machines, this tends to be 1 more than what is necessary > to encode the shift amount. The rest of the world looks at only > the minimum number of bits. This means that only 3 gate delays > are necessary to set up the shifter. I agree that it makes sense for wide_int to provide a form of shift in which the shift amount is truncated. However, I strongly believe wide-int.h should not test SHIFT_COUNT_TRUNCATED directly. It should be up to the callers to decide when truncation is needed (and to what width). We really need to get rid of the #include "tm.h" in wide-int.h. MAX_BITSIZE_MODE_ANY_INT should be the only partially-target-dependent thing in there. If that comes from tm.h then perhaps we should put it into a new header file instead. > /* Return THIS as a signed HOST_WIDE_INT. If THIS does not fit in > PREC, the information is lost. */ > HOST_WIDE_INT > to_shwi (unsigned int prec = 0) const Just dropping the excess bits seems dangerous. I think we should assert instead, at least when prec is 0. > /* Return true if THIS is negative based on the interpretation of SGN. > For UNSIGNED, this is always false. This is correct even if > precision is 0. */ > inline bool > wide_int::neg_p (signop sgn) const It seems odd that you have to pass SIGNED here. I assume you were doing it so that the caller is forced to confirm signedness in the cases where a tree type is involved, but: * neg_p kind of implies signedness anyway * you don't require this for minus_one_p, so the interface isn't consistent * at the rtl level signedness isn't a property of the "type" (mode), so it seems strange to add an extra hoop there > /* Return true if THIS fits in an unsigned HOST_WIDE_INT with no > loss of precision. 
*/ > inline bool > wide_int_ro::fits_uhwi_p () const > { > return len == 1 || (len == 2 && val[1] == 0); > } This doesn't look right, since len == 1 could mean that you have a gazillion-bit all-ones number. Also, the val[1] == 0 check seems to assume the upper bits are defined when the precision isn't a multiple of the HWI size (although as above I think that's a good thing and should be enforced). sign_mask has: > gcc_unreachable (); > #if 0 > return val[len - 1] >> (HOST_BITS_PER_WIDE_INT - 1); > #endif Maybe remove this? The inline div routines do: > if (overflow) > *overflow = false; but then just pass overflow to divmod_internal. Seems better to initialise *overflow there instead. div_floor has: > return divmod_internal (true, val, len, p1, s, cl, p2, sgn, > &remainder, false, overflow); > > if (quotient.neg_p (sgn) && !remainder.zero_p ()) > return quotient - 1; > return quotient; > } where the last bit is unreachable. > /* Divide DIVISOR into THIS producing the remainder. The result is > the same size as the operands. The sign is specified in SGN. > The output is floor truncated. OVERFLOW is set to true if the > result overflows, false otherwise. */ > template <typename T> > inline wide_int_ro > wide_int_ro::mod_floor (const T &c, signop sgn, bool *overflow = 0) const It's really the quotient that's floor-truncated, not the output (the remainder). I was a bit confused at first why you'd need to truncate an integer remainder. Same for the other functions. > #ifdef DEBUG_WIDE_INT > debug_vwa ("wide_int_ro:: %d = (%s == %s)\n", result, *this, s, cl, p2); > #endif I think these are going to bitrot quickly if we #if 0 them out. I think we should either use: if (DEBUG_WIDE_INT) debug_vwa ("wide_int_ro:: %d = (%s == %s)\n", result, *this, s, cl, p2); or drop them. The implementations of the general to_shwi1 and to_shwi2 functions look identical. I think the comment should explain why two functions are needed. > /* Negate THIS. 
*/ > inline wide_int_ro > wide_int_ro::operator - () const > { > wide_int_ro r; > r = wide_int_ro (0) - *this; > return r; > } > > /* Negate THIS. */ > inline wide_int_ro > wide_int_ro::neg () const > { > wide_int_ro z = wide_int_ro::from_shwi (0, precision); > > gcc_checking_assert (precision); > return z - *this; > } Why do we need both of these, and why are they implemented slightly differently? > template <int bitsize> > inline bool > fixed_wide_int <bitsize>::multiple_of_p (const wide_int_ro &factor, > signop sgn, > fixed_wide_int *multiple) const > { > return wide_int_ro::multiple_of_p (factor, sgn, > reinterpret_cast <wide_int *> (multiple)); > } The patch has several instances of this kind of reinterpret_cast. It looks like an aliasing violation. The main thing that's changed since the early patches is that we now have a mixture of wide-int types. This seems to have led to a lot of boiler-plate forwarding functions (or at least it felt like that while moving them all out of the class). And that in turn seems to be because you're trying to keep everything as member functions. E.g. a lot of the forwarders are from a member function to a static function. Wouldn't it be better to have the actual classes be light-weight, with little more than accessors, and do the actual work with non-member template functions? There seem to be 3 grades of wide-int: (1) read-only, constant precision (from int, etc.) (2) read-write, constant precision (fixed_wide_int) (3) read-write, variable precision (wide_int proper) but we should be able to hide that behind templates, with compiler errors if you try to write to (1), etc. To take one example, the reason we can't simply use things like std::min on wide ints is because signedness needs to be specified explicitly, but there's a good reason why the standard defined std::min (x, y) rather than x.min (y). It seems like we ought to have smin and umin functions alongside std::min, rather than make them member functions. 
We could put them in a separate namespace if necessary. I might have a go at trying this last part next week, unless Richard is already in the process of rewriting things :-) Thanks, Richard
On 08/23/2013 11:02 AM, Richard Sandiford wrote: > Hi Kenny, > > This is the first time I've looked at the implementation of wide-int.h > (rather than just looking at the rtl changes, which as you know I like > in general), so FWIW here are some comments on wide-int.h. I expect > a lot of them overlap with Richard B.'s comments. > > I also expect many of them are going to be annoying, sorry, but this > first one definitely will. The coding conventions say that functions > should be defined outside the class: > > http://gcc.gnu.org/codingconventions.html > > and that opening braces should be on their own line, so most of the file > needs to be reformatted. I went through and made that change with the > patch below, in the process of reading through. I also removed "SGN > must be SIGNED or UNSIGNED." because it seemed redundant when those are > the only two values available. The patch fixes a few other coding standard > problems and typos, but I've not made any actual code changes (or at least, > I didn't mean to). I had started the file with the functions outside of the class and mike had started putting them in the class, and so i went with putting them in the class since many of them were one liners and so having them out of line just doubled the size of everything. however, we did not look at the coding conventions and that really settles the argument. Thanks for doing this. > Does it look OK to install? you can check it in. > I'm still unsure about these "infinite" precision types, but I understand > the motivation and I have no objections. However: > >> * Code that does widening conversions. The canonical way that >> this is performed is to sign or zero extend the input value to >> the max width based on the sign of the type of the source and >> then to truncate that value to the target type. 
This is in >> preference to using the sign of the target type to extend the >> value directly (which gets the wrong value for the conversion >> of large unsigned numbers to larger signed types). > I don't understand this particular reason. Using the sign of the source > type is obviously right, but why does that mean we need "infinite" precision, > rather than just doubling the precision of the source? in a sense, "infinite" does not really mean infinite, it really just means large enough so that you never lose any information from the top. For widening, all that "infinite" really needs to be is one bit larger than the destination type. We could have had an abi where you specified the precision of every operation based on the precisions of the inputs. Instead, for these kinds of operations, we decided to sniff the port and determine a fixed width that was large enough for everything that was needed for the port. We call that number infinite. This sort of follows the convention that double-int was used with, where infinite was 128 bits, but with our design/implementation, we (hopefully) have no bugs where the size of the datatypes needed runs into implementation limits. >> * When a constant that has an integer type is converted to a >> wide-int it comes in with precision 0. For these constants the >> top bit does accurately reflect the sign of that constant; this >> is an exception to the normal rule that the signedness is not >> represented. When used in a binary operation, the wide-int >> implementation properly extends these constants so that they >> properly match the other operand of the computation. This allows >> you write: >> >> tree t = ... >> wide_int x = t + 6; >> >> assuming t is a int_cst. > This seems dangerous. Not all code that uses "unsigned HOST_WIDE_INT" > actually wants it to be an unsigned value. Some code uses it to avoid > the undefinedness of signed overflow. 
So these overloads could lead > to us accidentally zero-extending what's conceptually a signed value > without any obvious indication that that's happening. Also, hex constants > are unsigned int, but it doesn't seem safe to assume that 0x80000000 was > meant to be zero-extended. > > I realise the same thing can happen if you mix "unsigned int" with > HOST_WIDE_INT, but the point is that you shouldn't really do that > in general, whereas we're defining these overloads precisely so that > a mixture can be used. > > I'd prefer some explicit indication of the sign, at least for anything > other than plain "int" (so that the compiler will complain about uses > of "unsigned int" and above). There is a part of me that finds this scary and a part of me that feels that the concern is largely theoretical. It does make it much easier to read and understand the code to be able to write "t + 6" rather than "wide_int (t) + wide_int::from_uhwi (6)", but of course you lose some control over how 6 is converted. >> Note that the bits above the precision are not defined and the >> algorithms used here are careful not to depend on their value. In >> particular, values that come in from rtx constants may have random >> bits. > I have a feeling I'm rehashing a past debate, sorry, but rtx constants can't > have random bits. The upper bits must be a sign extension of the value. > There's exactly one valid rtx for each (value, mode) pair. If you saw > something different then that sounds like a bug. The rules should already > be fairly well enforced though, since something like (const_int 128) -- > or (const_int 256) -- will not match a QImode operand. > > This is probably the part of the representation that I disagree most with. > There seem to be two main ways we could hande the extension to whole HWIs: > > (1) leave the stored upper bits undefined and extend them on read > (2) keep the stored upper bits in extended form It is not a matter of opening old wounds. 
I had run some tests on x86-64 and was never able to assume that the bits above the precision had always been canonized. I will admit that i did not try to run down the bugs because i thought that, since the rtx constructors did not have a mode associated with them, nor was one required in the constructors, this was not an easily solvable problem. So i have no idea if i hit the one and only bug or was about to start drinking from a fire hose. But the back ends are full of GEN_INT (a) where a came from god knows where and you almost never see it properly canonized. I think that until GEN_INT takes a mandatory non VOIDmode mode parameter, and that constructor canonizes it, you are doomed chasing this bug forever. My/our experience on the dataflow branch was that unless you go clean things up AND put in a bunch of traps to keep people honest, you are never going to be able to make this assumption. Having said that, we actually do neither of (1) or (2) inside of wide-int. For rtl to wide-int, we leave the upper bits undefined and never allow you to look at them because the constructor has a mode and that mode allows you to draw a line in the sand. There is no constructor for the "infinite" wide ints from rtl so you have no way to look. Doing this allows us to do something that richi really wanted: avoiding copying. We do not get to do as much as richi would like and when he comes back from leave, he may have other places where we can apply it, but right now if you say w = t + 6 as above, it "borrows" the rep from t to do the add, it does not really build a wide-int. We also do this if t is an rtx const. But if we had to canonize the number, then we could not borrow the rep. > The patch goes for (1) but (2) seems better to me, for a few reasons: > > * As above, constants coming from rtl are already in the right form, > so if you create a wide_int from an rtx and only query it, no explicit > extension is needed. 
> > * Things like logical operations and right shifts naturally preserve > the sign-extended form, so only a subset of write operations need > to take special measures. right now the internals of wide-int do not keep the bits above the precision clean. as you point out this could be fixed by changing lshift, add, sub, mul, div (and anything else i have forgotten about) and removing the code that cleans this up on exit. I actually do not really care which way we go here but if we do go on keeping the bits clean above the precision inside wide-int, we are going to have to clean the bits in the constructors from rtl, or fix some/a lot of bugs. But if you want to go with the stay clean plan you have to start clean, so at the rtl level this means copying. and it is the not copying trick that pushed me in the direction we went. At the tree level, this is not an issue. There are no constructors for tree-csts that do not have a type and before we copy the rep from the wide-int to the tree, we clean the top bits. (BTW - If i had to guess what the bug is with the missing messages on the mips port, it would be because the front ends HAD a bad habit of creating constants that did not fit into a type and then later checking to see if there were any interesting bits above the precision in the int-cst. This now does not work because we clean out those top bits on construction but it would not surprise me if we missed the fixed point constant path.) So at the tree level, we could easily go either way here, but there is a cost at the rtl level with doing (2). > * You have a public interface that exposes the underlying HWIs > (which is fine with me FWIW), so it seems better to expose a fully-defined > HWI rather than only a partially-defined HWI. > > E.g. 
zero_p is: > > HOST_WIDE_INT x; > > if (precision && precision < HOST_BITS_PER_WIDE_INT) > x = sext_hwi (val[0], precision); > else if (len == 0) > { > gcc_assert (precision == 0); > return true; > } the above test for len==0 has been removed because it is rot. > else > x = val[0]; > > return len == 1 && x == 0; > > but I think it really ought to be just: > > return len == 1 && val[0] == 0; If we did your 2, it would be this way. >> When the precision is 0, all the bits in the LEN elements of >> VEC are significant with no undefined bits. Precisionless >> constants are limited to being one or two HOST_WIDE_INTs. When two >> are used the upper value is 0, and the high order bit of the first >> value is set. (Note that this may need to be generalized if it is >> ever necessary to support 32bit HWIs again). > I didn't understand this. When are two HOST_WIDE_INTs needed for > "precision 0"? if a large unsigned constant comes in with the top bit set, the canonized value takes 2 hwis, the top hwi being 0. >> #define addr_max_bitsize (64) >> #define addr_max_precision \ > These should either be lower-case C++ constants or upper-case macros. this will be fixed. >> /* VAL is set to a size that is capable of computing a full >> multiplication on the largest mode that is represented on the >> target. Currently there is a part of tree-vrp that requires 2x + >> 2 bits of precision where x is the precision of the variables >> being optimized. */ >> HOST_WIDE_INT val[WIDE_INT_MAX_ELTS]; > This comment seems redundant with the one above WIDE_INT_MAX_ELTS > and likely to get out of date. this will be fixed >> So real hardware only looks at a small part of the shift amount. >> On IBM machines, this tends to be 1 more than what is necessary >> to encode the shift amount. The rest of the world looks at only >> the minimum number of bits. This means that only 3 gate delays >> are necessary to set up the shifter. 
> I agree that it makes sense for wide_int to provide a form of shift > in which the shift amount is truncated. However, I strongly believe > wide-int.h should not test SHIFT_COUNT_TRUNCATED directly. It should > be up to the callers to decide when truncation is needed (and to what width). richi does not like this either so i will get rid of it. > > We really need to get rid of the #include "tm.h" in wide-int.h. > MAX_BITSIZE_MODE_ANY_INT should be the only partially-target-dependent > thing in there. If that comes from tm.h then perhaps we should put it > into a new header file instead. I will talk to mike about fixing this. >> /* Return THIS as a signed HOST_WIDE_INT. If THIS does not fit in >> PREC, the information is lost. */ >> HOST_WIDE_INT >> to_shwi (unsigned int prec = 0) const > Just dropping the excess bits seems dangerous. I think we should assert > instead, at least when prec is 0. there are times when this is useful. there is a lot of code that just wants to look at the bottom bits to do some alignment stuff. I guess that code could just grab the bottom elt of the array, but that has not generally been how this has been done. >> /* Return true if THIS is negative based on the interpretation of SGN. >> For UNSIGNED, this is always false. This is correct even if >> precision is 0. */ >> inline bool >> wide_int::neg_p (signop sgn) const > It seems odd that you have to pass SIGNED here. I assume you were doing > it so that the caller is forced to confirm signedness in the cases where > a tree type is involved, but: > > * neg_p kind of implies signedness anyway > * you don't require this for minus_one_p, so the interface isn't consistent > * at the rtl level signedness isn't a property of the "type" (mode), > so it seems strange to add an extra hoop there it was done this way so that you can pass TYPE_SIGN (t) in as the second parameter. We could default the parameter to SIGNED and that would solve both cases. I will look into minus_one_p. 
>> /* Return true if THIS fits in an unsigned HOST_WIDE_INT with no >> loss of precision. */ >> inline bool >> wide_int_ro::fits_uhwi_p () const >> { >> return len == 1 || (len == 2 && val[1] == 0); >> } > This doesn't look right, since len == 1 could mean that you have a > gazillion-bit all-ones number. Also, the val[1] == 0 check seems > to assume the upper bits are defined when the precision isn't a multiple > of the HWI size (although as above I that's a good thing and should be > enforced). you are correct. > sign_mask has: >> gcc_unreachable (); >> #if 0 >> return val[len - 1] >> (HOST_BITS_PER_WIDE_INT - 1); >> #endif > Maybe remove this? > > The inline div routines do: i will work on this this weekend. tree vrp has not been our friend and sometimes does not like to compile this function. >> if (overflow) >> *overflow = false; > but then just pass overflow to divmod_internal. Seems better to initialise > *overflow there instead. > > div_floor has: > >> return divmod_internal (true, val, len, p1, s, cl, p2, sgn, >> &remainder, false, overflow); >> >> if (quotient.neg_p (sgn) && !remainder.zero_p ()) >> return quotient - 1; >> return quotient; >> } > where the last bit is unreachable. not to mention that the compiler never complained. > >> /* Divide DIVISOR into THIS producing the remainder. The result is >> the same size as the operands. The sign is specified in SGN. >> The output is floor truncated. OVERFLOW is set to true if the >> result overflows, false otherwise. */ >> template <typename T> >> inline wide_int_ro >> wide_int_ro::mod_floor (const T &c, signop sgn, bool *overflow = 0) const > It's really the quotient that's floor-truncated, not the output > (the remainder). I was a bit confused at first why you'd need to > truncate an integer remainder. Same for the other functions. The comment needs work, not the code. You do have to adjust the remainder in some cases, but it is not truncation. 
>> #ifdef DEBUG_WIDE_INT >> debug_vwa ("wide_int_ro:: %d = (%s == %s)\n", result, *this, s, cl, p2); >> #endif > I think these are going to bitrot quickly if we #if 0 then out. > I think we should either use: > > if (DEBUG_WIDE_INT) > debug_vwa ("wide_int_ro:: %d = (%s == %s)\n", result, *this, s, cl, p2); > > or drop them. my plan is to leave these in while the branch is still being developed and then get rid of them before it is merged. My guess is that i am going to need them still when i try the 32bit hwi test. > The implementations of the general to_shwi1 and to_shwi2 functions look > identical. I think the comment should explain why two functions are needed. I will check this >> /* Negate THIS. */ >> inline wide_int_ro >> wide_int_ro::operator - () const >> { >> wide_int_ro r; >> r = wide_int_ro (0) - *this; >> return r; >> } >> >> /* Negate THIS. */ >> inline wide_int_ro >> wide_int_ro::neg () const >> { >> wide_int_ro z = wide_int_ro::from_shwi (0, precision); >> >> gcc_checking_assert (precision); >> return z - *this; >> } > Why do we need both of these, and why are they implemented slightly > differently? neg should go away. >> template <int bitsize> >> inline bool >> fixed_wide_int <bitsize>::multiple_of_p (const wide_int_ro &factor, >> signop sgn, >> fixed_wide_int *multiple) const >> { >> return wide_int_ro::multiple_of_p (factor, sgn, >> reinterpret_cast <wide_int *> (multiple)); >> } > The patch has several instances of this kind of reinterpret_cast. > It looks like an aliasing violation. > > The main thing that's changed since the early patches is that we now > have a mixture of wide-int types. This seems to have led to a lot of > boiler-plate forwarding functions (or at least it felt like that while > moving them all out the class). And that in turn seems to be because > you're trying to keep everything as member functions. E.g. a lot of the > forwarders are from a member function to a static function. 
> > Wouldn't it be better to have the actual classes be light-weight, > with little more than accessors, and do the actual work with non-member > template functions? There seems to be 3 grades of wide-int: > > (1) read-only, constant precision (from int, etc.) > (2) read-write, constant precision (fixed_wide_int) > (3) read-write, variable precision (wide_int proper) > > but we should be able to hide that behind templates, with compiler errors > if you try to write to (1), etc. > > To take one example, the reason we can't simply use things like > std::min on wide ints is because signedness needs to be specified > explicitly, but there's a good reason why the standard defined > std::min (x, y) rather than x.min (y). It seems like we ought > to have smin and umin functions alongside std::min, rather than > make them member functions. We could put them in a separate namespace > if necessary. > > I might have a go at trying this last part next week, unless Richard is > already in the process of rewriting things :-) mike will answer this. > Thanks, > Richard > >
On Aug 23, 2013, at 8:02 AM, Richard Sandiford <rdsandiford@googlemail.com> wrote: >> * When a constant that has an integer type is converted to a >> wide-int it comes in with precision 0. For these constants the >> top bit does accurately reflect the sign of that constant; this >> is an exception to the normal rule that the signedness is not >> represented. When used in a binary operation, the wide-int >> implementation properly extends these constants so that they >> properly match the other operand of the computation. This allows >> you write: >> >> tree t = ... >> wide_int x = t + 6; >> >> assuming t is a int_cst. > > This seems dangerous. Not all code that uses "unsigned HOST_WIDE_INT" > actually wants it to be an unsigned value. Some code uses it to avoid > the undefinedness of signed overflow. So these overloads could lead > to us accidentally zero-extending what's conceptually a signed value > without any obvious indication that that's happening. Also, hex constants > are unsigned int, but it doesn't seem safe to assume that 0x80000000 was > meant to be zero-extended. > > I realise the same thing can happen if you mix "unsigned int" with > HOST_WIDE_INT, but the point is that you shouldn't really do that > in general, whereas we're defining these overloads precisely so that > a mixture can be used. So, I don't like penalizing all users, because one user might write incorrect code. We have the simple operators so that users can retain some of the simplicity and beauty that is the underlying language. Those semantics are well known and reasonable. We reasonably match those semantics. At the end of the day, we have to be able to trust what the user writes. Now, a user that doesn't trust themselves can elect not to use these functions; they aren't required to use them.
Mike Stump <mikestump@comcast.net> writes:
> On Aug 23, 2013, at 8:02 AM, Richard Sandiford <rdsandiford@googlemail.com> wrote:
>>> * When a constant that has an integer type is converted to a wide-int it comes in with precision 0. For these constants the top bit does accurately reflect the sign of that constant; this is an exception to the normal rule that the signedness is not represented. When used in a binary operation, the wide-int implementation properly extends these constants so that they properly match the other operand of the computation. This allows you to write:
>>>
>>>   tree t = ...
>>>   wide_int x = t + 6;
>>>
>>> assuming t is an int_cst.
>>
>> This seems dangerous. Not all code that uses "unsigned HOST_WIDE_INT" actually wants it to be an unsigned value. Some code uses it to avoid the undefinedness of signed overflow. So these overloads could lead to us accidentally zero-extending what's conceptually a signed value without any obvious indication that that's happening. Also, hex constants are unsigned int, but it doesn't seem safe to assume that 0x80000000 was meant to be zero-extended.
>>
>> I realise the same thing can happen if you mix "unsigned int" with HOST_WIDE_INT, but the point is that you shouldn't really do that in general, whereas we're defining these overloads precisely so that a mixture can be used.
>
> So, I don't like penalizing all users, because one user might write incorrect code. We have the simple operators so that users can retain some of the simplicity and beauty that is the underlying language. Those semantics are well known and reasonable. We reasonably match those semantics. At the end of the day, we have to be able to trust what the user writes. Now, a user that doesn't trust themselves can elect to not use these functions; they aren't required to use them.
But the point of my "unsigned HOST_WIDE_INT" example is that the semantics of the language can also get in the way. If you're adding or multiplying two smallish constants, you should generally use "unsigned HOST_WIDE_INT" in order to avoid the undefinedness of signed overflow. In that kind of case the "unsigned" in no way implies that you want the value to be extended to the wider types that we're adding with this patch. I'm worried about making that assumption just to provide a bit of syntactic sugar, especially when any wrong-code bug would be highly value-dependent.

When this kind of thing is done at the rtl level, the original constant (in a CONST_INT) is also HOST_WIDE_INT-sized, so the kind of code that I'm talking about did not do any promotion before the wide-int conversion. That changes if we generalise one of the values to a wide_int.

It isn't just a case of being confident when writing new code. It's a case of us changing a large body of existing code in one go, some of which uses unsigned values for the reason above.

Like I say, all this objection is for "unsigned int and above". If we provide the syntactic sugar for plain "int" and use templates to force a compiler error for "higher" types then that would still allow things like "x + 2" but force a bit of analysis for "x + y" in cases where "x" and "y" are not the same type and one is wide_int.

Thanks,
Richard
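The idiom Richard is defending can be shown in a standalone sketch (hypothetical helper, not GCC code; int64_t stands in for HOST_WIDE_INT): the arithmetic is done in the unsigned type purely so that overflow is well defined, even though the value is conceptually signed.

```cpp
#include <cstdint>

// Add two conceptually *signed* values via unsigned arithmetic so that
// overflow wraps instead of being undefined behaviour. The cast back to
// a signed type relies on the usual two's-complement conversion that GCC
// itself guarantees for its host compilers.
int64_t add_no_ub (int64_t a, int64_t b)
{
  return (int64_t) ((uint64_t) a + (uint64_t) b);
}
```

The point of the example in the thread: the "unsigned" here carries no information about how the value should be extended to a wider type.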
Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>>> * Code that does widening conversions. The canonical way that this is performed is to sign or zero extend the input value to the max width based on the sign of the type of the source and then to truncate that value to the target type. This is in preference to using the sign of the target type to extend the value directly (which gets the wrong value for the conversion of large unsigned numbers to larger signed types).
>>
>> I don't understand this particular reason. Using the sign of the source type is obviously right, but why does that mean we need "infinite" precision, rather than just doubling the precision of the source?
>
> in a sense, "infinite" does not really mean infinite, it really just means large enough so that you never lose any information from the top. For widening, all that you really need to be "infinite" is one more bit larger than the destination type.

I'm still being clueless, sorry, but why does the intermediate int need to be wider than the destination type? If you're going from an N1-bit value to an N2>N1-bit value, I don't see why you ever need more than N2 bits. Maybe I'm misunderstanding what the comment means by "widening conversions".

>> This seems dangerous. Not all code that uses "unsigned HOST_WIDE_INT" actually wants it to be an unsigned value. Some code uses it to avoid the undefinedness of signed overflow. So these overloads could lead to us accidentally zero-extending what's conceptually a signed value without any obvious indication that that's happening. Also, hex constants are unsigned int, but it doesn't seem safe to assume that 0x80000000 was meant to be zero-extended.
>>
>> I realise the same thing can happen if you mix "unsigned int" with HOST_WIDE_INT, but the point is that you shouldn't really do that in general, whereas we're defining these overloads precisely so that a mixture can be used.
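The widening rule being debated can be sketched outside of wide-int (hypothetical helper, not the branch's code): extend according to the sign of the *source* type, then truncate to the target. Extending by the target's sign instead gets large unsigned values wrong, e.g. converting a 16-bit unsigned 0xFFFF to a 32-bit signed type must give 65535, whereas extending by the (signed) target's sign would give -1.

```cpp
#include <cstdint>

// Extend the PREC low bits of VAL to 64 bits using the sign of the
// source type; the caller then truncates to the target's width.
int64_t extend_by_source_sign (uint64_t val, unsigned prec, bool src_is_signed)
{
  uint64_t mask = (prec < 64 ? (uint64_t{1} << prec) : 0) - 1;
  val &= mask;
  if (src_is_signed && ((val >> (prec - 1)) & 1))
    val |= ~mask;               // sign-extend from the source width
  return (int64_t) val;
}
```

With an unsigned source, `extend_by_source_sign (0xFFFF, 16, false)` is 65535, which is the correct 32-bit signed result; treating the same bits as a signed source gives -1.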
>> I'd prefer some explicit indication of the sign, at least for anything other than plain "int" (so that the compiler will complain about uses of "unsigned int" and above).
>
> There is a part of me that finds this scary and a part of me that feels that the concern is largely theoretical. It does make it much easier to read and understand the code to be able to write "t + 6" rather than "wide_int (t) + wide_int::from_uhwi (6)", but of course you lose some control over how 6 is converted.

I replied in more detail to Mike's comment, but the reason I suggested only allowing this for plain "int" (and using templates to forbid "unsigned int" and wider types) was that you could still write "t + 6". You just couldn't write "t + 0x80000000" or "t + x", where "x" is a HOST_WIDE_INT or similar.

Code can store values in "HOST_WIDE_INT" or "unsigned HOST_WIDE_INT" if we know at GCC compile time that HOST_WIDE_INT is big enough. But code doesn't necessarily store values in a "HOST_WIDE_INT" because we know at GCC compile time that the value is signed, or in an "unsigned HOST_WIDE_INT" because we know at GCC compile time that the value is unsigned. The signedness often depends on the input. The language still forces us to pick one or the other though. I don't feel comfortable with having syntactic sugar that reads too much into the choice.

>>> Note that the bits above the precision are not defined and the algorithms used here are careful not to depend on their value. In particular, values that come in from rtx constants may have random bits.
>>
>> I have a feeling I'm rehashing a past debate, sorry, but rtx constants can't have random bits. The upper bits must be a sign extension of the value. There's exactly one valid rtx for each (value, mode) pair. If you saw something different then that sounds like a bug.
>> The rules should already be fairly well enforced though, since something like (const_int 128) -- or (const_int 256) -- will not match a QImode operand.
>>
>> This is probably the part of the representation that I disagree most with. There seem to be two main ways we could handle the extension to whole HWIs:
>>
>> (1) leave the stored upper bits undefined and extend them on read
>> (2) keep the stored upper bits in extended form
>
> It is not a matter of opening old wounds. I had run some tests on x86-64 and was never able to assume that the bits above the precision had always been canonized. I will admit that I did not try to run down the bugs, because I thought that since the rtx constructors did not have a mode associated with them, nor was one required in the constructors, this was not an easily solvable problem. So I have no idea if I hit the one and only bug or was about to start drinking from a fire hose.

Hopefully the first one. :-) Would you mind going back and trying again, so that we at least have some idea what kind of bugs they were?

> But the back ends are full of GEN_INT (a) where a came from god knows where and you almost never see it properly canonized. I think that until GEN_INT takes a mandatory non-VOIDmode mode parameter, and that constructor canonizes it, you are doomed chasing this bug forever.

Well, sure, the bug crops up now and then (I fixed another instance a week or so ago). But generally this bug shows up as an ICE -- usually an unrecognisable instruction ICE -- precisely because this is something that the code is already quite picky about. Most of the GEN_INTs in the backend will be correct. They're in a position to make assumptions about instructions and target modes.

> Having said that, we actually do neither of (1) or (2) inside of wide-int.
> For rtl to wide-int, we leave the upper bits undefined and never allow you to look at them, because the constructor has a mode and that mode allows you to draw a line in the sand. There is no constructor for the "infinite" wide ints from rtl, so you have no way to look.

Sorry, I don't understand the distinction you're making between this and (1). The precision is taken from the mode, and you flesh out the upper bits on read if you care.

> Doing this allows us to do something that richi really wanted: avoiding copying. We do not get to do as much as richi would like, and when he comes back from leave, he may have other places where we can apply it, but right now if you say w = t + 6 as above, it "borrows" the rep from t to do the add; it does not really build a wide-int. We also do this if t is an rtx const. But if we had to canonize the number, then we could not borrow the rep.

But the point is that the number is already canonical for rtx, so you should do nothing. I.e. (2) will be doing less work. If we have other input types besides rtx that don't define the upper bits, then under (2) their wide_int accessor functions (to_shwi[12] in the current scheme) should do the extension.

>> The patch goes for (1) but (2) seems better to me, for a few reasons:
>>
>> * As above, constants coming from rtl are already in the right form, so if you create a wide_int from an rtx and only query it, no explicit extension is needed.
>>
>> * Things like logical operations and right shifts naturally preserve the sign-extended form, so only a subset of write operations need to take special measures.
>
> Right now the internals of wide-int do not keep the bits above the precision clean. As you point out, this could be fixed by changing lshift, add, sub, mul, div (and anything else I have forgotten about) and removing the code that cleans this up on exit.
> I actually do not really care which way we go here, but if we do go on keeping the bits clean above the precision inside wide-int, we are going to have to clean the bits in the constructors from rtl, or fix some/a lot of bugs.
>
> But if you want to go with the stay-clean plan you have to start clean, so at the rtl level this means copying. And it is the not-copying trick that pushed me in the direction we went.
>
> At the tree level, this is not an issue. There are no constructors for tree-csts that do not have a type, and before we copy the rep from the wide-int to the tree, we clean the top bits. (BTW - if I had to guess what the bug is with the missing messages on the mips port, it would be because the front ends HAD a bad habit of creating constants that did not fit into a type and then later checking to see if there were any interesting bits above the precision in the int-cst. This now does not work because we clean out those top bits on construction, but it would not surprise me if we missed the fixed point constant path.) So at the tree level, we could easily go either way here, but there is a cost at the rtl level with doing (2).

TBH, I think we should do (2) and fix whatever bugs you saw with invalid rtx constants.

>>> When the precision is 0, all the bits in the LEN elements of VEC are significant with no undefined bits. Precisionless constants are limited to being one or two HOST_WIDE_INTs. When two are used the upper value is 0, and the high order bit of the first value is set. (Note that this may need to be generalized if it is ever necessary to support 32bit HWIs again.)
>>
>> I didn't understand this. When are two HOST_WIDE_INTs needed for "precision 0"?
>
> If a large unsigned constant comes in with the top bit set, the canonized value takes 2 hwis, the top hwi being 0.

Ah, yeah. But then that sounds like another reason to only allow "int" to have zero precision.
>>> /* Return THIS as a signed HOST_WIDE_INT. If THIS does not fit in PREC, the information is lost. */
>>> HOST_WIDE_INT
>>> to_shwi (unsigned int prec = 0) const
>>
>> Just dropping the excess bits seems dangerous. I think we should assert instead, at least when prec is 0.
>
> There are times when this is useful. There is a lot of code that just wants to look at the bottom bits to do some alignment stuff. I guess that code could just grab the bottom elt of the array, but that has not generally been how this has been done.

Yeah, using elt to mean "get the low HOST_WIDE_INT" sounds better to me FWIW. "to_shwi" sounds like a type conversion whereas "elt" is more obviously accessing only part of the value.

>>> gcc_unreachable ();
>>> #if 0
>>> return val[len - 1] >> (HOST_BITS_PER_WIDE_INT - 1);
>>> #endif
>>
>> Maybe remove this?
>>
>> The inline div routines do:
>
> I will work on this this weekend. tree vrp has not been our friend and sometimes does not like to compile this function.

OK. If it's just temporary then never mind, but a comment might help. I remember you mentioned tree-vrp in a reply to one of Richard's reviews, so this is probably something you've been asked before. And will probably be asked again if there's no comment :-)

>> I think these are going to bitrot quickly if we #if 0 them out. I think we should either use:
>>
>>   if (DEBUG_WIDE_INT)
>>     debug_vwa ("wide_int_ro:: %d = (%s == %s)\n", result, *this, s, cl, p2);
>>
>> or drop them.
>
> My plan is to leave these in while the branch is still being developed and then get rid of them before it is merged. My guess is that I am going to need them still when I try the 32bit hwi test.

Ah, OK, sounds good.

One thing I hit yesterday while trying out the "lightweight classes" idea from the end of the mail is that, as things stand, most operations end up with machine instructions to do:

  if (*p1 == 0)
    *p1 = *p2;
  if (*p2 == 0)
    *p2 = *p1;

E.g.
for wide_int + wide_int, wide_int + tree and wide_int + rtx, where the compiler has no way of statically proving that the precisions are nonzero.

Considering all the template goo we're adding to avoid a copy (even though that copy is going to be 24 bytes in the worst case on most targets, and could be easily SRAed), this seems like a relatively high overhead for syntactic sugar, especially on hosts that need branches rather than conditional moves. Either all this is premature optimisation (including avoiding the copying) or we should probably do something to avoid these checks too.

I wonder how easy it would be to restrict this use of "zero precision" (i.e. flexible precision) to cases where primitive types like "int" are used as template arguments to operators, and to require a precision when constructing a wide_int. I wouldn't have expected "real" precision 0 (from zero-width bitfields or whatever) to need any special handling compared to precision 1 or 2.

Thanks,
Richard
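The per-operation fixup Richard describes — a "zero" (flexible) precision operand inheriting the other operand's precision before the arithmetic proper — can be written out as a tiny function (a sketch of the pattern only, not the branch's actual code):

```cpp
#include <utility>

// If either precision is 0 (flexible), replace it with the other's
// precision; this is the check pair that every mixed operation emits
// when neither precision is statically known to be nonzero.
std::pair<unsigned, unsigned>
canonicalize_precisions (unsigned p1, unsigned p2)
{
  if (p1 == 0)
    p1 = p2;
  if (p2 == 0)
    p2 = p1;
  return {p1, p2};
}
```

It is exactly this pair of data-dependent branches per operation that the complaint about overhead refers to.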
On 08/13/2013 10:57 PM, Kenneth Zadeck wrote:
> 1) The 4 files that hold the wide-int code itself. You have seen a lot of this code before except for the infinite precision templates. Also the classes are more C++ than C in their flavor. In particular, the integration with trees is very tight in that an int-cst or regular integers can be the operands of any wide-int operation.

Are any of these conversions lossy? Maybe some of these constructors should be made explicit?
On 08/24/2013 02:16 PM, Florian Weimer wrote:
> On 08/13/2013 10:57 PM, Kenneth Zadeck wrote:
>> 1) The 4 files that hold the wide-int code itself. You have seen a lot of this code before except for the infinite precision templates. Also the classes are more C++ than C in their flavor. In particular, the integration with trees is very tight in that an int-cst or regular integers can be the operands of any wide-int operation.
>
> Are any of these conversions lossy? Maybe some of these constructors should be made explicit?

It depends; there is nothing wrong with lossy conversions as long as you know what you are doing.
On Aug 23, 2013, at 8:02 AM, Richard Sandiford <rdsandiford@googlemail.com> wrote:
> We really need to get rid of the #include "tm.h" in wide-int.h. MAX_BITSIZE_MODE_ANY_INT should be the only partially-target-dependent thing in there. If that comes from tm.h then perhaps we should put it into a new header file instead.

BITS_PER_UNIT comes from there as well, and I'd need both. Grabbing the #defines we generate is easy enough, but BITS_PER_UNIT would be more annoying. No port in the tree uses a value other than 8. So, do we just assume BITS_PER_UNIT is 8?
Mike Stump <mikestump@comcast.net> writes:
> On Aug 23, 2013, at 8:02 AM, Richard Sandiford <rdsandiford@googlemail.com> wrote:
>> We really need to get rid of the #include "tm.h" in wide-int.h. MAX_BITSIZE_MODE_ANY_INT should be the only partially-target-dependent thing in there. If that comes from tm.h then perhaps we should put it into a new header file instead.
>
> BITS_PER_UNIT comes from there as well, and I'd need both. Grabbing the #defines we generate is easy enough, but BITS_PER_UNIT would be more annoying. No port in the tree uses a value other than 8. So, do we just assume BITS_PER_UNIT is 8?

Looks like wide-int is just using BITS_PER_UNIT to get the number of bits in "char". That's a host thing, so it should be CHAR_BIT instead.

Thanks,
Richard
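The host-side usage Richard means can be sketched as follows (the constant name is hypothetical, standing in for HOST_BITS_PER_WIDE_INT; CHAR_BIT comes from the host's <climits>):

```cpp
#include <climits>   // CHAR_BIT: number of bits in a *host* char

// Counting the bits in a HOST_WIDE_INT (a host "long long" here) is a
// property of the machine GCC runs on, so it is computed from CHAR_BIT,
// not from the target's BITS_PER_UNIT.
const int HOST_BITS_PER_WIDE_INT_SKETCH
  = (int) (sizeof (long long) * CHAR_BIT);
```

Whether BITS_PER_UNIT (the target's QImode width) is also wanted is exactly the question being argued in the surrounding mails.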
On Aug 25, 2013, at 11:29 AM, Richard Sandiford <rdsandiford@googlemail.com> wrote:
> Mike Stump <mikestump@comcast.net> writes:
>> On Aug 23, 2013, at 8:02 AM, Richard Sandiford <rdsandiford@googlemail.com> wrote:
>>> We really need to get rid of the #include "tm.h" in wide-int.h. MAX_BITSIZE_MODE_ANY_INT should be the only partially-target-dependent thing in there. If that comes from tm.h then perhaps we should put it into a new header file instead.
>>
>> BITS_PER_UNIT comes from there as well, and I'd need both. Grabbing the #defines we generate is easy enough, but BITS_PER_UNIT would be more annoying. No port in the tree uses a value other than 8. So, do we just assume BITS_PER_UNIT is 8?
>
> Looks like wide-int is just using BITS_PER_UNIT to get the number of bits in "char". That's a host thing, so it should be CHAR_BIT instead.

? What? No. BITS_PER_UNIT is a feature of the target machine, so it is absolutely wrong to use a property of the host machine or the build machine. We don't use sizeof (int) to set the size of int on the target, for the exact same reason.
On Sun, 25 Aug 2013, Mike Stump wrote:
> On Aug 23, 2013, at 8:02 AM, Richard Sandiford <rdsandiford@googlemail.com> wrote:
>> We really need to get rid of the #include "tm.h" in wide-int.h. MAX_BITSIZE_MODE_ANY_INT should be the only partially-target-dependent thing in there. If that comes from tm.h then perhaps we should put it into a new header file instead.
>
> BITS_PER_UNIT comes from there as well, and I'd need both. Grabbing the #defines we generate is easy enough, but BITS_PER_UNIT would be more annoying. No port in the tree uses a value other than 8. So, do we just assume BITS_PER_UNIT is 8?

Regarding avoiding the tm.h dependence through BITS_PER_UNIT (without actually converting it from a target macro to a target hook), see my suggestions at <http://gcc.gnu.org/ml/gcc-patches/2010-11/msg02617.html>. It would seem fairly reasonable, if in future other macros are converted to hooks and it's possible to build multiple back ends into a single compiler binary, to require that all such back ends share a value of BITS_PER_UNIT.

BITS_PER_UNIT describes the number of bits in QImode - the RTL-level byte. I don't think wide-int should care about that at all. As I've previously noted, many front-end uses of BITS_PER_UNIT really care about the C-level char and so should be TYPE_PRECISION (char_type_node). Generally, before thinking about how to get BITS_PER_UNIT somewhere, consider whether the code is actually correct to be using BITS_PER_UNIT at all - whether it's the RTL-level QImode that is really what's relevant to the code.
On Aug 25, 2013, at 1:11 PM, "Joseph S. Myers" <joseph@codesourcery.com> wrote:
> Regarding avoiding the tm.h dependence through BITS_PER_UNIT (without actually converting it from a target macro to a target hook), see my suggestions at <http://gcc.gnu.org/ml/gcc-patches/2010-11/msg02617.html>. It would seem fairly reasonable, if in future other macros are converted to hooks and it's possible to build multiple back ends into a single compiler binary, to require that all such back ends share a value of BITS_PER_UNIT.

Ick. I don't see the beauty of this direction. If one wants to move in a "we can generate code for any target" direction, fine, let's do that. If someone wants to make a target just a dynamically loaded shared library, let's do that. There are all sorts of directions to move in, but the intermediate "let's design and implement a system where for some targets it works, and for some targets it doesn't" -- well, I think that is short-sighted and we ought not target that. Having a separate tm-blabla.h for some of the selections of the target is fine, but mandating that as the form for doing a target is, well, bad. I'd love to see the entire hook and target selection mechanism brought up to a 1990s level of sophistication; the 1970s are over. Sigh. Anyway, all this is well beyond the scope of the work at hand.
> As I've previously noted, many front-end uses of BITS_PER_UNIT really care about the C-level char and so should be TYPE_PRECISION (char_type_node). Generally, before thinking about how to get BITS_PER_UNIT somewhere, consider whether the code is actually correct to be using BITS_PER_UNIT at all - whether it's the RTL-level QImode that is really what's relevant to the code.

I think we got all the uses correct. Let us know if any are wrong.
On Fri, 23 Aug 2013, Richard Sandiford wrote:
> Hi Kenny,
>
> This is the first time I've looked at the implementation of wide-int.h (rather than just looking at the rtl changes, which as you know I like in general), so FWIW here are some comments on wide-int.h. I expect a lot of them overlap with Richard B.'s comments.
>
> I also expect many of them are going to be annoying, sorry, but this first one definitely will. The coding conventions say that functions should be defined outside the class:
>
>   http://gcc.gnu.org/codingconventions.html
>
> and that opening braces should be on their own line, so most of the file needs to be reformatted. I went through and made that change with the patch below, in the process of reading through. I also removed "SGN must be SIGNED or UNSIGNED." because it seemed redundant when those are the only two values available. The patch fixes a few other coding standard problems and typos, but I've not made any actual code changes (or at least, I didn't mean to).
>
> Does it look OK to install?
>
> I'm still unsure about these "infinite" precision types, but I understand the motivation and I have no objections. However:
>
>> * Code that does widening conversions. The canonical way that this is performed is to sign or zero extend the input value to the max width based on the sign of the type of the source and then to truncate that value to the target type. This is in preference to using the sign of the target type to extend the value directly (which gets the wrong value for the conversion of large unsigned numbers to larger signed types).
>
> I don't understand this particular reason. Using the sign of the source type is obviously right, but why does that mean we need "infinite" precision, rather than just doubling the precision of the source?

The comment indeed looks redundant - of course it is not correct to extend the value "directly".
>> * When a constant that has an integer type is converted to a wide-int it comes in with precision 0. For these constants the top bit does accurately reflect the sign of that constant; this is an exception to the normal rule that the signedness is not represented. When used in a binary operation, the wide-int implementation properly extends these constants so that they properly match the other operand of the computation. This allows you to write:
>>
>>   tree t = ...
>>   wide_int x = t + 6;
>>
>> assuming t is an int_cst.
>
> This seems dangerous. Not all code that uses "unsigned HOST_WIDE_INT" actually wants it to be an unsigned value. Some code uses it to avoid the undefinedness of signed overflow. So these overloads could lead to us accidentally zero-extending what's conceptually a signed value without any obvious indication that that's happening. Also, hex constants are unsigned int, but it doesn't seem safe to assume that 0x80000000 was meant to be zero-extended.
>
> I realise the same thing can happen if you mix "unsigned int" with HOST_WIDE_INT, but the point is that you shouldn't really do that in general, whereas we're defining these overloads precisely so that a mixture can be used.
>
> I'd prefer some explicit indication of the sign, at least for anything other than plain "int" (so that the compiler will complain about uses of "unsigned int" and above).

I prefer the automatic promotion - it is exactly what regular C types do. Now, consider

  wide_int x = ... construct 5 with precision 16 ...
  wide_int y = x + 6;

now, '6' is 'int' (precision 32), but with wide-int we treat it as precision '0' ('infinite'). For x + 6 we then _truncate_ its precision to that of 'x' (?), not exactly matching C behavior (where we'd promote 'x' to 'int', perform the add and then truncate to the precision of 'y' - which for wide-int gets its precision from the result of x + 6).
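For reference, what C itself does for "y = x + 6" with a 16-bit x can be pinned down in two small functions (plain standard C++, nothing wide-int-specific): x is promoted to int, the add happens at int precision, and truncation only occurs on the store back to 16 bits.

```cpp
#include <cstdint>

// The int-precision intermediate: x promoted to int, no truncation yet.
int add_promoted (uint16_t x)
{
  return x + 6;
}

// The value after storing back into a 16-bit variable: truncated.
uint16_t add_and_store (uint16_t x)
{
  return (uint16_t) (x + 6);
}
```

So for x = 0xFFFD the intermediate is 65539 while the stored 16-bit result is 3; the wide-int scheme under discussion instead truncates '6' to x's precision before the add.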
Mimicking C would support dropping those 'require equal precision' asserts but also would require us to properly track a sign to be able to promote properly (or, as I argued all the time, always properly sign-extend values so we effectively have infinite precision anyway).

The fits_uhwi_p implementation changes scare me off that "upper bits are undefined" thing a lot again... (I hate introducing 'undefinedness' into the compiler by 'design'.)

>> Note that the bits above the precision are not defined and the algorithms used here are careful not to depend on their value. In particular, values that come in from rtx constants may have random bits.

Which is a red herring. It should be fixed. I cannot even believe that sentence given the uses of CONST_DOUBLE_LOW/CONST_DOUBLE_HIGH or INTVAL/UINTVAL. I don't see accesses masking out 'undefined' bits anywhere.

> I have a feeling I'm rehashing a past debate, sorry, but rtx constants can't have random bits. The upper bits must be a sign extension of the value. There's exactly one valid rtx for each (value, mode) pair. If you saw something different then that sounds like a bug. The rules should already be fairly well enforced though, since something like (const_int 128) -- or (const_int 256) -- will not match a QImode operand.

See. We're saved ;)

> This is probably the part of the representation that I disagree most with. There seem to be two main ways we could handle the extension to whole HWIs:
>
> (1) leave the stored upper bits undefined and extend them on read
> (2) keep the stored upper bits in extended form
>
> The patch goes for (1) but (2) seems better to me, for a few reasons:

I agree whole-heartedly.

> * As above, constants coming from rtl are already in the right form, so if you create a wide_int from an rtx and only query it, no explicit extension is needed.
> * Things like logical operations and right shifts naturally preserve the sign-extended form, so only a subset of write operations need to take special measures.
>
> * You have a public interface that exposes the underlying HWIs (which is fine with me FWIW), so it seems better to expose a fully-defined HWI rather than only a partially-defined HWI.
>
> E.g. zero_p is:
>
>   HOST_WIDE_INT x;
>
>   if (precision && precision < HOST_BITS_PER_WIDE_INT)
>     x = sext_hwi (val[0], precision);
>   else if (len == 0)
>     {
>       gcc_assert (precision == 0);
>       return true;
>     }
>   else
>     x = val[0];
>
>   return len == 1 && x == 0;
>
> but I think it really ought to be just:
>
>   return len == 1 && val[0] == 0;

Yes!

But then - what value does keeping track of a 'precision' have in this case? It seems to me it's only a "convenient carrier" for

  wide_int x = wide-int-from-RTX (y);
  machine_mode saved_mode = mode-available ? GET_MODE (y) : magic-mode;
  ... process x ...
  RTX = RTX-from-wide_int (x, saved_mode);

that is, wide-int doesn't do anything with 'precision' but you can extract it later to not need to remember a mode you were interested in? Oh, and of course some operations require a 'precision', like rotate.

>> When the precision is 0, all the bits in the LEN elements of VEC are significant with no undefined bits. Precisionless constants are limited to being one or two HOST_WIDE_INTs. When two are used the upper value is 0, and the high order bit of the first value is set. (Note that this may need to be generalized if it is ever necessary to support 32bit HWIs again.)
>
> I didn't understand this. When are two HOST_WIDE_INTs needed for "precision 0"?

For the wide_int containing unsigned HOST_WIDE_INT ~0. As we sign-extend the representation (heh, yes, we do or should!) we require an extra HWI to store the fact that ~0 is unsigned.

> The main thing that's changed since the early patches is that we now have a mixture of wide-int types.
> This seems to have led to a lot of boiler-plate forwarding functions (or at least it felt like that while moving them all out of the class). And that in turn seems to be because you're trying to keep everything as member functions. E.g. a lot of the forwarders are from a member function to a static function.
>
> Wouldn't it be better to have the actual classes be light-weight, with little more than accessors, and do the actual work with non-member template functions? There seems to be 3 grades of wide-int:
>
> (1) read-only, constant precision (from int, etc.)
> (2) read-write, constant precision (fixed_wide_int)
> (3) read-write, variable precision (wide_int proper)
>
> but we should be able to hide that behind templates, with compiler errors if you try to write to (1), etc.

Yeah, I'm probably trying to clean up the implementation once I got past recovering from two months without GCC ...

Richard.
Richard Biener <rguenther@suse.de> writes: >> * As above, constants coming from rtl are already in the right form, >> so if you create a wide_int from an rtx and only query it, no explicit >> extension is needed. >> >> * Things like logical operations and right shifts naturally preserve >> the sign-extended form, so only a subset of write operations need >> to take special measures. >> >> * You have a public interface that exposes the underlying HWIs >> (which is fine with me FWIW), so it seems better to expose a fully-defined >> HWI rather than only a partially-defined HWI. >> >> E.g. zero_p is: >> >> HOST_WIDE_INT x; >> >> if (precision && precision < HOST_BITS_PER_WIDE_INT) >> x = sext_hwi (val[0], precision); >> else if (len == 0) >> { >> gcc_assert (precision == 0); >> return true; >> } >> else >> x = val[0]; >> >> return len == 1 && x == 0; >> >> but I think it really ought to be just: >> >> return len == 1 && val[0] == 0; > > Yes! > > But then - what value does keeping track of a 'precision' have > in this case? It seems to me it's only a "convenient carrier" > for > > wide_int x = wide-int-from-RTX (y); > machine_mode saved_mode = mode-available? GET_MODE (y) : magic-mode; > ... process x ... > RTX = RTX-from-wide_int (x, saved_mode); > > that is, wide-int doesn't do anything with 'precision' but you > can extract it later to not need to remember a mode you were > interested in? I can see why you like the constant-precision, very wide integers for trees, where the constants have an inherent sign. But (and I think this might be old ground too :-)), that isn't the case with rtl. At the tree level, using constant-precision, very wide integers allows you to add a 32-bit signed INTEGER_CST to a 16-bit unsigned INTEGER_CST.
But in rtl there is no meaning to adding an SImode and an HImode value together, since we don't know how to extend the HImode value beyond its precision. You must explicitly sign- or zero-extend the value first. (The fact that we choose to sign-extend rtl constants when storing them in HWIs is just a representation detail, to avoid having undefined bits in the HWIs. It doesn't mean that rtx values themselves are signed. We could have used a zero-extending representation instead without changing the semantics.) So the precision variable is good for the rtl level in several ways: - As you say, it avoids adding the explicit truncations that (in practice) every rtl operation would need - It's more efficient in that case, since we don't calculate high values and then discard them immediately. The common GET_MODE_PRECISION (mode) <= HOST_BITS_PER_WIDE_INT case stays a pure HWI operation, despite all the wide-int trappings. - It's a good way of checking type safety and making sure that excess bits aren't accidentally given a semantic meaning. This is the most important reason IMO. The branch has both the constant-precision, very wide integers that we want for trees and the variable-precision integers we want for rtl, so it's not an "either or". With the accessor-based implementation, there should be very little cost to having both. >> The main thing that's changed since the early patches is that we now >> have a mixture of wide-int types. This seems to have led to a lot of >> boiler-plate forwarding functions (or at least it felt like that while >> moving them all out the class). And that in turn seems to be because >> you're trying to keep everything as member functions. E.g. a lot of the >> forwarders are from a member function to a static function. >> >> Wouldn't it be better to have the actual classes be light-weight, >> with little more than accessors, and do the actual work with non-member >> template functions? 
There seems to be 3 grades of wide-int: >> >> (1) read-only, constant precision (from int, etc.) >> (2) read-write, constant precision (fixed_wide_int) >> (3) read-write, variable precision (wide_int proper) >> >> but we should be able to hide that behind templates, with compiler errors >> if you try to write to (1), etc. > > Yeah, I'm probably trying to clean up the implementation once I > got past recovering from two month without GCC ... FWIW, I've been plugging away at a version that uses accessors. I hope to have it vaguely presentable by the middle of next week, in case your recovery takes that long... Thanks, Richard
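The "3 grades" idea sketched in the quoted text could look something like the following. This is a hypothetical illustration (none of these names are from the branch): the work is done by non-member templates over anything providing accessors, so a read-only view (grade 1) and a writable fixed-precision type (grade 2) share algorithms, and writes to grade 1 simply fail to compile because it has no mutators.

```cpp
#include <cassert>

// Non-member algorithm: works on any type with get_len/get_val.
template <typename T>
bool
is_zero (const T &x)
{
  return x.get_len () == 1 && x.get_val (0) == 0;
}

// Grade (1): read-only, wrapping a plain integer.  No mutators, so
// attempts to write through it are compile errors.
struct hwi_view
{
  long v;
  unsigned get_len () const { return 1; }
  long get_val (unsigned) const { return v; }
};

// Grade (2): read-write, constant precision (N blocks).
template <int N>
struct toy_fixed_wide_int
{
  unsigned len;
  long val[N];
  unsigned get_len () const { return len; }
  long get_val (unsigned i) const { return val[i]; }
  void set_val (unsigned i, long x) { val[i] = x; }  // mutator only here
};
```

Grade (3) would be the same accessor surface with a run-time precision field; the point is that the heavy lifting lives in the templates, not in forwarding member functions.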
On Wed, 28 Aug 2013, Richard Sandiford wrote: > Richard Biener <rguenther@suse.de> writes: > >> * As above, constants coming from rtl are already in the right form, > >> so if you create a wide_int from an rtx and only query it, no explicit > >> extension is needed. > >> > >> * Things like logical operations and right shifts naturally preserve > >> the sign-extended form, so only a subset of write operations need > >> to take special measures. > >> > >> * You have a public interface that exposes the underlying HWIs > >> (which is fine with me FWIW), so it seems better to expose a fully-defined > >> HWI rather than only a partially-defined HWI. > >> > >> E.g. zero_p is: > >> > >> HOST_WIDE_INT x; > >> > >> if (precision && precision < HOST_BITS_PER_WIDE_INT) > >> x = sext_hwi (val[0], precision); > >> else if (len == 0) > >> { > >> gcc_assert (precision == 0); > >> return true; > >> } > >> else > >> x = val[0]; > >> > >> return len == 1 && x == 0; > >> > >> but I think it really ought to be just: > >> > >> return len == 1 && val[0] == 0; > > > > Yes! > > > > But then - what value does keeping track of a 'precision' have > > in this case? It seems to me it's only a "convenient carrier" > > for > > > > wide_int x = wide-int-from-RTX (y); > > machine_mode saved_mode = mode-available? GET_MODE (y) : magic-mode; > > ... process x ... > > RTX = RTX-from-wide_int (x, saved_mode); > > > > that is, wide-int doesn't do anything with 'precision' but you > > can extract it later to not need to remember a mode you were > > interested in? > > I can see why you like the constant-precision, very wide integers for trees, > where the constants have an inherent sign. But (and I think this might be > old ground too :-)), that isn't the case with rtl. At the tree level, > using constant-precision, very wide integers allows you to add a 32-bit > signed INTEGER_CST to a 16-unsigned INTEGER_CST. 
And that has an > obvious meaning, both as a 32-bit result or as a wider result, depending > on how you choose to use it. But in rtl there is no meaning to adding > an SImode and an HImode value together, since we don't know how to > extend the HImode value beyond its precision. You must explicitly sign- > or zero-extend the value first. (The fact that we choose to sign-extend > rtl constants when storing them in HWIs is just a representation detail, > to avoid having undefined bits in the HWIs. It doesn't mean that rtx > values themselves are signed. We could have used a zero-extending > representation instead without changing the semantics.) Yeah, that was my understanding. > So the precision variable is good for the rtl level in several ways: > > - As you say, it avoids adding the explicit truncations that (in practice) > every rtl operation would need > > - It's more efficient in that case, since we don't calculate high values > and then discard them immediately. The common GET_MODE_PRECISION (mode) > <= HOST_BITS_PER_WIDE_INT case stays a pure HWI operation, despite all > the wide-int trappings. > > - It's a good way of checking type safety and making sure that excess > bits aren't accidentally given a semantic meaning. This is the most > important reason IMO. > > The branch has both the constant-precision, very wide integers that we > want for trees and the variable-precision integers we want for rtl, > so it's not an "either or". With the accessor-based implementation, > there should be very little cost to having both. So what I wonder (and where we maybe disagree) is how much code wants to inspect "intermediate" results. 
Say originally you have rtx foo (rtx x, rtx y) { rtx tem = simplify_const_binary_operation (PLUS, GET_MODE (x), x, GEN_INT (1)); rtx res = simplify_const_binary_operation (MINUS, GET_MODE (tem), tem, y); return res; } and with wide-int you want to change that to rtx foo (rtx x, rtx y) { wide_int tem = wide_int (x) + 1; wide_int res = tem - y; return res.to_rtx (); } how much code ever wants to inspect 'tem' or 'res'? That is, does it matter if 'tem' and 'res' would have been calculated in "infinite precision" and only to_rtx () would do the truncation to the desired mode? I think not. The amount of code performing multiple operations on _constants_ in sequence is extremely low (if it even exists). So I'd rather have to_rtx get a mode argument (or a precision) and perform the required truncation / sign-extension at RTX construction time (which is an expensive operation anyway). So where does this "break"? It only breaks where previous code broke (or where previous code had measures to not break that we don't carry over to the wide-int case). Obvious case is unsigned division on the sign-extended rep of RTL constants for example. > >> The main thing that's changed since the early patches is that we now > >> have a mixture of wide-int types. This seems to have led to a lot of > >> boiler-plate forwarding functions (or at least it felt like that while > >> moving them all out the class). And that in turn seems to be because > >> you're trying to keep everything as member functions. E.g. a lot of the > >> forwarders are from a member function to a static function. > >> > >> Wouldn't it be better to have the actual classes be light-weight, > >> with little more than accessors, and do the actual work with non-member > >> template functions? There seems to be 3 grades of wide-int: > >> > >> (1) read-only, constant precision (from int, etc.) 
> >> (2) read-write, constant precision (fixed_wide_int) > >> (3) read-write, variable precision (wide_int proper) > >> > >> but we should be able to hide that behind templates, with compiler errors > >> if you try to write to (1), etc. > > > > Yeah, I'm probably trying to clean up the implementation once I > > got past recovering from two month without GCC ... > > FWIW, I've been plugging away at a version that uses accessors. > I hope to have it vaguely presentable by the middle of next week, > in case your recovery takes that long... Depends on my priorities ;) Btw, rtl.h still wastes space with struct GTY((variable_size)) hwivec_def { int num_elem; /* number of elements */ HOST_WIDE_INT elem[1]; }; struct GTY((chain_next ("RTX_NEXT (&%h)"), chain_prev ("RTX_PREV (&%h)"), variable_size)) rtx_def { ... /* The first element of the operands of this rtx. The number of operands and their types are controlled by the `code' field, according to rtl.def. */ union u { rtunion fld[1]; HOST_WIDE_INT hwint[1]; struct block_symbol block_sym; struct real_value rv; struct fixed_value fv; struct hwivec_def hwiv; } GTY ((special ("rtx_def"), desc ("GET_CODE (&%0)"))) u; }; there are 32bits available before the union. If you don't use those for num_elem then all wide-ints will at least take as much space as DOUBLE_INTs originally took - and large ints that would have required DOUBLE_INTs in the past will now require more space than before. Which means your math motivating the 'num_elem' encoding stuff is wrong. With moving 'num_elem' before u you can even re-use the hwint field in the union as the existing double-int code does (which in fact could simply do the encoding trick in the old CONST_DOUBLE scheme, similar for the tree INTEGER_CST container). Richard.
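The space argument about rtl.h can be checked directly. Below is a hypothetical, cut-down model of the two layouts (ignoring GTY markers and the real rtx fields; `hdr` stands in for the 32 bits of code/mode bitfields that precede the union), showing that storing the length inside the trailing union costs a whole extra HOST_WIDE_INT slot, while the padding before the union is free:

```cpp
#include <cassert>
#include <cstdint>

// num_elem stored inside the trailing union, as in the current
// hwivec_def: the 4-byte count forces 4 bytes of padding before each
// 8-byte element slot.
struct len_in_payload
{
  int hdr;
  union
  {
    struct { int num_elem; int64_t elem[1]; } hwiv;
    int64_t hwint[1];
  } u;
};

// num_elem moved into the 32 bits of padding before the union, as
// suggested above: the union then holds only the elements.
struct len_in_padding
{
  int hdr;
  int num_elem;
  union
  {
    int64_t hwint[1];
  } u;
};
```

On a typical LP64 host the first layout is 24 bytes and the second 16, i.e. the second packs one more HWI into the same space.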
Richard Biener <rguenther@suse.de> writes: >> So the precision variable is good for the rtl level in several ways: >> >> - As you say, it avoids adding the explicit truncations that (in practice) >> every rtl operation would need >> >> - It's more efficient in that case, since we don't calculate high values >> and then discard them immediately. The common GET_MODE_PRECISION (mode) >> <= HOST_BITS_PER_WIDE_INT case stays a pure HWI operation, despite all >> the wide-int trappings. >> >> - It's a good way of checking type safety and making sure that excess >> bits aren't accidentally given a semantic meaning. This is the most >> important reason IMO. >> >> The branch has both the constant-precision, very wide integers that we >> want for trees and the variable-precision integers we want for rtl, >> so it's not an "either or". With the accessor-based implementation, >> there should be very little cost to having both. > > So what I wonder (and where we maybe disagree) is how much code > wants to inspect "intermediate" results. Say originally you have > > rtx foo (rtx x, rtx y) > { > rtx tem = simplify_const_binary_operation (PLUS, GET_MODE (x), x, > GEN_INT (1)); > rtx res = simplify_const_binary_operation (MINUS, GET_MODE (tem), tem, > y); > return res; > } > > and with wide-int you want to change that to > > rtx foo (rtx x, rtx y) > { > wide_int tem = wide_int (x) + 1; > wide_int res = tem - y; > return res.to_rtx (); > } > > how much code ever wants to inspect 'tem' or 'res'? > That is, does it matter > if 'tem' and 'res' would have been calculated in "infinite precision" > and only to_rtx () would do the truncation to the desired mode? > > I think not. The amount of code performing multiple operations on > _constants_ in sequence is extremely low (if it even exists). > > So I'd rather have to_rtx get a mode argument (or a precision) and > perform the required truncation / sign-extension at RTX construction > time (which is an expensive operation anyway). 
I agree this is where we disagree. I don't understand why you think the above is better. Why do we want to do "infinite precision" addition of two values when only the lowest N bits of those values have a (semantically) defined meaning? Earlier in the thread it sounded like we both agreed that having undefined bits in the _representation_ was bad. So why do we want to do calculations on parts of values that are undefined in the (rtx) semantics? E.g. say we're adding two rtx values whose mode just happens to be HOST_BITS_PER_WIDE_INT in size. Why does it make sense to calculate the carry from adding the two HWIs, only to add it to an upper HWI that has no semantically-defined value? It's garbage in, garbage out. Providing this for rtl doesn't affect the tree-level operations in any way. Although things like addition require both arguments to have the same precision after promotion, that's trivially true for trees, since (a) the inputs used there -- C integers and tree constants -- can be promoted and (b) tree-level code uses fixed_wide_int <N>, where every fixed_wide_int <N> has the same precision. And you can still do "infinite-precision" arithmetic on rtx constants if you want. You just have to say how you want the constant to be extended (sign or zero), so that the value of all bits is meaningful. Thanks, Richard
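The HOST_BITS_PER_WIDE_INT-sized example above might be clearer in code. In a two-block sum, the low block is the same whichever way the inputs are read, but the block above it depends on an extension that rtx constants simply do not carry. This is an illustrative sketch, not the branch's actual add():

```cpp
#include <cassert>
#include <cstdint>

// Add two 64-bit "mode-sized" values into a two-block result, choosing
// explicitly how to extend the inputs into the second block.
static void
wide_add (int64_t a, int64_t b, bool is_signed, int64_t out[2])
{
  uint64_t lo = (uint64_t) a + (uint64_t) b;
  out[0] = (int64_t) lo;   // extension-independent

  // The high block is where the interpretations diverge: the extension
  // of each operand plus the carry out of the low block.
  uint64_t hi_a = (is_signed && a < 0) ? ~(uint64_t) 0 : 0;
  uint64_t hi_b = (is_signed && b < 0) ? ~(uint64_t) 0 : 0;
  uint64_t carry = lo < (uint64_t) a;
  out[1] = (int64_t) (hi_a + hi_b + carry);
}
```

For example, adding the all-ones block to 1 gives a low block of 0 either way, but a high block of 0 under the signed reading (-1 + 1) and 1 under the unsigned reading (2^64 - 1 + 1), which is the sense in which the upper HWI has no defined value for an rtx input.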
On Wed, 28 Aug 2013, Richard Sandiford wrote: > Richard Biener <rguenther@suse.de> writes: > >> So the precision variable is good for the rtl level in several ways: > >> > >> - As you say, it avoids adding the explicit truncations that (in practice) > >> every rtl operation would need > >> > >> - It's more efficient in that case, since we don't calculate high values > >> and then discard them immediately. The common GET_MODE_PRECISION (mode) > >> <= HOST_BITS_PER_WIDE_INT case stays a pure HWI operation, despite all > >> the wide-int trappings. > >> > >> - It's a good way of checking type safety and making sure that excess > >> bits aren't accidentally given a semantic meaning. This is the most > >> important reason IMO. > >> > >> The branch has both the constant-precision, very wide integers that we > >> want for trees and the variable-precision integers we want for rtl, > >> so it's not an "either or". With the accessor-based implementation, > >> there should be very little cost to having both. > > > > So what I wonder (and where we maybe disagree) is how much code > > wants to inspect "intermediate" results. Say originally you have > > > > rtx foo (rtx x, rtx y) > > { > > rtx tem = simplify_const_binary_operation (PLUS, GET_MODE (x), x, > > GEN_INT (1)); > > rtx res = simplify_const_binary_operation (MINUS, GET_MODE (tem), tem, > > y); > > return res; > > } > > > > and with wide-int you want to change that to > > > > rtx foo (rtx x, rtx y) > > { > > wide_int tem = wide_int (x) + 1; > > wide_int res = tem - y; > > return res.to_rtx (); > > } > > > > how much code ever wants to inspect 'tem' or 'res'? > > That is, does it matter > > if 'tem' and 'res' would have been calculated in "infinite precision" > > and only to_rtx () would do the truncation to the desired mode? > > > > I think not. The amount of code performing multiple operations on > > _constants_ in sequence is extremely low (if it even exists). 
> > > > So I'd rather have to_rtx get a mode argument (or a precision) and > > perform the required truncation / sign-extension at RTX construction > > time (which is an expensive operation anyway). > > I agree this is where we disagree. I don't understand why you think > the above is better. Why do we want to do "infinite precision" > addition of two values when only the lowest N bits of those values > have a (semantically) defined meaning? Earlier in the thread it sounded > like we both agreed that having undefined bits in the _representation_ > was bad. So why do we want to do calculations on parts of values that > are undefined in the (rtx) semantics? > > E.g. say we're adding two rtx values whose mode just happens to be > HOST_BITS_PER_WIDE_INT in size. Why does it make sense to calculate > the carry from adding the two HWIs, only to add it to an upper HWI > that has no semantically-defined value? It's garbage in, garbage out. Not garbage in, and not garbage out (just wasted work). That's the possible downside - the upside is to get rid of the notion of a 'precision'. But yes, it's good that we agree on the fact that undefined bits in the _representation_ are wrong. OTOH they still will be in some ways "undefined" if you consider wide_int xw = from_rtx (xr, mode); tree xt = to_tree (xw, type); wide_int xw2 = from_tree (xt); with an unsigned type xw and xw2 will not be equal (in the 'extension' bits) for a value with MSB set. That is, RTL chooses to always sign-extend, tree chooses to extend according to sign information. wide-int chooses to ... ? (it seems the wide-int overall comment lost the part that defined its encoding, but it seems that we still sign-extend val[len-1], so (unsigned HOST_WIDE_INT)-1 is { -1U, 0 } with len == 2 and (HOST_WIDE_INT)-1 is { -1 } with len == 1. 
In RTL both would be encoded with len == 1 (no distinction between a signed and unsigned number with all bits set), on the current tree representation the encoding would be with len == 1, too, as we have TYPE_UNSIGNED to tell us the sign. So we still need to somehow "map" between those representations. Coming from the tree side (as opposed to from the RTL side) I'd have argued you need a 'signed' flag ;) We side-stepped that by doing the extension trick in a way that preserves sign information. Looking at the RTL representation from that wide-int representation makes RTL look as if all constants are signed. That's fine if code that wants to do unsigned stuff properly extends. So - do we need a from_rtx_unsigned () constructor that does this? I'm worried about all the extensions done in operations like add (): if (p1 <= HOST_BITS_PER_WIDE_INT) { result.len = 1; result.precision = p1; result.val[0] = val[0] + s[0]; if (p1 < HOST_BITS_PER_WIDE_INT) result.val[0] = sext_hwi (result.val[0], p1); if (sgn == SIGNED) { HOST_WIDE_INT x = (((result.val[0] ^ val[0]) & (result.val[0] ^ s[0])) >> (p1 - 1)) & 1; *overflow = (x != 0); } else *overflow = ((unsigned HOST_WIDE_INT) result.val[0] < (unsigned HOST_WIDE_INT) val[0]); } that's supposed to be a cheap operation ... and as far as I can see it even computes wrong outcome :/ Given precision 32 and the unsigned value 0xfffffffe, thus { 0x00000000fffffffe }, and adding 0 it will produce { 0xfffffffffffffffe }, the signed value -2. So that if (p1 < HOST_BITS_PER_WIDE_INT) result.val[0] = sext_hwi (result.val[0], p1); is clearly wrong. Yes, it may be needed when constructing a RTX with that value. Richard.
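The 0xfffffffe example above can be reproduced with stand-alone sketches of the two extension helpers (hypothetical stand-ins for GCC's sext_hwi/zext_hwi, shown only to illustrate the representation point):

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low PREC bits of X.
static int64_t
sext_hwi (int64_t x, int prec)
{
  int shift = 64 - prec;
  // Shift left in unsigned space to avoid signed-overflow issues, then
  // use an arithmetic right shift to replicate the top bit.
  return (int64_t) ((uint64_t) x << shift) >> shift;
}

// Zero-extend the low PREC bits of X, recovering the unsigned reading.
static uint64_t
zext_hwi (int64_t x, int prec)
{
  return (uint64_t) x & (~(uint64_t) 0 >> (64 - prec));
}
```

With precision 32, sext_hwi turns the unsigned value 0xfffffffe into the block -2, exactly as described above; code that wants the unsigned interpretation has to zero-extend on the way out rather than read the raw block.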
Richard Biener <rguenther@suse.de> writes: > On Wed, 28 Aug 2013, Richard Sandiford wrote: > >> Richard Biener <rguenther@suse.de> writes: >> >> So the precision variable is good for the rtl level in several ways: >> >> >> >> - As you say, it avoids adding the explicit truncations that (in practice) >> >> every rtl operation would need >> >> >> >> - It's more efficient in that case, since we don't calculate high values >> >> and then discard them immediately. The common GET_MODE_PRECISION (mode) >> >> <= HOST_BITS_PER_WIDE_INT case stays a pure HWI operation, despite all >> >> the wide-int trappings. >> >> >> >> - It's a good way of checking type safety and making sure that excess >> >> bits aren't accidentally given a semantic meaning. This is the most >> >> important reason IMO. >> >> >> >> The branch has both the constant-precision, very wide integers that we >> >> want for trees and the variable-precision integers we want for rtl, >> >> so it's not an "either or". With the accessor-based implementation, >> >> there should be very little cost to having both. >> > >> > So what I wonder (and where we maybe disagree) is how much code >> > wants to inspect "intermediate" results. Say originally you have >> > >> > rtx foo (rtx x, rtx y) >> > { >> > rtx tem = simplify_const_binary_operation (PLUS, GET_MODE (x), x, >> > GEN_INT (1)); >> > rtx res = simplify_const_binary_operation (MINUS, GET_MODE (tem), tem, >> > y); >> > return res; >> > } >> > >> > and with wide-int you want to change that to >> > >> > rtx foo (rtx x, rtx y) >> > { >> > wide_int tem = wide_int (x) + 1; >> > wide_int res = tem - y; >> > return res.to_rtx (); >> > } >> > >> > how much code ever wants to inspect 'tem' or 'res'? >> > That is, does it matter >> > if 'tem' and 'res' would have been calculated in "infinite precision" >> > and only to_rtx () would do the truncation to the desired mode? >> > >> > I think not. 
The amount of code performing multiple operations on >> > _constants_ in sequence is extremely low (if it even exists). >> > >> > So I'd rather have to_rtx get a mode argument (or a precision) and >> > perform the required truncation / sign-extension at RTX construction >> > time (which is an expensive operation anyway). >> >> I agree this is where we disagree. I don't understand why you think >> the above is better. Why do we want to do "infinite precision" >> addition of two values when only the lowest N bits of those values >> have a (semantically) defined meaning? Earlier in the thread it sounded >> like we both agreed that having undefined bits in the _representation_ >> was bad. So why do we want to do calculations on parts of values that >> are undefined in the (rtx) semantics? >> >> E.g. say we're adding two rtx values whose mode just happens to be >> HOST_BITS_PER_WIDE_INT in size. Why does it make sense to calculate >> the carry from adding the two HWIs, only to add it to an upper HWI >> that has no semantically-defined value? It's garbage in, garbage out. > > Not garbage in, and not garbage out (just wasted work). Well, it's not garbage in the sense of an uninitialised HWI detected by valgrind (say). But it's semantic garbage. > That's the possible downside - the upside is to get rid of the notion > of a 'precision'. No, it's still there, just in a different place. > OTOH they still will be in some ways "undefined" if you consider > > wide_int xw = from_rtx (xr, mode); > tree xt = to_tree (xw, type); > wide_int xw2 = from_tree (xt); > > with an unsigned type xw and xw2 will not be equal (in the > 'extension' bits) for a value with MSB set. Do you mean it's undefined as things stand, or when using "infinite precision" for rtl? It shouldn't lead to anything undefined at the moment. Only the low GET_MODE_BITSIZE (mode) bits of xw are meaningful, but those are also the only bits that would be used. 
> That is, RTL chooses to always sign-extend, tree chooses to extend > according to sign information. wide-int chooses to ... ? (it seems > the wide-int overall comment lost the part that defined its encoding, > but it seems that we still sign-extend val[len-1], so (unsigned > HOST_WIDE_INT)-1 is { -1U, 0 } with len == 2 and (HOST_WIDE_INT)-1 is > { -1 } with len == 1. Only if the precision is > HOST_BITS_PER_WIDE_INT. If the precision is HOST_BITS_PER_WIDE_INT then both are { -1U }. "len" is never greater than precision / HOST_BITS_PER_WIDE_INT, rounded up. > In RTL both would be encoded with len == 1 (no > distinction between a signed and unsigned number with all bits set), Same again: both are -1 if the mode is HOST_BITS_PER_WIDE_INT or smaller. If the mode is wider then RTL too uses { -1, 0 }. So the current wide_int representation matches the RTL representation pretty closely, except for the part about wide_int leaving excess bits undefined. But that's just a convenience, it isn't important in terms of what the operators do. > on the current tree representation the encoding would be with len == > 1, too, as we have TYPE_UNSIGNED to tell us the sign. OK. > So we still need to somehow "map" between those representations. Right, that's what the constructors, from_* and to_* routines do. > Looking at the RTL representation from that wide-int representation > makes RTL look as if all constants are signed. Well, except for direct accessors like elt(), the wide-int representation is shielded from the interface (as it should be). It doesn't affect the result of arithmetic. The representation should have no visible effect for rtl or tree users who avoid elt(). Thanks, Richard
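The rule that both -1 encodings collapse to a single block is what a canonization step enforces: a high block that merely repeats the sign of the block below it is redundant. A hypothetical sketch (the branch's canonize() also accounts for precision and blocks_needed, which this ignores):

```cpp
#include <cassert>
#include <cstdint>

// Drop high blocks that are pure sign extensions of the block below.
// { -1, -1 } shrinks to { -1 }, but { -1, 0 } keeps both blocks, since
// dropping the 0 would turn a positive all-ones value into -1.
static unsigned
canonize (const int64_t *val, unsigned len)
{
  while (len > 1
         && val[len - 1] == (val[len - 2] < 0 ? -1 : 0))
    --len;
  return len;
}
```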
On Wed, 28 Aug 2013, Richard Sandiford wrote: > Richard Biener <rguenther@suse.de> writes: > > > On Wed, 28 Aug 2013, Richard Sandiford wrote: > > > >> Richard Biener <rguenther@suse.de> writes: > >> >> So the precision variable is good for the rtl level in several ways: > >> >> > >> >> - As you say, it avoids adding the explicit truncations that (in practice) > >> >> every rtl operation would need > >> >> > >> >> - It's more efficient in that case, since we don't calculate high values > >> >> and then discard them immediately. The common GET_MODE_PRECISION (mode) > >> >> <= HOST_BITS_PER_WIDE_INT case stays a pure HWI operation, despite all > >> >> the wide-int trappings. > >> >> > >> >> - It's a good way of checking type safety and making sure that excess > >> >> bits aren't accidentally given a semantic meaning. This is the most > >> >> important reason IMO. > >> >> > >> >> The branch has both the constant-precision, very wide integers that we > >> >> want for trees and the variable-precision integers we want for rtl, > >> >> so it's not an "either or". With the accessor-based implementation, > >> >> there should be very little cost to having both. > >> > > >> > So what I wonder (and where we maybe disagree) is how much code > >> > wants to inspect "intermediate" results. Say originally you have > >> > > >> > rtx foo (rtx x, rtx y) > >> > { > >> > rtx tem = simplify_const_binary_operation (PLUS, GET_MODE (x), x, > >> > GEN_INT (1)); > >> > rtx res = simplify_const_binary_operation (MINUS, GET_MODE (tem), tem, > >> > y); > >> > return res; > >> > } > >> > > >> > and with wide-int you want to change that to > >> > > >> > rtx foo (rtx x, rtx y) > >> > { > >> > wide_int tem = wide_int (x) + 1; > >> > wide_int res = tem - y; > >> > return res.to_rtx (); > >> > } > >> > > >> > how much code ever wants to inspect 'tem' or 'res'? 
> >> > That is, does it matter > >> > if 'tem' and 'res' would have been calculated in "infinite precision" > >> > and only to_rtx () would do the truncation to the desired mode? > >> > > >> > I think not. The amount of code performing multiple operations on > >> > _constants_ in sequence is extremely low (if it even exists). > >> > > >> > So I'd rather have to_rtx get a mode argument (or a precision) and > >> > perform the required truncation / sign-extension at RTX construction > >> > time (which is an expensive operation anyway). > >> > >> I agree this is where we disagree. I don't understand why you think > >> the above is better. Why do we want to do "infinite precision" > >> addition of two values when only the lowest N bits of those values > >> have a (semantically) defined meaning? Earlier in the thread it sounded > >> like we both agreed that having undefined bits in the _representation_ > >> was bad. So why do we want to do calculations on parts of values that > >> are undefined in the (rtx) semantics? > >> > >> E.g. say we're adding two rtx values whose mode just happens to be > >> HOST_BITS_PER_WIDE_INT in size. Why does it make sense to calculate > >> the carry from adding the two HWIs, only to add it to an upper HWI > >> that has no semantically-defined value? It's garbage in, garbage out. > > > > Not garbage in, and not garbage out (just wasted work). > > Well, it's not garbage in the sense of an uninitialised HWI detected > by valgrind (say). But it's semantic garbage. > > > That's the possible downside - the upside is to get rid of the notion > > of a 'precision'. > > No, it's still there, just in a different place. > > > OTOH they still will be in some ways "undefined" if you consider > > > > wide_int xw = from_rtx (xr, mode); > > tree xt = to_tree (xw, type); > > wide_int xw2 = from_tree (xt); > > > > with an unsigned type xw and xw2 will not be equal (in the > > 'extension' bits) for a value with MSB set. 
> > Do you mean it's undefined as things stand, or when using "infinite > precision" for rtl? It shouldn't lead to anything undefined at > the moment. Only the low GET_MODE_BITSIZE (mode) bits of xw are > meaningful, but those are also the only bits that would be used. > > > That is, RTL chooses to always sign-extend, tree chooses to extend > > according to sign information. wide-int chooses to ... ? (it seems > > the wide-int overall comment lost the part that defined its encoding, > > but it seems that we still sign-extend val[len-1], so (unsigned > > HOST_WIDE_INT)-1 is { -1U, 0 } with len == 2 and (HOST_WIDE_INT)-1 is > > { -1 } with len == 1. > > Only if the precision is > HOST_BITS_PER_WIDE_INT. If the precision > is HOST_BITS_PER_WIDE_INT then both are { -1U }. That wasn't my understanding of how things work. > "len" is never > greater than precision / HOST_BITS_PER_WIDE_INT, rounded up. "len" can be one larger than precision / HOST_BITS_PER_WIDE_INT, rounded up, as I originally designed the encoding scheme. It was supposed to be able to capture the difference between a positive and a negative number (unlike the RTL rep). I see canonize() truncates to blocks_needed. That said, I'm still missing one of my most important requests: - all references to HOST_WIDE_INT (and HOST_BITS_PER_WIDE_INT and friends) need to go and be replaced with a private typedef and proper constants (apart from in the _hwi interface API of course) - wide_int needs to work with the storage not being HOST_WIDE_INT, in the end it should be HOST_WIDEST_FAST_INT, but for testing coverage it ideally should work for 'signed char' as well (at _least_ it needs to work for plain 'int') > > In RTL both would be encoded with len == 1 (no > > distinction between a signed and unsigned number with all bits set), > > Same again: both are -1 if the mode is HOST_BITS_PER_WIDE_INT or smaller. > If the mode is wider then RTL too uses { -1, 0 }.
So the current wide_int > representation matches the RTL representation pretty closely, except for > the part about wide_int leaving excess bits undefined. But that's just > a convenience, it isn't important in terms of what the operators do. > > > on the current tree representation the encoding would be with len == > > 1, too, as we have TYPE_UNSIGNED to tell us the sign. > > OK. > > > So we still need to somehow "map" between those representations. > > Right, that's what the constructors, from_* and to_* routines do. I wonder where the from_tree and to_tree ones are? Are they from_double_int / wide_int_to_tree (what's wide_int_to_infinite_tree?) > > Looking at the RTL representation from that wide-int representation > > makes RTL look as if all constants are signed. > > Well, except for direct accessors like elt(), the wide-int representation > is shielded from the interface (as it should be). It doesn't affect the > result of arithmetic. The representation should have no visible effect > for rtl or tree users who avoid elt(). True. Though it should be one that allows an efficient implementation. Richard.
>>> Note that the bits above the precision are not defined and the >>> algorithms used here are careful not to depend on their value. In >>> particular, values that come in from rtx constants may have random >>> bits. > Which is a red herring. It should be fixed. I cannot even believe > that sentence given the uses of CONST_DOUBLE_LOW/CONST_DOUBLE_HIGH > or INTVAL/UINTVAL. I don't see accesses masking out 'undefined' bits > anywhere. Richi, you asked for the entire patch as a branch so that you could look at the whole picture. It is now time for you to do that. I understand it is very large and it will take some time for you to get your head around the whole thing. But remember this: the vast majority of the clients are dealing with intermediate code that has explicit promotions. Not only the rtl level, but the majority of the tree level, takes inputs that have explicit precisions that have been matched and wants to see an explicit precision result. For this code, doing the fixed precision thing where you do not ever ask about what is behind the curtain is a very good match. However, there are parts of the compiler, all within the tree or gimple level, that do not have this view. For these, there are two templates that export an interface that behaves in a manner very similar to what double-int does when the precision is smaller than 128 bits. (We discovered a large number of bugs when using double-int for TImode because clients did make an infinite precision assumption, but that is another story.) All numbers are converted to signed numbers that are extended based on their input type, and the math is performed in a large enough field that they never push near the end. We know what the end is because we sniff the port. At this point we looked at the pass we were converting and used the implementation that matched the style of coding and the algorithm, i.e. we made no substantive changes.
As mentioned in my earlier mail, I plan to change tree-ssa-ccp to use the fixed precision form in the future, but this is a change that is motivated by being able to find more constants with the proper representation, not by what is beautiful. But this is the only case where I think the rep should be substantially changed. I know you find the fixed precision stuff unappealing. But the truth is that you wanted us to make this patch so that it did as little damage to the way the compiler worked as possible, and given that so much of the compiler actually does fixed precision math, this is the path of least resistance. If it is reasonable to change the rtl, we may change that, but the truth is that the clients never see this, so it is not as much of an issue as you are making it out to be. Now that you can see all of the clients, see this for yourself. Kenny
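The "fixed precision math" Kenny keeps appealing to can be made concrete with a small sketch (invented helpers, not GCC code): if every operation sign-extends its result back into the precision, then whatever sits in the bits above the precision can never leak into a result, which is exactly why the rtl-level clients never mask.

```cpp
#include <assert.h>
#include <stdint.h>

/* Sketch of "math within the precision" (invented helpers, not GCC
   code).  Every result is pushed back through sext(), so bits above
   PREC in the inputs are irrelevant to the low PREC bits of any
   answer.  */
static int64_t
sext (int64_t x, int prec)
{
  int shift = 64 - prec;
  /* Shift the low PREC bits to the top, then arithmetic-shift back
     down to sign-extend from bit PREC-1.  */
  return (int64_t) ((uint64_t) x << shift) >> shift;
}

static int64_t
add_in_prec (int64_t a, int64_t b, int prec)
{
  return sext (a + b, prec);
}
```

For example, an 8-bit addition of 200 and 100 wraps to 44 regardless of any garbage above bit 7 in the operands, and an 8-bit all-ones pattern reads back as -1.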
On Aug 28, 2013, at 3:22 AM, Richard Biener <rguenther@suse.de> wrote: > Btw, rtl.h still wastes space with > > struct GTY((variable_size)) hwivec_def { > int num_elem; /* number of elements */ > HOST_WIDE_INT elem[1]; > }; > > struct GTY((chain_next ("RTX_NEXT (&%h)"), > chain_prev ("RTX_PREV (&%h)"), variable_size)) rtx_def { > ... > /* The first element of the operands of this rtx. > The number of operands and their types are controlled > by the `code' field, according to rtl.def. */ > union u { > rtunion fld[1]; > HOST_WIDE_INT hwint[1]; > struct block_symbol block_sym; > struct real_value rv; > struct fixed_value fv; > struct hwivec_def hwiv; > } GTY ((special ("rtx_def"), desc ("GET_CODE (&%0)"))) u; > }; > > there are 32bits available before the union. If you don't use > those for num_elem then all wide-ints will at least take as > much space as DOUBLE_INTs originally took - and large ints > that would have required DOUBLE_INTs in the past will now > require more space than before. Which means your math > motivating the 'num_elem' encoding stuff is wrong. With > moving 'num_elem' before u you can even re-use the hwint > field in the union as the existing double-int code does > (which in fact could simply do the encoding trick in the > old CONST_DOUBLE scheme, similar for the tree INTEGER_CST > container). So, HOST_WIDE_INT is likely 64 bits, and likely is 64 bit aligned. The base (stuff before the union) is 32 bits. There is a 32 bit gap, even if not used before the HOST_WIDE_INT elem. We place the num_elem in this gap. Even if the field were removed, the size would not change, nor the placement of elem. So, short of packing, a 32-bit HWI host or going with a 32-bit type instead of a HOST_WIDE_INT, I'm not sure I follow you? I tend to discount 32-bit hosted compilers as a thing of the past.
>>> Note that the bits above the precision are not defined and the >>> algorithms used here are careful not to depend on their value. In >>> particular, values that come in from rtx constants may have random >>> bits. > Which is a red herring. It should be fixed. I cannot even believe > that sentence given the uses of CONST_DOUBLE_LOW/CONST_DOUBLE_HIGH > or INTVAL/UINTVAL. I don't see accesses masking out 'undefined' bits > anywhere. > I can agree with you that this could be fixed. But it is not necessary to fix it. The rtl level and most of the tree level have existed for a long time by doing math within the precision. You do not see the masking at the rtl level because the masking is not necessary. If you never look beyond the precision you just do not care. There is the issue with divide and mod, and quite frankly the code on the trunk scares me to death. My code at the rtl level makes sure that everything is clean, since seeing whether it is a udiv or a div is enough info to clean the value up. >> I have a feeling I'm rehashing a past debate, sorry, but rtx constants can't >> have random bits. The upper bits must be a sign extension of the value. >> There's exactly one valid rtx for each (value, mode) pair. If you saw >> something different then that sounds like a bug. The rules should already >> be fairly well enforced though, since something like (const_int 128) -- >> or (const_int 256) -- will not match a QImode operand. > See. We're saved ;) This is Richard's theory. There is a block of code at wide-int.cc:175 that is ifdefed out that checks to see if the rtl consts are canonical. If you bring it in, building the libraries fails very quickly. The million-dollar question is whether this is the only bug or the first of 1000 bugs. Your comment about not seeing any masking cuts both ways. There is very little code on the way in, if you create ints with GEN_INT, that makes sure someone has done the right thing on that side.
So my guess is that there are a lot of failures and a large number of them will be in the ports. If Richard is right then there will be changes. The internals of wide-int will be changed so that everything is maintained in canonical form rather than just doing the canonization on the outside. This will clean up things like fits_uhwi_p and a lot more of the wide-int internals. But I think that if you want to change the compiler to use infinite precision arithmetic, you really ought to have a better reason than that you think it is cleaner. Because it buys you nothing for most of the compiler, it is slower, AND it is a much bigger change to the compiler than the one we want to make. >> This is probably the part of the representation that I disagree most with. >> There seem to be two main ways we could handle the extension to whole HWIs: >> >> (1) leave the stored upper bits undefined and extend them on read >> (2) keep the stored upper bits in extended form >> >> The patch goes for (1) but (2) seems better to me, for a few reasons: > I agree whole-heartedly. My statement is above. If we can do (2) then we should. But I do not think that it is worth several people-years to do this. > >> * As above, constants coming from rtl are already in the right form, >> so if you create a wide_int from an rtx and only query it, no explicit >> extension is needed. >> >> * Things like logical operations and right shifts naturally preserve >> the sign-extended form, so only a subset of write operations need >> to take special measures. >> >> * You have a public interface that exposes the underlying HWIs >> (which is fine with me FWIW), so it seems better to expose a fully-defined >> HWI rather than only a partially-defined HWI. >> >> E.g. 
zero_p is: >> >> HOST_WIDE_INT x; >> >> if (precision && precision < HOST_BITS_PER_WIDE_INT) >> x = sext_hwi (val[0], precision); >> else if (len == 0) >> { >> gcc_assert (precision == 0); >> return true; >> } >> else >> x = val[0]; >> >> return len == 1 && x == 0; >> >> but I think it really ought to be just: >> >> return len == 1 && val[0] == 0; > Yes! > > But then - what value does keeping track of a 'precision' have > in this case? It seems to me it's only a "convenient carrier" > for > > wide_int x = wide-int-from-RTX (y); > machine_mode saved_mode = mode-available? GET_MODE (y) : magic-mode; > ... process x ... > RTX = RTX-from-wide_int (x, saved_mode); > > that is, wide-int doesn't do anything with 'precision' but you > can extract it later to not need to remember a mode you were > interested in? > > Oh, and of course some operations require a 'precision', like rotate. As I have said, when the operands and result are all precision correct, as they are generally in both rtl and tree, then the default fixed precision interface is really very clean. I know you object to some implementation issues, but this is why we put the branch up, so you can see what you get from the other side. >>> When the precision is 0, all the bits in the LEN elements of >>> VEC are significant with no undefined bits. Precisionless >>> constants are limited to being one or two HOST_WIDE_INTs. When two >>> are used the upper value is 0, and the high order bit of the first >>> value is set. (Note that this may need to be generalized if it is >>> ever necessary to support 32bit HWIs again). >> I didn't understand this. When are two HOST_WIDE_INTs needed for >> "precision 0"? > For the wide_int containing unsigned HOST_WIDE_INT ~0. As we > sign-extend the representation (heh, yes, we do or should!) we > require an extra HWI to store the fact that ~0 is unsigned. > >> The main thing that's changed since the early patches is that we now >> have a mixture of wide-int types. 
This seems to have led to a lot of >> boiler-plate forwarding functions (or at least it felt like that while >> moving them all out of the class). And that in turn seems to be because >> you're trying to keep everything as member functions. E.g. a lot of the >> forwarders are from a member function to a static function. >> >> Wouldn't it be better to have the actual classes be light-weight, >> with little more than accessors, and do the actual work with non-member >> template functions? There seems to be 3 grades of wide-int: >> >> (1) read-only, constant precision (from int, etc.) >> (2) read-write, constant precision (fixed_wide_int) >> (3) read-write, variable precision (wide_int proper) >> >> but we should be able to hide that behind templates, with compiler errors >> if you try to write to (1), etc. > Yeah, I'm probably trying to clean up the implementation once I > got past recovering from two months without GCC ... > Richard.
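The "three grades behind templates" suggestion might look roughly like the sketch below. Every name here is invented for illustration; it is not the branch's code, just the shape of the idea: storage classes that are little more than accessors, with one non-member worker template replacing the per-class forwarders.

```cpp
#include <assert.h>

// Invented sketch of Richard's suggestion: lightweight storage
// classes, arithmetic in non-member templates that accept any grade.
template <int N>
struct fixed_wide_storage      // grade (2): compile-time precision
{
  long long val[(N + 63) / 64];
  int len;
};

struct wide_storage            // grade (3): runtime precision
{
  long long val[4];
  int len;
  int precision;
};

// One worker template instead of a boiler-plate forwarder per class;
// read-only grade (1) would simply be a const-qualified view.
template <typename T>
bool
zero_p (const T &x)
{
  return x.len == 1 && x.val[0] == 0;
}
```

Write access to grade (1) would then fail at compile time via const-ness rather than via duplicated member functions.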
On Wed, 28 Aug 2013, Mike Stump wrote: > On Aug 28, 2013, at 3:22 AM, Richard Biener <rguenther@suse.de> wrote: > > Btw, rtl.h still wastes space with > > > > struct GTY((variable_size)) hwivec_def { > > int num_elem; /* number of elements */ > > HOST_WIDE_INT elem[1]; > > }; > > > > struct GTY((chain_next ("RTX_NEXT (&%h)"), > > chain_prev ("RTX_PREV (&%h)"), variable_size)) rtx_def { > > ... > > /* The first element of the operands of this rtx. > > The number of operands and their types are controlled > > by the `code' field, according to rtl.def. */ > > union u { > > rtunion fld[1]; > > HOST_WIDE_INT hwint[1]; > > struct block_symbol block_sym; > > struct real_value rv; > > struct fixed_value fv; > > struct hwivec_def hwiv; > > } GTY ((special ("rtx_def"), desc ("GET_CODE (&%0)"))) u; > > }; > > > > there are 32bits available before the union. If you don't use > > those for num_elem then all wide-ints will at least take as > > much space as DOUBLE_INTs originally took - and large ints > > that would have required DOUBLE_INTs in the past will now > > require more space than before. Which means your math > > motivating the 'num_elem' encoding stuff is wrong. With > > moving 'num_elem' before u you can even re-use the hwint > > field in the union as the existing double-int code does > > (which in fact could simply do the encoding trick in the > > old CONST_DOUBLE scheme, similar for the tree INTEGER_CST > > container). > > So, HOST_WIDE_INT is likely 64 bits, and likely is 64 bit aligned. The > base (stuff before the union) is 32 bits. There is a 32 bit gap, even > if not used before the HOST_WIDE_INT elem. We place the num_elem is > this gap. No, you don't. You place num_elem 64bit aligned _after_ the gap. And you have another 32bit gap, as you say, before elem. > Even if the field were removed, the size would not change, > nor the placement of elem. 
nor the placement of elem. So, short of packing, a 32-bit HWI host or > going with a 32-bit type instead of a HOST_WIDE_INT, I'm not sure I > follow you? I tend to discount 32-bit hosted compilers as a thing of > the past. Me, too. On 32bit hosts nothing would change as 'u' is 32bit aligned there (ok, on 32bit hosts putting num_elem before 'u' would actually increase memory usage - but as you say, 32bit hosted compilers are a thing of the past ;)). Richard.
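The layout dispute can be checked mechanically with a reduced model. These are invented stand-in types, not rtl.h itself, and the numbers assume a typical LP64 host where 64-bit integers are 8-byte aligned: because the union is 8-byte aligned either way, a 32-bit gap follows the base in both layouts, but keeping num_elem inside the union (as hwivec_def does) pushes elem out by a full extra 8 bytes.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Reduced model of the two layouts (invented types, not rtl.h).
// Assumes an LP64-style host where int64_t is 8-byte aligned.

struct num_in_union            // num_elem inside the union, as on the branch
{
  uint32_t base;               // the 32 bits before the union
  union
  {
    struct { int32_t num_elem; int64_t elem[1]; } hwiv;
  } u;
};

struct num_before_union        // num_elem hoisted out, before the union
{
  uint32_t base;
  int32_t num_elem;            // lands in the 32-bit alignment gap
  union
  {
    int64_t elem[1];
  } u;
};
```

On such a host the union starts at offset 8 in both structs, which is Richard's point: num_elem inside the union sits 64-bit aligned after the gap, and only hoisting it before the union reclaims the space.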
On Wed, 28 Aug 2013, Kenneth Zadeck wrote: > > > > > Note that the bits above the precision are not defined and the > > > > algorithms used here are careful not to depend on their value. In > > > > particular, values that come in from rtx constants may have random > > > > bits. > > Which is a red herring. It should be fixed. I cannot even believe > > that sentence given the uses of CONST_DOUBLE_LOW/CONST_DOUBLE_HIGH > > or INTVAL/UINTVAL. I don't see accesses masking out 'undefined' bits > > anywhere. > > > I can agree with you that this could be fixed. But it is not necessary to > fix it. The rtl level and most of the tree level has existed for a long time > by doing math within the precision. > > you do not see the masking at the rtl level because the masking is not > necessary. if you never look beyond the precision you just do not care. > There is the issue with divide and mod and quite frankly the code on the trunk > scares me to death. my code at the rtl level makes sure that everything is > clean since i can see if it is a udiv or a div that is enough info to clean > the value up. > > > > I have a feeling I'm rehashing a past debate, sorry, but rtx constants > > > can't > > > have random bits. The upper bits must be a sign extension of the value. > > > There's exactly one valid rtx for each (value, mode) pair. If you saw > > > something different then that sounds like a bug. The rules should already > > > be fairly well enforced though, since something like (const_int 128) -- > > > or (const_int 256) -- will not match a QImode operand. > > See. We're saved ;) > this is richard's theory. There is a block of code at wide-int.cc:175 that > is ifdefed out that checks to see if the rtl consts are canonical. if you > bring it in, building the libraries fails very quickly. The million dollar > question is this the only bug or is this the first bug of a 1000 bugs. Your > comment about not seeing any masking cuts both ways. 
There is very little > code on the way in if you create ints with GEN_INT that makes sure someone has > done the right thing on that side. So my guess is that there are a lot of > failures and a large number of them will be in the ports. Well, clearly RTL code _expects_ constants to be properly extended. I can reproduce the bug and I simply guess that's a matter of mismatching modes here (or performing an implicit truncation without properly extending the constant). at /space/rguenther/src/svn/wide-int/gcc/combine.c:10086 10086 GEN_INT (count)); (gdb) l 10081 10082 mask_rtx = GEN_INT (nonzero_bits (varop, GET_MODE (varop))); 10083 10084 mask_rtx 10085 = simplify_const_binary_operation (code, result_mode, mask_rtx, 10086 GEN_INT (count)); uses of GEN_INT are frowned upon ... for exactly that reason - the mask_rtx is not a proper RTL constant for SImode. Btw, all this isn't a reason to not have a well-defined wide-int representation. You just have (as you have for trees) to properly canonize the representation at the time you convert from RTL constants to wide-int constants. In the long run we want a uniform representation of constants so we can do zero-copying - but it looks like we now have three different representations - the tree one (sign or zero extended dependent on sign), RTL (garbage as you show) and wide-int (always sign-extended). That's why I was looking at at least matching what tree does (because tree constants _are_ properly canonized). Richard.
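The combine.c bug Richard shows reduces to a one-line invariant: a valid CONST_INT for an N-bit mode must equal its own sign-extension from bit N-1, which is what GCC's trunc_int_for_mode enforces and raw GEN_INT does not. A hedged sketch of that check, with invented helper names:

```cpp
#include <assert.h>
#include <stdint.h>

// Sketch of the canonical-form rule for CONST_INTs (invented helper
// names; in GCC the extension itself is done by trunc_int_for_mode).
// A value is canonical for an N-bit mode iff sign-extending its low
// N bits changes nothing.
static int64_t
sext_from (int64_t x, int bits)
{
  int shift = 64 - bits;
  // Move the low BITS bits to the top, arithmetic-shift back down.
  return (int64_t) ((uint64_t) x << shift) >> shift;
}

static bool
canonical_for_bits (int64_t x, int bits)
{
  return x == sext_from (x, bits);
}
```

Under this rule, the 0xffffffff produced by nonzero_bits for a full SImode register is not a canonical SImode constant; its canonical form is -1, which is why GEN_INT at that combine.c call site is wrong and a gen_int_mode-style construction is not.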
On 08/29/2013 04:42 AM, Richard Biener wrote: > On Wed, 28 Aug 2013, Kenneth Zadeck wrote: > >>>>> Note that the bits above the precision are not defined and the >>>>> algorithms used here are careful not to depend on their value. In >>>>> particular, values that come in from rtx constants may have random >>>>> bits. >>> Which is a red herring. It should be fixed. I cannot even believe >>> that sentence given the uses of CONST_DOUBLE_LOW/CONST_DOUBLE_HIGH >>> or INTVAL/UINTVAL. I don't see accesses masking out 'undefined' bits >>> anywhere. >>> >> I can agree with you that this could be fixed. But it is not necessary to >> fix it. The rtl level and most of the tree level has existed for a long time >> by doing math within the precision. >> >> you do not see the masking at the rtl level because the masking is not >> necessary. if you never look beyond the precision you just do not care. >> There is the issue with divide and mod and quite frankly the code on the trunk >> scares me to death. my code at the rtl level makes sure that everything is >> clean since i can see if it is a udiv or a div that is enough info to clean >> the value up. >> >>>> I have a feeling I'm rehashing a past debate, sorry, but rtx constants >>>> can't >>>> have random bits. The upper bits must be a sign extension of the value. >>>> There's exactly one valid rtx for each (value, mode) pair. If you saw >>>> something different then that sounds like a bug. The rules should already >>>> be fairly well enforced though, since something like (const_int 128) -- >>>> or (const_int 256) -- will not match a QImode operand. >>> See. We're saved ;) >> this is richard's theory. There is a block of code at wide-int.cc:175 that >> is ifdefed out that checks to see if the rtl consts are canonical. if you >> bring it in, building the libraries fails very quickly. The million dollar >> question is this the only bug or is this the first bug of a 1000 bugs. 
Your >> comment about not seeing any masking cuts both ways. There is very little >> code on the way in if you create ints with GEN_INT that makes sure someone has >> done the right thing on that side. So my guess is that there are a lot of >> failures and a large number of them will be in the ports. > Well, clearly RTL code _expects_ constants to be properly extended. I > can reproduce the bug and I simply guess that's a matter of mismatching > modes here (or performing an implicit truncation without properly > extending the constant). > > at /space/rguenther/src/svn/wide-int/gcc/combine.c:10086 > 10086 GEN_INT > (count)); > (gdb) l > 10081 > 10082 mask_rtx = GEN_INT (nonzero_bits (varop, GET_MODE > (varop))); > 10083 > 10084 mask_rtx > 10085 = simplify_const_binary_operation (code, > result_mode, mask_rtx, > 10086 GEN_INT > (count)); > > uses of GEN_INT are frowned upon ... for exactly that reason - the > mask_rtx is not a proper RTL constant for SImode. Over time, the GEN_INTs will go away at the portable rtl level as more of the code is transitioned to use wide-int. The port story is not so good. Any port that uses TI or beyond will likely evolve to using wide-int for the math and unifying the cases between CONST_INT and CONST_WIDE_INT. (The wide-int constructor from rtl takes either, and the constructor to rtl looks at the constant.) But a port like the mips that has no TI will likely never change. > > Btw, all this isn't a reason to not have a well-defined wide-int > representation. You just have (as you have for trees) to properly > canonize the representation at the time you convert from RTL > constants to wide-int constants. The wide-int representation is completely well defined. I cannot see why you do not understand this. You do not need to look above the precision, so there is nothing undefined! I know this bothers you very badly and I will agree that if it is easy to clean up rtl to match this, then it would simplify the code somewhat. 
But it is not undefined or random. Furthermore, cleaning it will not change the ABI, so if rtl gets cleaner, we can change this. > In the long run we want a uniform representation of constants > so we can do zero-copying - but it looks like we now have > three different representations - the tree one (sign or zero > extended dependent on sign), RTL (garbage as you show) and > wide-int (always sign-extended). You wanted this to look more like double-int; now that it does, you now see the flaws in it. Do I ever get to win? I do not think anyone would have been served well by trying to make trees have only signed constants in the way that double-int did. I will admit that the implementation would be cleaner if we could just change everything else in the compiler to match a super clean wide-int design. I took the other approach: bottled up as much of the ugliness behind the API and tried to keep the rest of the compiler looking mostly the way that it does. To a first approximation, ugliness is a conserved force in the universe. > > That's why I was looking at at least matching what tree does > (because tree constants _are_ properly canonized). Well, they are very close. I am likely working on the "last" bug now for mips. > > Richard.
===================================== The two patches for the trunk below are necessary to get identical code between the wide-int branch and the trunk. The first patch has been submitted for review and fixes a bug. The second patch will not be submitted as it is just for compatibility. The second patch slightly changes the hash function that the rtl gcse passes use. Code is modified based on the traversal of a hash table, so if the hash functions are not identical, the code is slightly different between the two branches. ===================================== diff --git a/gcc/expr.c b/gcc/expr.c index 923f59b..f5744b0 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -4815,7 +4815,8 @@ expand_assignment (tree to, tree from, bool nontemporal) bitregion_start, bitregion_end, mode1, from, get_alias_set (to), nontemporal); - else if (bitpos >= mode_bitsize / 2) + else if (bitpos >= mode_bitsize / 2 + && bitpos+bitsize <= mode_bitsize) result = store_field (XEXP (to_rtx, 1), bitsize, bitpos - mode_bitsize / 2, bitregion_start, bitregion_end, @@ -4834,8 +4835,12 @@ expand_assignment (tree to, tree from, bool nontemporal) } else { + HOST_WIDE_INT extra = 0; + if (bitpos+bitsize > mode_bitsize) + extra = bitpos+bitsize - mode_bitsize; rtx temp = assign_stack_temp (GET_MODE (to_rtx), - GET_MODE_SIZE (GET_MODE (to_rtx))); + GET_MODE_SIZE (GET_MODE (to_rtx)) + + extra); write_complex_part (temp, XEXP (to_rtx, 0), false); write_complex_part (temp, XEXP (to_rtx, 1), true); result = store_field (temp, bitsize, bitpos, diff --git a/gcc/rtl.def b/gcc/rtl.def index b4ce1b9..5ed015c 100644 --- a/gcc/rtl.def +++ b/gcc/rtl.def @@ -342,6 +342,8 @@ DEF_RTL_EXPR(TRAP_IF, "trap_if", "ee", RTX_EXTRA) /* numeric integer constant */ DEF_RTL_EXPR(CONST_INT, "const_int", "w", RTX_CONST_OBJ) +DEF_RTL_EXPR(CONST_WIDE_INT, "const_wide_int", "", RTX_CONST_OBJ) + /* fixed-point constant */ DEF_RTL_EXPR(CONST_FIXED, "const_fixed", "www", RTX_CONST_OBJ)