
[RFC] ipa bitwise constant propagation

Message ID CAAgBjM=iB5xbu6FLuzRui4fsHnGRJSC-ZpHYmvFmrSD3Dy+hig@mail.gmail.com
State New

Commit Message

Prathamesh Kulkarni Aug. 4, 2016, 6:36 a.m. UTC
Hi,
This is a prototype patch for propagating known/unknown bits inter-procedurally
for integral types, using the info obtained from get_nonzero_bits ().

The patch required making the following changes:
a) To make the info from get_nonzero_bits() available to ipa, I had to remove
the guard !nonzero_p in ccp_finalize. However, that triggered the following ICE
in get_ptr_info() for default_none.f95 (and several other fortran tests)
with options: -fopenacc -O2
ICE: http://pastebin.com/KjD7HMQi
I confirmed with Richard that this was a latent issue.

b) I chose widest_int for representing value and mask in ipcp_bits_lattice,
and correspondingly changed the declarations of
bit_value_unop_1/bit_value_binop_1 to take
precision and sign instead of type (those are the only two fields that
were used). Both functions are exported by tree-ssa-ccp.h.
I hope that's ok?

c) Changed streamer_read_wi/streamer_write_wi to non-static.
Ah I see Kugan has submitted a patch for this, so I will drop this hunk.

d) We have the following in tree-ssa-ccp.c:get_default_value ():
          if (flag_tree_bit_ccp)
            {
              wide_int nonzero_bits = get_nonzero_bits (var);
              if (nonzero_bits != -1)
                {
                  val.lattice_val = CONSTANT;
                  val.value = build_zero_cst (TREE_TYPE (var));
                  val.mask = extend_mask (nonzero_bits);
                }

extend_mask() sets all upper bits to 1 in nonzero_bits, i.e., varying
in terms of bit-ccp.
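
(For reference, extend_mask currently does roughly

  static widest_int
  extend_mask (const wide_int &nonzero_bits)
  {
    /* Zero-extend nonzero_bits and set every bit from its precision
       upwards, i.e. treat all upper bits as unknown.  */
    return widest_int::from (nonzero_bits, UNSIGNED)
           | wi::mask <widest_int> (wi::get_precision (nonzero_bits), true);
  }

modulo the exact spelling.)
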
I suppose in tree-ccp we need to extend the mask if var is a parameter, since
we don't know in advance what values it will receive from different callers,
and marking all upper bits as 1 is the safe choice.
However, with ipa we can determine exactly which bits of the parameter are
constant, so setting all upper bits to 1 should become unnecessary?

For example, consider following artificial test-case:
int f(int x)
{
  if (x > 300)
    return 1;
  else
    return 2;
}

int main(int argc, char **argv)
{
  return f(argc & 0xc) + f (argc & 0x3);
}

For x, the mask would be the meet of:
<0, 0xc> meet <0, 0x3> == (0x3 | 0xc) | (0 ^ 0) == 0xf
and ipcp_update_bits() sets nonzero_bits for x to 0xf.
However, get_default_value then calls extend_mask (0xf), resulting in
all upper bits being set to 1, and consequently the condition
if (x > 300) doesn't get folded.
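
In code, the meet that produces 0xf mirrors meet_with_1 from the patch:

  widest_int v1 = 0, m1 = 0xc;  /* jump function from f (argc & 0xc) */
  widest_int v2 = 0, m2 = 0x3;  /* jump function from f (argc & 0x3) */
  /* Bits that are unknown in either operand, or differ between the two
     values, become unknown: mask == 0xf, while the value stays 0.  */
  widest_int mask = (m1 | m2) | (v1 ^ v2);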

To resolve this, I added a new flag "set_by_ipa" to decl_common,
which is set to true if the mask of a parameter is determined by ipa-cp,
and the condition changes to:

if (SSA_NAME_VAR (var)
    && TREE_CODE (SSA_NAME_VAR (var)) == PARM_DECL
    && DECL_SET_BY_IPA (SSA_NAME_VAR (var)))
  val.mask = widest_int::from (nonzero_bits,
                          TYPE_SIGN (TREE_TYPE (SSA_NAME_VAR (var))));
else
  val.mask = extend_mask (nonzero_bits);

I am not sure if adding a new flag to decl_common is a good idea. How
do other ipa passes deal with this or similar issues?

I suppose we would want to gate this on some flag, say -fipa-bit-cp?
I haven't yet gated it on the flag; will do in the next version of the patch.
I have added some very simple test-cases; I will try to add more
meaningful ones.

Patch passes bootstrap+test on x86_64-unknown-linux-gnu
and cross-tested on arm*-*-* and aarch64*-*-*, with the exception
of some fortran tests failing due to the above ICE.

As next steps, I am planning to extend it to handle alignment propagation,
and do further testing (lto-bootstrap, chromium).
I would be grateful for feedback on the current patch.

Thanks,
Prathamesh

Comments

Richard Biener Aug. 4, 2016, 8:01 a.m. UTC | #1
On Thu, 4 Aug 2016, Prathamesh Kulkarni wrote:

> Hi,
> This is a prototype patch for propagating known/unknown bits inter-procedurally
> for integral types, using the info obtained from get_nonzero_bits ().
> 
> Patch required making following changes:
> a) To make info from get_nonzero_bits() available to ipa, I had to remove
> guard !nonzero_p in ccp_finalize. However that triggered the following ICE
> in get_ptr_info() for default_none.f95 (and several other fortran tests)
> with options: -fopenacc -O2
> ICE: http://pastebin.com/KjD7HMQi
> I confirmed with Richard that this was a latent issue.

Can you please bootstrap/test the fix for this separately?  (doesn't
seem to be included in this patch btw)

> b) I chose widest_int for representing value, mask in ipcp_bits_lattice
> and correspondingly changed declarations for
> bit_value_unop_1/bit_value_binop_1 to take
> precision and sign instead of type (those are the only two fields that
> were used). Both these functions are exported by tree-ssa-ccp.h
> I hope that's ok ?

That's ok, but please change the functions to overloads of
bit_value_binop / bit_value_unop to not export ugly _1 names.

-  signop sgn = TYPE_SIGN (type);
-  int width = TYPE_PRECISION (type);
+  signop sgn = type_sgn;
+  int width = (int) type_precision;

please adjust parameter names to get rid of those now unnecessary
locals (and make the precision parameter an 'int').

> c) Changed streamer_read_wi/streamer_write_wi to non-static.
> Ah I see Kugan has submitted a patch for this, so I will drop this hunk.

But he streams wide_int, not widest_int.  I followed up on his
patch.

> d) We have following in tree-ssa-ccp.c:get_default_value ():
>           if (flag_tree_bit_ccp)
>             {
>               wide_int nonzero_bits = get_nonzero_bits (var);
>               if (nonzero_bits != -1)
>                 {
>                   val.lattice_val = CONSTANT;
>                   val.value = build_zero_cst (TREE_TYPE (var));
>                   val.mask = extend_mask (nonzero_bits);
>                 }
> 
> extend_mask() sets all upper bits to 1 in nonzero_bits, ie, varying
> in terms of bit-ccp.
> I suppose in tree-ccp we need to extend mask if var is parameter since we don't
> know in advance what values it will receive from different callers and mark all
> upper bits as 1 to be safe.

Not sure, it seems to me that we can zero-extend for unsigned types
and sign-extend for signed types (if the "sign"-bit of nonzero_bits
is one it properly makes higher bits undefined).  Can you change
the code accordingly?  (simply give extend_mask a sign-op and use
that appropriately?)  Please split out this change so it can be
tested separately.
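
I.e. extend_mask would become something like

  static widest_int
  extend_mask (const wide_int &nonzero_bits, signop sgn)
  {
    /* Sign-extension makes the upper bits undefined exactly when the
       sign bit of nonzero_bits is set.  */
    return widest_int::from (nonzero_bits, sgn);
  }

(modulo naming), with callers passing TYPE_SIGN of the SSA name's type.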

> However I suppose with ipa, we can determine exactly which bits of
> parameter are constant and
> setting all upper bits to 1 will become unnecessary ?
> 
> For example, consider following artificial test-case:
> int f(int x)
> {
>   if (x > 300)
>     return 1;
>   else
>     return 2;
> }
> 
> int main(int argc, char **argv)
> {
>   return f(argc & 0xc) + f (argc & 0x3);
> }
> 
> For x, the mask would be meet of:
> <0, 0xc> meet <0, 0x3> == (0x3 | 0xc) | (0 ^ 0) == 0xf
> and ipcp_update_bits() sets nonzero_bits for x to 0xf.
> However get_default_value then calls extend_mask (0xf), resulting in
> all upper bits
> being set to 1 and consequently the condition if (x > 300) doesn't get folded.

But then why would the code trying to optimize the comparison look at
bits that are outside of the precision?  (Where do we try to use this
info?  I see that VRP fails to use the nonzero bits if no range info
is present - I suppose set_nonzero_bits fails to eventually adjust
the range.)

That said, where is the folding code and why does it care for those
"uninteresting" bits at all?

> To resolve this, I added a new flag "set_by_ipa" to decl_common,
> which is set to true if the mask of parameter is determined by ipa-cp,
> and the condition changes to:
> 
> if (SSA_NAME_VAR (var)
>     && TREE_CODE (SSA_NAME_VAR (var)) == PARM_DECL
>     && DECL_SET_BY_IPA (SSA_NAME_VAR (var)))
>   val.mask = widest_int::from (nonzero_bits,
>                           TYPE_SIGN (TREE_TYPE (SSA_NAME_VAR (var))));
> else
>   val.mask = extend_mask (nonzero_bits);
> 
> I am not sure if adding a new flag to decl_common is a good idea. How
> do other ipa passes deal with this/similar issue ?
> 
> I suppose we would want to gate this on some flag, say -fipa-bit-cp ?
> I haven't yet gated it on the flag, will do in next version of patch.
> I have added some very simple test-cases, I will try to add more
> meaningful ones.

See above - we should avoid needing this.

> Patch passes bootstrap+test on x86_64-unknown-linux-gnu
> and cross-tested on arm*-*-* and aarch64*-*-* with the exception
> of some fortran tests failing due to above ICE.
> 
> As next steps, I am planning to extend it to handle alignment propagation,
> and do further testing (lto-bootstrap, chromium).
> I would be grateful for feedback on the current patch.

I see you do

@@ -895,7 +899,7 @@ do_dbg_cnt (void)
    Return TRUE when something was optimized.  */

 static bool
-ccp_finalize (bool nonzero_p)
+ccp_finalize (bool nonzero_p ATTRIBUTE_UNUSED)
 {
   bool something_changed;
   unsigned i;
@@ -913,10 +917,7 @@ ccp_finalize (bool nonzero_p)

       if (!name
          || (!POINTER_TYPE_P (TREE_TYPE (name))
-             && (!INTEGRAL_TYPE_P (TREE_TYPE (name))
-                 /* Don't record nonzero bits before IPA to avoid
-                    using too much memory.  */
-                 || !nonzero_p)))
+             && (!INTEGRAL_TYPE_P (TREE_TYPE (name)))))
        continue;

can you instead adjust the caller to do sth like

  if (ccp_finalize (nonzero_p || flag_ipa_cp))
    {

?  What we miss to optimize memory usage in the early CCP case
(it's run very early, before dead code elimination) is to
avoid setting alignment / nonzero bits for the case of
fully propagatable (and thus dead after substitute_and_fold)
SSA names.

So in ccp_finalize do sth like

      val = get_value (name);
      if (val->lattice_val != CONSTANT
          || TREE_CODE (val->value) != INTEGER_CST
          || val->mask == 0)
        continue;

That should cut down early CCP memory use in case of nonzero
setting significantly.

I didn't look at the propagation part but eventually the IPA-CP
lattice gets quite big.  Also the alignment lattice is very
similar to the bits lattice so why not merge those two?  But
in the end it's Martins/Honzas call here.  Note there is
trailing_wide_ints <> which could be used to improve memory usage
based on the underlying type.

Thanks,
Richard.
Prathamesh Kulkarni Aug. 4, 2016, 8:57 a.m. UTC | #2
On 4 August 2016 at 13:31, Richard Biener <rguenther@suse.de> wrote:
> On Thu, 4 Aug 2016, Prathamesh Kulkarni wrote:
>
>> Hi,
>> This is a prototype patch for propagating known/unknown bits inter-procedurally
>> for integral types, using the info obtained from get_nonzero_bits ().
>>
>> Patch required making following changes:
>> a) To make info from get_nonzero_bits() available to ipa, I had to remove
>> guard !nonzero_p in ccp_finalize. However that triggered the following ICE
>> in get_ptr_info() for default_none.f95 (and several other fortran tests)
>> with options: -fopenacc -O2
>> ICE: http://pastebin.com/KjD7HMQi
>> I confirmed with Richard that this was a latent issue.
>
> Can you please bootstrap/test the fix for this separately?  (doesn't
> seem to be included in this patch btw)
Well, I don't have the fix available ;-)
>
>> b) I chose widest_int for representing value, mask in ipcp_bits_lattice
>> and correspondingly changed declarations for
>> bit_value_unop_1/bit_value_binop_1 to take
>> precision and sign instead of type (those are the only two fields that
>> were used). Both these functions are exported by tree-ssa-ccp.h
>> I hope that's ok ?
>
> That's ok, but please change the functions to overloads of
> bit_value_binop / bit_value_unop to not export ugly _1 names.
>
> -  signop sgn = TYPE_SIGN (type);
> -  int width = TYPE_PRECISION (type);
> +  signop sgn = type_sgn;
> +  int width = (int) type_precision;
>
> please adjust parameter names to get rid of those now unnecessary
> locals (and make the precision parameter an 'int').
>
>> c) Changed streamer_read_wi/streamer_write_wi to non-static.
>> Ah I see Kugan has submitted a patch for this, so I will drop this hunk.
>
> But he streams wide_int, not widest_int.  I followed up on his
> patch.
Oops, I got confused, sorry about that.
>
>> d) We have following in tree-ssa-ccp.c:get_default_value ():
>>           if (flag_tree_bit_ccp)
>>             {
>>               wide_int nonzero_bits = get_nonzero_bits (var);
>>               if (nonzero_bits != -1)
>>                 {
>>                   val.lattice_val = CONSTANT;
>>                   val.value = build_zero_cst (TREE_TYPE (var));
>>                   val.mask = extend_mask (nonzero_bits);
>>                 }
>>
>> extend_mask() sets all upper bits to 1 in nonzero_bits, ie, varying
>> in terms of bit-ccp.
>> I suppose in tree-ccp we need to extend mask if var is parameter since we don't
>> know in advance what values it will receive from different callers and mark all
>> upper bits as 1 to be safe.
>
> Not sure, it seems to me that we can zero-extend for unsigned types
> and sign-extend for signed types (if the "sign"-bit of nonzero_bits
> is one it properly makes higher bits undefined).  Can you change
> the code accordingly?  (simply give extend_mask a sign-op and use
> that appropriately?)  Please split out this change so it can be
> tested separately.
>
>> However I suppose with ipa, we can determine exactly which bits of
>> parameter are constant and
>> setting all upper bits to 1 will become unnecessary ?
>>
>> For example, consider following artificial test-case:
>> int f(int x)
>> {
>>   if (x > 300)
>>     return 1;
>>   else
>>     return 2;
>> }
>>
>> int main(int argc, char **argv)
>> {
>>   return f(argc & 0xc) + f (argc & 0x3);
>> }
>>
>> For x, the mask would be meet of:
>> <0, 0xc> meet <0, 0x3> == (0x3 | 0xc) | (0 ^ 0) == 0xf
>> and ipcp_update_bits() sets nonzero_bits for x to 0xf.
>> However get_default_value then calls extend_mask (0xf), resulting in
>> all upper bits
>> being set to 1 and consequently the condition if (x > 300) doesn't get folded.
>
> But then why would the code trying to optimize the comparison look at
> bits that are outside of the precision?  (Where do we try to use this
> info?  I see that VRP fails to use the nonzero bits if no range info
> is present - I suppose set_nonzero_bits fails to eventually adjust
> the range.)
>
> That said, where is the folding code and why does it care for those
> "uninteresting" bits at all?
Well, there is the following in bit_value_binop_1 for the LT_EXPR / LE_EXPR case:
        /* If the most significant bits are not known we know nothing.  */
        if (wi::neg_p (o1mask) || wi::neg_p (o2mask))
          break;

IIUC extend_mask extends all upper bits to 1, so we hit the break and
thus don't perform the folding.
The ccp2 dump shows:
Folding statement: if (x_2(D) > 300)
which is likely CONSTANT
Not folded

Instead, if we extend based on signop, the condition gets folded correctly:
Folding statement: if (x_2(D) > 300)
which is likely CONSTANT
Folding predicate x_2(D) > 300 to 0
gimple_simplified to if (0 != 0)
Folded into: if (0 != 0)
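
To illustrate with this test-case (x is a 32-bit int, nonzero_bits == 0xf):

  extend_mask (0xf)
    -> low 32 bits are 0xf, all bits above are set to 1, so
       wi::neg_p (mask) is true and bit_value_binop_1 bails out;
  widest_int::from (0xf, SIGNED)
    -> mask == 0xf (the sign bit of the 32-bit value is 0), so
       wi::neg_p (mask) is false and the comparison folds.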

I thought it was unsafe for ccp to extend based on sign-op,
so I guarded that on DECL_SET_BY_IPA.
I will try to change extend_mask to extend the mask based on signop
and get rid of the flag.

I will address your other comments in follow-up patch.

Thanks,
Prathamesh
>
>> To resolve this, I added a new flag "set_by_ipa" to decl_common,
>> which is set to true if the mask of parameter is determined by ipa-cp,
>> and the condition changes to:
>>
>> if (SSA_NAME_VAR (var)
>>     && TREE_CODE (SSA_NAME_VAR (var)) == PARM_DECL
>>     && DECL_SET_BY_IPA (SSA_NAME_VAR (var)))
>>   val.mask = widest_int::from (nonzero_bits,
>>                           TYPE_SIGN (TREE_TYPE (SSA_NAME_VAR (var))));
>> else
>>   val.mask = extend_mask (nonzero_bits);
>>
>> I am not sure if adding a new flag to decl_common is a good idea. How
>> do other ipa passes deal with this/similar issue ?
>>
>> I suppose we would want to gate this on some flag, say -fipa-bit-cp ?
>> I haven't yet gated it on the flag, will do in next version of patch.
>> I have added some very simple test-cases, I will try to add more
>> meaningful ones.
>
> See above - we should avoid needing this.
>
>> Patch passes bootstrap+test on x86_64-unknown-linux-gnu
>> and cross-tested on arm*-*-* and aarch64*-*-* with the exception
>> of some fortran tests failing due to above ICE.
>>
>> As next steps, I am planning to extend it to handle alignment propagation,
>> and do further testing (lto-bootstrap, chromium).
>> I would be grateful for feedback on the current patch.
>
> I see you do
>
> @@ -895,7 +899,7 @@ do_dbg_cnt (void)
>     Return TRUE when something was optimized.  */
>
>  static bool
> -ccp_finalize (bool nonzero_p)
> +ccp_finalize (bool nonzero_p ATTRIBUTE_UNUSED)
>  {
>    bool something_changed;
>    unsigned i;
> @@ -913,10 +917,7 @@ ccp_finalize (bool nonzero_p)
>
>        if (!name
>           || (!POINTER_TYPE_P (TREE_TYPE (name))
> -             && (!INTEGRAL_TYPE_P (TREE_TYPE (name))
> -                 /* Don't record nonzero bits before IPA to avoid
> -                    using too much memory.  */
> -                 || !nonzero_p)))
> +             && (!INTEGRAL_TYPE_P (TREE_TYPE (name)))))
>         continue;
>
> can you instead adjust the caller to do sth like
>
>   if (ccp_finalize (nonzero_p || flag_ipa_cp))
>     {
>
> ?  What we miss to optimize memory usage in the early CCP case
> (it's run very early, before dead code elimination) is to
> avoid setting alignment / nonzero bits for the case of
> fully propagatable (and thus dead after substitute_and_fold)
> SSA names.
>
> So in ccp_finalize do sth like
>
>       val = get_value (name);
>       if (val->lattice_val != CONSTANT
>           || TREE_CODE (val->value) != INTEGER_CST
>           || val->mask == 0)
>         continue;
>
> That should cut down early CCP memory use in case of nonzero
> setting significantly.
>
> I didn't look at the propagation part but eventually the IPA-CP
> lattice gets quite big.  Also the alignment lattice is very
> similar to the bits lattice so why not merge those two?  But
> in the end it's Martins/Honzas call here.  Note there is
> trailing_wide_ints <> which could be used to improve memory usage
> based on the underlying type.
>
> Thanks,
> Richard.
Kugan Vivekanandarajah Aug. 4, 2016, 9:07 a.m. UTC | #3
On 04/08/16 18:57, Prathamesh Kulkarni wrote:
> On 4 August 2016 at 13:31, Richard Biener <rguenther@suse.de> wrote:
>> On Thu, 4 Aug 2016, Prathamesh Kulkarni wrote:
>>
>>> Hi,
>>> This is a prototype patch for propagating known/unknown bits inter-procedurally
>>> for integral types, using the info obtained from get_nonzero_bits ().
>>>
>>> Patch required making following changes:
>>> a) To make info from get_nonzero_bits() available to ipa, I had to remove
>>> guard !nonzero_p in ccp_finalize. However that triggered the following ICE
>>> in get_ptr_info() for default_none.f95 (and several other fortran tests)
>>> with options: -fopenacc -O2
>>> ICE: http://pastebin.com/KjD7HMQi
>>> I confirmed with Richard that this was a latent issue.
>>
>> Can you please bootstrap/test the fix for this separately?  (doesn't
>> seem to be included in this patch btw)
> Well I don't have the fix available -;)

This looks like what I fixed in 
https://patchwork.ozlabs.org/patch/648662/. I will commit that soon.

Thanks,
Kugan
Jan Hubicka Aug. 4, 2016, 1:05 p.m. UTC | #4
> I didn't look at the propagation part but eventually the IPA-CP
> lattice gets quite big.  Also the alignment lattice is very
> similar to the bits lattice so why not merge those two?  But

This was always the original idea: to replace alignment propagation by bitwise
ccp.  I suppose we only have an issue here because nonzero bits are not tracked
for pointers, so we need to feed the original lattices by hand?

We could also make use of VR ranges and bits while evaluating predicates
in ipa-inline-analysis. I can look into it after returning from Leeds.

Honza
> in the end it's Martins/Honzas call here.  Note there is
> trailing_wide_ints <> which could be used to improve memory usage
> based on the underlying type.
> 
> Thanks,
> Richard.
Kugan Vivekanandarajah Aug. 4, 2016, 11:04 p.m. UTC | #5
Hi Honza,

On 04/08/16 23:05, Jan Hubicka wrote:
>> I didn't look at the propagation part but eventually the IPA-CP
>> lattice gets quite big.  Also the alignment lattice is very
>> similar to the bits lattice so why not merge those two?  But
>
> This was always the original idea to replace alignment propagation by bitwise
> ccp.  I suppose we only have issue here because nonzero bits are not tracked for
> pointers so we need to feed the original lattices by hand?

I also raised this one with Prathamesh offline. With the early-vrp, we 
should have nonzero_bits for non-pointers. For pointers, we should feed 
the lattices with get_pointer_alignment_1, as is done in the ipa-cp 
alignment propagation.
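
Something like the following untested sketch (reusing the ipa_bits
fields from Prathamesh's patch) in ipa_compute_jump_functions_for_edge:

  else if (POINTER_TYPE_P (TREE_TYPE (arg)))
    {
      unsigned align;
      unsigned HOST_WIDE_INT bitpos;
      get_pointer_alignment_1 (arg, &align, &bitpos);
      /* An alignment of ALIGN bits fixes the low bits of the address:
         value = the misalignment in bytes, mask = the bits that are
         not covered by the alignment.  */
      widest_int low_bits = align / BITS_PER_UNIT - 1;
      jfunc->bits.known = true;
      jfunc->bits.value = bitpos / BITS_PER_UNIT;
      jfunc->bits.mask = ~low_bits;
      /* sgn/precision would be filled in as for the integral case.  */
    }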

> We could also make use of VR ranges and bits while evaluating predicates
> in ipa-inline-analysis. I can look into it after returning from Leeds.

Indeed. With early dom-based VRP (non-iterative at this point), some of 
the ranges can be pessimistic and can impact the estimation. Let me have 
a look at this.

Thanks,
Kugan
Jan Hubicka Aug. 5, 2016, 11:36 a.m. UTC | #6
> Hi Honza,
> 
> On 04/08/16 23:05, Jan Hubicka wrote:
> >>I didn't look at the propagation part but eventually the IPA-CP
> >>lattice gets quite big.  Also the alignment lattice is very
> >>similar to the bits lattice so why not merge those two?  But
> >
> >This was always the original idea to replace alignment propagation by bitwise
> >ccp.  I suppose we only have issue here because nonzero bits are not tracked for
> >pointers so we need to feed the original lattices by hand?
> 
> I also raised this one with Prathamesh offline. With the early-vrp,
> we should have nonzero_bits for non-pointers. For pointers, we should
> feed the lattices with get_pointer_alignment_1, as is done in the
> ipa-cp alignment propagation.

Yes, that is the general idea. Note that for pointers it would also be
very useful to track which pointers are non-NULL (C++ multiple inheritance
inserts a lot of NULL pointer checks that confuse us in later analysis, and
it would be nice to optimize them out). I am not convinced that saving a
pointer is worth making a distinction between pointers/non-pointers for all
the local tracking.
> 
> >We could also make use of VR ranges and bits while evaluating predicates
> >in ipa-inline-analysis. I can look into it after returning from Leeds.
> 
> Indeed. With early dom-based VRP (non-iterative at this point),
> some of the ranges can be pessimistic and can impact the estimation.
> Let me have a look at this.

Yes, but those are independent issues - size/time estimates should
take into account the new info we have and we should work on getting
it better when we can ;)

I will try to revisit the size/time code after returning from
my vacation and turn it into sreals for time + cleanup/generalize the APIs
a bit. I tried to do it last stage1 but got stuck on some ugly side cases
+ gengtype not liking the sreal type.

Honza
> 
> Thanks,
> Kugan
>
Martin Jambor Aug. 5, 2016, 12:36 p.m. UTC | #7
Hi,

generally speaking, the ipa-cp.c and ipa-prop.[hc] bits look reasonable,
but I have a few comments:

On Thu, Aug 04, 2016 at 12:06:18PM +0530, Prathamesh Kulkarni wrote:
> Hi,
> This is a prototype patch for propagating known/unknown bits inter-procedurally
> for integral types, using the info obtained from get_nonzero_bits ().
> 
> Patch required making following changes:
> a) To make info from get_nonzero_bits() available to ipa

in the IPA phase, you should not be looking at SSA_NAMEs; those will not
be available with LTO when you do not have access to function bodies
and thus also not to SSA_NAMEs.  In IPA, you should only rely on what
you have in jump functions.

> , I had to remove
> guard !nonzero_p in ccp_finalize. However that triggered the following ICE
> in get_ptr_info() for default_none.f95 (and several other fortran tests)
> with options: -fopenacc -O2
> ICE: http://pastebin.com/KjD7HMQi
> I confirmed with Richard that this was a latent issue.
> 
> 
> I suppose we would want to gate this on some flag, say -fipa-bit-cp ?

Yes, definitely.

> I haven't yet gated it on the flag, will do in next version of patch.
> I have added some very simple test-cases, I will try to add more
> meaningful ones.


> 
> Patch passes bootstrap+test on x86_64-unknown-linux-gnu
> and cross-tested on arm*-*-* and aarch64*-*-* with the exception
> of some fortran tests failing due to above ICE.
> 
> As next steps, I am planning to extend it to handle alignment propagation,
> and do further testing (lto-bootstrap, chromium).
> I would be grateful for feedback on the current patch.
> 
> Thanks,
> Prathamesh

> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 5b6cb9a..b770f6a 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -120,6 +120,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "params.h"
>  #include "ipa-inline.h"
>  #include "ipa-utils.h"
> +#include "tree-ssa-ccp.h"
>  
>  template <typename valtype> class ipcp_value;
>  
> @@ -266,6 +267,40 @@ private:
>    bool meet_with_1 (unsigned new_align, unsigned new_misalign);
>  };
>  
> +/* Lattice of known bits, only capable of holding one value.
> +   Similar to ccp_prop_value_t, mask represents which bits of value are constant.
> +   If a bit in mask is set to 0, then the corresponding bit in
> +   value is known to be constant.  */
> +
> +class ipcp_bits_lattice
> +{
> +public:
> +  bool bottom_p () { return lattice_val == IPA_BITS_VARYING; }
> +  bool top_p () { return lattice_val == IPA_BITS_UNDEFINED; }
> +  bool constant_p () { return lattice_val == IPA_BITS_CONSTANT; }
> +  bool set_to_bottom ();
> +  bool set_to_constant (widest_int, widest_int, signop, unsigned);
> + 
> +  widest_int get_value () { return value; }
> +  widest_int get_mask () { return mask; }
> +  signop get_sign () { return sgn; }
> +  unsigned get_precision () { return precision; }
> +
> +  bool meet_with (ipcp_bits_lattice& other, enum tree_code, tree);
> +  bool meet_with (widest_int, widest_int, signop, unsigned);
> +
> +  void print (FILE *);
> +
> +private:
> +  enum { IPA_BITS_UNDEFINED, IPA_BITS_CONSTANT, IPA_BITS_VARYING } lattice_val;
> +  widest_int value, mask;
> +  signop sgn;
> +  unsigned precision;
> +
> +  bool meet_with_1 (widest_int, widest_int); 
> +  void get_value_and_mask (tree, widest_int *, widest_int *);
> +}; 
> +
>  /* Structure containing lattices for a parameter itself and for pieces of
>     aggregates that are passed in the parameter or by a reference in a parameter
>     plus some other useful flags.  */
> @@ -281,6 +316,8 @@ public:
>    ipcp_agg_lattice *aggs;
>    /* Lattice describing known alignment.  */
>    ipcp_alignment_lattice alignment;
> +  /* Lattice describing known bits.  */
> +  ipcp_bits_lattice bits_lattice;
>    /* Number of aggregate lattices */
>    int aggs_count;
>    /* True if aggregate data were passed by reference (as opposed to by
> @@ -458,6 +495,21 @@ ipcp_alignment_lattice::print (FILE * f)
>      fprintf (f, "         Alignment %u, misalignment %u\n", align, misalign);
>  }
>  
> +void
> +ipcp_bits_lattice::print (FILE *f)
> +{
> +  if (top_p ())
> +    fprintf (f, "         Bits unknown (TOP)\n");
> +  else if (bottom_p ())
> +    fprintf (f, "         Bits unusable (BOTTOM)\n");
> +  else
> +    {
> +      fprintf (f, "         Bits: value = "); print_hex (get_value (), f);
> +      fprintf (f, ", mask = "); print_hex (get_mask (), f);
> +      fprintf (f, "\n");
> +    }
> +}
> +
>  /* Print all ipcp_lattices of all functions to F.  */
>  
>  static void
> @@ -484,6 +536,7 @@ print_all_lattices (FILE * f, bool dump_sources, bool dump_benefits)
>  	  fprintf (f, "         ctxs: ");
>  	  plats->ctxlat.print (f, dump_sources, dump_benefits);
>  	  plats->alignment.print (f);
> +	  plats->bits_lattice.print (f);
>  	  if (plats->virt_call)
>  	    fprintf (f, "        virt_call flag set\n");
>  
> @@ -911,6 +964,161 @@ ipcp_alignment_lattice::meet_with (const ipcp_alignment_lattice &other,
>    return meet_with_1 (other.align, adjusted_misalign);
>  }
>  
> +/* Set lattice value to bottom, if it already isn't the case.  */
> +
> +bool
> +ipcp_bits_lattice::set_to_bottom ()
> +{
> +  if (bottom_p ())
> +    return false;
> +  lattice_val = IPA_BITS_VARYING;
> +  value = 0;
> +  mask = -1;
> +  return true;
> +}
> +
> +/* Set to constant if it isn't already. Only meant to be called
> +   when switching state from TOP.  */
> +
> +bool
> +ipcp_bits_lattice::set_to_constant (widest_int value, widest_int mask,
> +				    signop sgn, unsigned precision)
> +{
> +  gcc_assert (top_p ());
> +  this->lattice_val = IPA_BITS_CONSTANT;
> +  this->value = value;
> +  this->mask = mask;
> +  this->sgn = sgn;
> +  this->precision = precision;
> +  return true;
> +}
> +
> +/* Convert operand to value, mask form.  */
> +
> +void
> +ipcp_bits_lattice::get_value_and_mask (tree operand, widest_int *valuep, widest_int *maskp)
> +{
> +  wide_int get_nonzero_bits (const_tree);
> +
> +  if (TREE_CODE (operand) == INTEGER_CST)
> +    {
> +      *valuep = wi::to_widest (operand); 
> +      *maskp = 0;
> +    }
> +  else if (TREE_CODE (operand) == SSA_NAME)

IIUC, operand is the operand from pass-through jump function and that
should never be an SSA_NAME.  I have even looked at how we generate
them and it seems fairly safe to say that they never are.  If you have
seen an SSA_NAME here, it is a bug and please let me know because
sooner or later it will cause an assert.

> +    {
> +      *valuep = 0;
> +      *maskp = widest_int::from (get_nonzero_bits (operand), UNSIGNED);
> +    }
> +  else
> +    gcc_unreachable ();

The operand, however, can be any other is_gimple_ip_invariant tree.
I assume that you could hit this gcc_unreachable only in a program
with undefined behavior (or with a Fortran CONST_DECL?) but you should
not ICE here.


> +}  
> +
> +/* Meet operation, similar to ccp_lattice_meet, we xor values
> +   if this->value, value have different values at same bit positions, we want
> +   to drop that bit to varying. Return true if mask is changed.
> +   This function assumes that the lattice value is in CONSTANT state  */
> +
> +bool
> +ipcp_bits_lattice::meet_with_1 (widest_int value, widest_int mask)
> +{
> +  gcc_assert (constant_p ());
> +  
> +  widest_int old_mask = this->mask;
> +  this->mask = (this->mask | mask) | (this->value ^ value);
> +
> +  if (wi::sext (this->mask, this->precision) == -1)
> +    return set_to_bottom ();
> +
> +  bool changed = this->mask != old_mask;
> +  return changed;
> +}
> +
> +/* Meet the bits lattice with operand
> +   described by <value, mask, sgn, precision.  */
> +
> +bool
> +ipcp_bits_lattice::meet_with (widest_int value, widest_int mask,
> +			      signop sgn, unsigned precision)
> +{
> +  if (bottom_p ())
> +    return false;
> +
> +  if (top_p ())
> +    {
> +      if (wi::sext (mask, precision) == -1)
> +	return set_to_bottom ();
> +      return set_to_constant (value, mask, sgn, precision);
> +    }
> +
> +  return meet_with_1 (value, mask);

What if precisions do not match?

> +}
> +
> +/* Meet bits lattice with the result of bit_value_binop_1 (other, operand)
> +   if code is binary operation or bit_value_unop_1 (other) if code is unary op.
> +   In the case when code is nop_expr, no adjustment is required. */
> +
> +bool
> +ipcp_bits_lattice::meet_with (ipcp_bits_lattice& other, enum tree_code code, tree operand)
> +{
> +  if (other.bottom_p ())
> +    return set_to_bottom ();
> +
> +  if (bottom_p () || other.top_p ())
> +    return false;
> +
> +  widest_int adjusted_value, adjusted_mask;
> +
> +  if (TREE_CODE_CLASS (code) == tcc_binary)
> +    {
> +      tree type = TREE_TYPE (operand);
> +      gcc_assert (INTEGRAL_TYPE_P (type));
> +      widest_int o_value, o_mask;
> +      get_value_and_mask (operand, &o_value, &o_mask);
> +
> +      signop sgn = other.get_sign ();
> +      unsigned prec = other.get_precision ();
> +
> +      bit_value_binop_1 (code, sgn, prec, &adjusted_value, &adjusted_mask,
> +			 sgn, prec, other.get_value (), other.get_mask (),
> +			 TYPE_SIGN (type), TYPE_PRECISION (type), o_value, o_mask);

It is probably just me not being particularly sharp on a Friday
afternoon and I might not understand the semantics of mask well (also,
you did not document it :-), but... assume that we are looking at a
binary AND operation, other comes from an SSA pointer and its mask
would be binary 100 and its value 0 because that's what you set for
ssa names in ipa-prop.h, and the operand is binary value 101, which
means that get_value_and_mask returns mask 0 and value 101.  Now,
bit_value_binop_1 would return value 0 & 101 = 0 and mask according to

(m1 | m2) & ((v1 | m1) & (v2 | m2))

so in our case

(100b & 0) & ((0 | 100b) & (101b | 0)) = 0 & 100b = 0.

So both value and mask would be zero, which, if this was the only
incoming value to the lattice, I am afraid you would later on in
ipcp_update_bits interpret as that no bits can be non-zero.  Yet the
third bit clearly could.

I think that this is the only place where you interpret mask as a mask
and actually use it for and-ing; everywhere else you interpret it
basically only as the result of get_nonzero_bits and use it for
or-ing.

> +
> +      if (wi::sext (adjusted_mask, prec) == -1)
> +	return set_to_bottom ();
> +    }
> +
> +  else if (TREE_CODE_CLASS (code) == tcc_unary)
> +    {
> +      signop sgn = other.get_sign ();
> +      unsigned prec = other.get_precision ();
> +
> +      bit_value_unop_1 (code, sgn, prec, &adjusted_value,
> +			&adjusted_mask, sgn, prec, other.get_value (),
> +			other.get_mask ());
> +
> +      if (wi::sext (adjusted_mask, prec) == -1)
> +	return set_to_bottom ();
> +    }
> +
> +  else if (code == NOP_EXPR)
> +    {
> +      adjusted_value = other.value;
> +      adjusted_mask = other.mask;
> +    }
> +
> +  else
> +    return set_to_bottom ();
> +
> +  if (top_p ())
> +    {
> +      if (wi::sext (adjusted_mask, other.get_precision ()) == -1)
> +	return set_to_bottom ();
> +      return set_to_constant (adjusted_value, adjusted_mask, other.get_sign (), other.get_precision ());
> +    }
> +  else
> +    return meet_with_1 (adjusted_value, adjusted_mask);

Again, what if precisions do not match?

> +}
> +
>  /* Mark bot aggregate and scalar lattices as containing an unknown variable,
>     return true is any of them has not been marked as such so far.  */
>  
> @@ -922,6 +1130,7 @@ set_all_contains_variable (struct ipcp_param_lattices *plats)
>    ret |= plats->ctxlat.set_contains_variable ();
>    ret |= set_agg_lats_contain_variable (plats);
>    ret |= plats->alignment.set_to_bottom ();
> +  ret |= plats->bits_lattice.set_to_bottom ();
>    return ret;
>  }
>  
> @@ -1003,6 +1212,7 @@ initialize_node_lattices (struct cgraph_node *node)
>  	      plats->ctxlat.set_to_bottom ();
>  	      set_agg_lats_to_bottom (plats);
>  	      plats->alignment.set_to_bottom ();
> +	      plats->bits_lattice.set_to_bottom ();
>  	    }
>  	  else
>  	    set_all_contains_variable (plats);
> @@ -1621,6 +1831,57 @@ propagate_alignment_accross_jump_function (cgraph_edge *cs,
>      }
>  }
>  
> +/* Propagate bits across jfunc that is associated with
> +   edge cs and update dest_lattice accordingly.  */
> +
> +bool
> +propagate_bits_accross_jump_function (cgraph_edge *cs, ipa_jump_func *jfunc,
> +				      ipcp_bits_lattice *dest_lattice)
> +{
> +  if (dest_lattice->bottom_p ())
> +    return false;
> +
> +  if (jfunc->type == IPA_JF_PASS_THROUGH)
> +    {
> +      struct ipa_node_params *caller_info = IPA_NODE_REF (cs->caller);
> +      enum tree_code code = ipa_get_jf_pass_through_operation (jfunc);
> +      tree operand = NULL_TREE;
> +
> +      if (code != NOP_EXPR)
> +	operand = ipa_get_jf_pass_through_operand (jfunc);
> +
> +      int src_idx = ipa_get_jf_pass_through_formal_id (jfunc);
> +      struct ipcp_param_lattices *src_lats
> +	= ipa_get_parm_lattices (caller_info, src_idx);
> +
> +      /* Try to proapgate bits if src_lattice is bottom, but jfunc is known.

propagate

> +	 for eg consider:
> +	 int f(int x)
> +	 {
> +	   g (x & 0xff);
> +	 }
> +	 Assume lattice for x is bottom, however we can still propagate
> +	 result of x & 0xff == 0xff, which gets computed during ccp1 pass
> +	 and we store it in jump function during analysis stage.  */
> +
> +      if (src_lats->bits_lattice.bottom_p ()
> +	  && jfunc->bits.known)
> +	return dest_lattice->meet_with (jfunc->bits.value, jfunc->bits.mask,
> +				        jfunc->bits.sgn, jfunc->bits.precision);
> +      else
> +	return dest_lattice->meet_with (src_lats->bits_lattice, code, operand);
> +    }
> +
> +  else if (jfunc->type == IPA_JF_ANCESTOR)
> +    return dest_lattice->set_to_bottom ();
> +
> +  else if (jfunc->bits.known) 
> +    return dest_lattice->meet_with (jfunc->bits.value, jfunc->bits.mask,
> +				    jfunc->bits.sgn, jfunc->bits.precision);
> +  else
> +    return dest_lattice->set_to_bottom ();
> +}
> +

...

> diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
> index e32d078..d69a071 100644
> --- a/gcc/ipa-prop.h
> +++ b/gcc/ipa-prop.h
> @@ -154,6 +154,16 @@ struct GTY(()) ipa_alignment
>    unsigned misalign;
>  };
>  
> +/* Information about zero/non-zero bits.  */
> +struct GTY(()) ipa_bits
> +{
> +  bool known;
> +  widest_int value;
> +  widest_int mask;
> +  enum signop sgn;
> +  unsigned precision;
> +};

Please order the fields according to their size, the largest first, and
add a comment describing each one of them.  As I explained above, it
is not immediately clear why the mask would be a mask, for example.
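
E.g. something like (the field comments are just my reading of the patch):

  /* Information about zero/non-zero bits.  */
  struct GTY(()) ipa_bits
  {
    /* The propagated value; a bit of it is meaningful only where the
       corresponding bit of MASK is zero.  */
    widest_int value;
    /* Mask of VALUE: if a bit is 0, the corresponding bit of VALUE is
       known to be constant.  */
    widest_int mask;
    /* Sign and precision of the type the bits were propagated for.  */
    enum signop sgn;
    unsigned precision;
    /* Whether the fields above carry any information at all.  */
    bool known;
  };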

Nevertheless, thanks for looking into this.  It would be nice to have
this for pointers too, not least because we could represent
non-NULL-ness this way, which could be very interesting for cloning
and inlining.  But those are further steps, once normal propagation
works.

Martin

Patch

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 5b6cb9a..b770f6a 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -120,6 +120,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "ipa-inline.h"
 #include "ipa-utils.h"
+#include "tree-ssa-ccp.h"
 
 template <typename valtype> class ipcp_value;
 
@@ -266,6 +267,40 @@  private:
   bool meet_with_1 (unsigned new_align, unsigned new_misalign);
 };
 
+/* Lattice of known bits, only capable of holding one value.
+   Similar to ccp_prop_value_t, mask represents which bits of value are constant.
+   If a bit in mask is set to 0, then the corresponding bit in
+   value is known to be constant.  */
+
+class ipcp_bits_lattice
+{
+public:
+  bool bottom_p () { return lattice_val == IPA_BITS_VARYING; }
+  bool top_p () { return lattice_val == IPA_BITS_UNDEFINED; }
+  bool constant_p () { return lattice_val == IPA_BITS_CONSTANT; }
+  bool set_to_bottom ();
+  bool set_to_constant (widest_int, widest_int, signop, unsigned);
+ 
+  widest_int get_value () { return value; }
+  widest_int get_mask () { return mask; }
+  signop get_sign () { return sgn; }
+  unsigned get_precision () { return precision; }
+
+  bool meet_with (ipcp_bits_lattice& other, enum tree_code, tree);
+  bool meet_with (widest_int, widest_int, signop, unsigned);
+
+  void print (FILE *);
+
+private:
+  enum { IPA_BITS_UNDEFINED, IPA_BITS_CONSTANT, IPA_BITS_VARYING } lattice_val;
+  widest_int value, mask;
+  signop sgn;
+  unsigned precision;
+
+  bool meet_with_1 (widest_int, widest_int); 
+  void get_value_and_mask (tree, widest_int *, widest_int *);
+}; 
+
 /* Structure containing lattices for a parameter itself and for pieces of
    aggregates that are passed in the parameter or by a reference in a parameter
    plus some other useful flags.  */
@@ -281,6 +316,8 @@  public:
   ipcp_agg_lattice *aggs;
   /* Lattice describing known alignment.  */
   ipcp_alignment_lattice alignment;
+  /* Lattice describing known bits.  */
+  ipcp_bits_lattice bits_lattice;
   /* Number of aggregate lattices */
   int aggs_count;
   /* True if aggregate data were passed by reference (as opposed to by
@@ -458,6 +495,21 @@  ipcp_alignment_lattice::print (FILE * f)
     fprintf (f, "         Alignment %u, misalignment %u\n", align, misalign);
 }
 
+void
+ipcp_bits_lattice::print (FILE *f)
+{
+  if (top_p ())
+    fprintf (f, "         Bits unknown (TOP)\n");
+  else if (bottom_p ())
+    fprintf (f, "         Bits unusable (BOTTOM)\n");
+  else
+    {
+      fprintf (f, "         Bits: value = "); print_hex (get_value (), f);
+      fprintf (f, ", mask = "); print_hex (get_mask (), f);
+      fprintf (f, "\n");
+    }
+}
+
 /* Print all ipcp_lattices of all functions to F.  */
 
 static void
@@ -484,6 +536,7 @@  print_all_lattices (FILE * f, bool dump_sources, bool dump_benefits)
 	  fprintf (f, "         ctxs: ");
 	  plats->ctxlat.print (f, dump_sources, dump_benefits);
 	  plats->alignment.print (f);
+	  plats->bits_lattice.print (f);
 	  if (plats->virt_call)
 	    fprintf (f, "        virt_call flag set\n");
 
@@ -911,6 +964,161 @@  ipcp_alignment_lattice::meet_with (const ipcp_alignment_lattice &other,
   return meet_with_1 (other.align, adjusted_misalign);
 }
 
+/* Set lattice value to bottom, if it already isn't the case.  */
+
+bool
+ipcp_bits_lattice::set_to_bottom ()
+{
+  if (bottom_p ())
+    return false;
+  lattice_val = IPA_BITS_VARYING;
+  value = 0;
+  mask = -1;
+  return true;
+}
+
+/* Set to constant if it isn't already. Only meant to be called
+   when switching state from TOP.  */
+
+bool
+ipcp_bits_lattice::set_to_constant (widest_int value, widest_int mask,
+				    signop sgn, unsigned precision)
+{
+  gcc_assert (top_p ());
+  this->lattice_val = IPA_BITS_CONSTANT;
+  this->value = value;
+  this->mask = mask;
+  this->sgn = sgn;
+  this->precision = precision;
+  return true;
+}
+
+/* Convert operand to value, mask form.  */
+
+void
+ipcp_bits_lattice::get_value_and_mask (tree operand, widest_int *valuep, widest_int *maskp)
+{
+  wide_int get_nonzero_bits (const_tree);
+
+  if (TREE_CODE (operand) == INTEGER_CST)
+    {
+      *valuep = wi::to_widest (operand); 
+      *maskp = 0;
+    }
+  else if (TREE_CODE (operand) == SSA_NAME)
+    {
+      *valuep = 0;
+      *maskp = widest_int::from (get_nonzero_bits (operand), UNSIGNED);
+    }
+  else
+    gcc_unreachable ();
+}  
+
+/* Meet operation, similar to ccp_lattice_meet, we xor values
+   if this->value, value have different values at same bit positions, we want
+   to drop that bit to varying. Return true if mask is changed.
+   This function assumes that the lattice value is in CONSTANT state  */
+
+bool
+ipcp_bits_lattice::meet_with_1 (widest_int value, widest_int mask)
+{
+  gcc_assert (constant_p ());
+  
+  widest_int old_mask = this->mask;
+  this->mask = (this->mask | mask) | (this->value ^ value);
+
+  if (wi::sext (this->mask, this->precision) == -1)
+    return set_to_bottom ();
+
+  bool changed = this->mask != old_mask;
+  return changed;
+}
+
+/* Meet the bits lattice with operand
+   described by <value, mask, sgn, precision.  */
+
+bool
+ipcp_bits_lattice::meet_with (widest_int value, widest_int mask,
+			      signop sgn, unsigned precision)
+{
+  if (bottom_p ())
+    return false;
+
+  if (top_p ())
+    {
+      if (wi::sext (mask, precision) == -1)
+	return set_to_bottom ();
+      return set_to_constant (value, mask, sgn, precision);
+    }
+
+  return meet_with_1 (value, mask);
+}
+
+/* Meet bits lattice with the result of bit_value_binop_1 (other, operand)
+   if code is binary operation or bit_value_unop_1 (other) if code is unary op.
+   In the case when code is nop_expr, no adjustment is required. */
+
+bool
+ipcp_bits_lattice::meet_with (ipcp_bits_lattice& other, enum tree_code code, tree operand)
+{
+  if (other.bottom_p ())
+    return set_to_bottom ();
+
+  if (bottom_p () || other.top_p ())
+    return false;
+
+  widest_int adjusted_value, adjusted_mask;
+
+  if (TREE_CODE_CLASS (code) == tcc_binary)
+    {
+      tree type = TREE_TYPE (operand);
+      gcc_assert (INTEGRAL_TYPE_P (type));
+      widest_int o_value, o_mask;
+      get_value_and_mask (operand, &o_value, &o_mask);
+
+      signop sgn = other.get_sign ();
+      unsigned prec = other.get_precision ();
+
+      bit_value_binop_1 (code, sgn, prec, &adjusted_value, &adjusted_mask,
+			 sgn, prec, other.get_value (), other.get_mask (),
+			 TYPE_SIGN (type), TYPE_PRECISION (type), o_value, o_mask);
+
+      if (wi::sext (adjusted_mask, prec) == -1)
+	return set_to_bottom ();
+    }
+
+  else if (TREE_CODE_CLASS (code) == tcc_unary)
+    {
+      signop sgn = other.get_sign ();
+      unsigned prec = other.get_precision ();
+
+      bit_value_unop_1 (code, sgn, prec, &adjusted_value,
+			&adjusted_mask, sgn, prec, other.get_value (),
+			other.get_mask ());
+
+      if (wi::sext (adjusted_mask, prec) == -1)
+	return set_to_bottom ();
+    }
+
+  else if (code == NOP_EXPR)
+    {
+      adjusted_value = other.value;
+      adjusted_mask = other.mask;
+    }
+
+  else
+    return set_to_bottom ();
+
+  if (top_p ())
+    {
+      if (wi::sext (adjusted_mask, other.get_precision ()) == -1)
+	return set_to_bottom ();
+      return set_to_constant (adjusted_value, adjusted_mask, other.get_sign (), other.get_precision ());
+    }
+  else
+    return meet_with_1 (adjusted_value, adjusted_mask);
+}
+
 /* Mark bot aggregate and scalar lattices as containing an unknown variable,
    return true is any of them has not been marked as such so far.  */
 
@@ -922,6 +1130,7 @@  set_all_contains_variable (struct ipcp_param_lattices *plats)
   ret |= plats->ctxlat.set_contains_variable ();
   ret |= set_agg_lats_contain_variable (plats);
   ret |= plats->alignment.set_to_bottom ();
+  ret |= plats->bits_lattice.set_to_bottom ();
   return ret;
 }
 
@@ -1003,6 +1212,7 @@  initialize_node_lattices (struct cgraph_node *node)
 	      plats->ctxlat.set_to_bottom ();
 	      set_agg_lats_to_bottom (plats);
 	      plats->alignment.set_to_bottom ();
+	      plats->bits_lattice.set_to_bottom ();
 	    }
 	  else
 	    set_all_contains_variable (plats);
@@ -1621,6 +1831,57 @@  propagate_alignment_accross_jump_function (cgraph_edge *cs,
     }
 }
 
+/* Propagate bits across jfunc that is associated with
+   edge cs and update dest_lattice accordingly.  */
+
+bool
+propagate_bits_accross_jump_function (cgraph_edge *cs, ipa_jump_func *jfunc,
+				      ipcp_bits_lattice *dest_lattice)
+{
+  if (dest_lattice->bottom_p ())
+    return false;
+
+  if (jfunc->type == IPA_JF_PASS_THROUGH)
+    {
+      struct ipa_node_params *caller_info = IPA_NODE_REF (cs->caller);
+      enum tree_code code = ipa_get_jf_pass_through_operation (jfunc);
+      tree operand = NULL_TREE;
+
+      if (code != NOP_EXPR)
+	operand = ipa_get_jf_pass_through_operand (jfunc);
+
+      int src_idx = ipa_get_jf_pass_through_formal_id (jfunc);
+      struct ipcp_param_lattices *src_lats
+	= ipa_get_parm_lattices (caller_info, src_idx);
+
+      /* Try to proapgate bits if src_lattice is bottom, but jfunc is known.
+	 for eg consider:
+	 int f(int x)
+	 {
+	   g (x & 0xff);
+	 }
+	 Assume lattice for x is bottom, however we can still propagate
+	 result of x & 0xff == 0xff, which gets computed during ccp1 pass
+	 and we store it in jump function during analysis stage.  */
+
+      if (src_lats->bits_lattice.bottom_p ()
+	  && jfunc->bits.known)
+	return dest_lattice->meet_with (jfunc->bits.value, jfunc->bits.mask,
+				        jfunc->bits.sgn, jfunc->bits.precision);
+      else
+	return dest_lattice->meet_with (src_lats->bits_lattice, code, operand);
+    }
+
+  else if (jfunc->type == IPA_JF_ANCESTOR)
+    return dest_lattice->set_to_bottom ();
+
+  else if (jfunc->bits.known) 
+    return dest_lattice->meet_with (jfunc->bits.value, jfunc->bits.mask,
+				    jfunc->bits.sgn, jfunc->bits.precision);
+  else
+    return dest_lattice->set_to_bottom ();
+}
+
 /* If DEST_PLATS already has aggregate items, check that aggs_by_ref matches
    NEW_AGGS_BY_REF and if not, mark all aggs as bottoms and return true (in all
    other cases, return false).  If there are no aggregate items, set
@@ -1968,6 +2229,8 @@  propagate_constants_accross_call (struct cgraph_edge *cs)
 							  &dest_plats->ctxlat);
 	  ret |= propagate_alignment_accross_jump_function (cs, jump_func,
 							 &dest_plats->alignment);
+	  ret |= propagate_bits_accross_jump_function (cs, jump_func,
+						       &dest_plats->bits_lattice);
 	  ret |= propagate_aggs_accross_jump_function (cs, jump_func,
 						       dest_plats);
 	}
@@ -4592,6 +4855,74 @@  ipcp_store_alignment_results (void)
   }
 }
 
+/* Look up all the bits information that we have discovered and copy it over
+   to the transformation summary.  */
+
+static void
+ipcp_store_bits_results (void)
+{
+  cgraph_node *node;
+
+  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
+    {
+      ipa_node_params *info = IPA_NODE_REF (node);
+      bool dumped_sth = false;
+      bool found_useful_result = false;
+
+      if (info->ipcp_orig_node)
+	info = IPA_NODE_REF (info->ipcp_orig_node);
+
+      unsigned count = ipa_get_param_count (info);
+      for (unsigned i = 0; i < count; i++)
+	{
+	  ipcp_param_lattices *plats = ipa_get_parm_lattices (info, i);
+	  if (plats->bits_lattice.constant_p ())
+	    {
+	      found_useful_result = true;
+	      break;
+	    }
+	}
+
+    if (!found_useful_result)
+      continue;
+
+    ipcp_grow_transformations_if_necessary ();
+    ipcp_transformation_summary *ts = ipcp_get_transformation_summary (node);
+    vec_safe_reserve_exact (ts->bits, count);
+
+    for (unsigned i = 0; i < count; i++)
+      {
+	ipcp_param_lattices *plats = ipa_get_parm_lattices (info, i);
+	ipa_bits bits_jfunc;			 
+
+	if (plats->bits_lattice.constant_p ())
+	  {
+	    bits_jfunc.known = true;
+	    bits_jfunc.value = plats->bits_lattice.get_value ();
+	    bits_jfunc.mask = plats->bits_lattice.get_mask ();
+	    bits_jfunc.sgn = plats->bits_lattice.get_sign ();
+	    bits_jfunc.precision = plats->bits_lattice.get_precision ();
+	  }
+	else
+	  bits_jfunc.known = false;
+
+	ts->bits->quick_push (bits_jfunc);
+	if (!dump_file || !bits_jfunc.known)
+	  continue;
+	if (!dumped_sth)
+	  {
+	    fprintf (dump_file, "Propagated bits info for function %s/%i:\n",
+				node->name (), node->order);
+	    dumped_sth = true;
+	  }
+	fprintf (dump_file, " param %i: value = ", i);
+	print_hex (bits_jfunc.value, dump_file);
+	fprintf (dump_file, ", mask = ");
+	print_hex (bits_jfunc.mask, dump_file);
+	fprintf (dump_file, "\n");
+      }
+    }
+}
 /* The IPCP driver.  */
 
 static unsigned int
@@ -4625,6 +4956,8 @@  ipcp_driver (void)
   ipcp_decision_stage (&topo);
   /* Store results of alignment propagation. */
   ipcp_store_alignment_results ();
+  /* Store results of bits propagation.  */
+  ipcp_store_bits_results ();
 
   /* Free all IPCP structures.  */
   free_toporder_info (&topo);
diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 132b622..0913cc5 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -302,6 +302,15 @@  ipa_print_node_jump_functions_for_edge (FILE *f, struct cgraph_edge *cs)
 	}
       else
 	fprintf (f, "         Unknown alignment\n");
+
+      if (jump_func->bits.known)
+	{
+	  fprintf (f, "         value: "); print_hex (jump_func->bits.value, f);
+	  fprintf (f, ", mask: "); print_hex (jump_func->bits.mask, f);
+	  fprintf (f, "\n");
+	}
+      else
+	fprintf (f, "         Unknown bits\n");
     }
 }
 
@@ -381,6 +390,7 @@  ipa_set_jf_unknown (struct ipa_jump_func *jfunc)
 {
   jfunc->type = IPA_JF_UNKNOWN;
   jfunc->alignment.known = false;
+  jfunc->bits.known = false;
 }
 
 /* Set JFUNC to be a copy of another jmp (to be used by jump function
@@ -1674,6 +1684,27 @@  ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi,
       else
 	gcc_assert (!jfunc->alignment.known);
 
+      if (INTEGRAL_TYPE_P (TREE_TYPE (arg))
+	  && (TREE_CODE (arg) == SSA_NAME || TREE_CODE (arg) == INTEGER_CST))
+	{
+	  jfunc->bits.known = true;
+	  jfunc->bits.sgn = TYPE_SIGN (TREE_TYPE (arg));
+	  jfunc->bits.precision = TYPE_PRECISION (TREE_TYPE (arg));
+	  
+	  if (TREE_CODE (arg) == SSA_NAME)
+	    {
+	      jfunc->bits.value = 0;
+	      jfunc->bits.mask = widest_int::from (get_nonzero_bits (arg), UNSIGNED); 
+	    }
+	  else
+	    {
+	      jfunc->bits.value = wi::to_widest (arg);
+	      jfunc->bits.mask = 0;
+	    }
+	}
+      else
+	gcc_assert (!jfunc->bits.known);
+
       if (is_gimple_ip_invariant (arg)
 	  || (TREE_CODE (arg) == VAR_DECL
 	      && is_global_var (arg)
@@ -3690,6 +3721,18 @@  ipa_node_params_t::duplicate(cgraph_node *src, cgraph_node *dst,
       for (unsigned i = 0; i < src_alignments->length (); ++i)
 	dst_alignments->quick_push ((*src_alignments)[i]);
     }
+
+  if (src_trans && vec_safe_length (src_trans->bits) > 0)
+    {
+      ipcp_grow_transformations_if_necessary ();
+      src_trans = ipcp_get_transformation_summary (src);
+      const vec<ipa_bits, va_gc> *src_bits = src_trans->bits;
+      vec<ipa_bits, va_gc> *&dst_bits
+	= ipcp_get_transformation_summary (dst)->bits;
+      vec_safe_reserve_exact (dst_bits, src_bits->length ());
+      for (unsigned i = 0; i < src_bits->length (); ++i)
+	dst_bits->quick_push ((*src_bits)[i]);
+    }
 }
 
 /* Register our cgraph hooks if they are not already there.  */
@@ -4609,6 +4652,17 @@  ipa_write_jump_function (struct output_block *ob,
       streamer_write_uhwi (ob, jump_func->alignment.align);
       streamer_write_uhwi (ob, jump_func->alignment.misalign);
     }
+
+  bp = bitpack_create (ob->main_stream);
+  bp_pack_value (&bp, jump_func->bits.known, 1);
+  streamer_write_bitpack (&bp);
+  if (jump_func->bits.known)
+    {
+      streamer_write_wi (ob, jump_func->bits.value);
+      streamer_write_wi (ob, jump_func->bits.mask);
+      streamer_write_enum (ob->main_stream, signop, UNSIGNED + 1, jump_func->bits.sgn);
+      streamer_write_uhwi (ob, jump_func->bits.precision);
+    }   
 }
 
 /* Read in jump function JUMP_FUNC from IB.  */
@@ -4685,6 +4739,19 @@  ipa_read_jump_function (struct lto_input_block *ib,
     }
   else
     jump_func->alignment.known = false;
+
+  bp = streamer_read_bitpack (ib);
+  bool bits_known = bp_unpack_value (&bp, 1);
+  if (bits_known)
+    {
+      jump_func->bits.known = true;
+      jump_func->bits.value = streamer_read_wi (ib);
+      jump_func->bits.mask = streamer_read_wi (ib);
+      jump_func->bits.sgn = streamer_read_enum (ib, signop, UNSIGNED + 1);
+      jump_func->bits.precision = streamer_read_uhwi (ib); 
+    }
+  else
+    jump_func->bits.known = false;
 }
 
 /* Stream out parts of cgraph_indirect_call_info corresponding to CS that are
@@ -5050,6 +5117,31 @@  write_ipcp_transformation_info (output_block *ob, cgraph_node *node)
     }
   else
     streamer_write_uhwi (ob, 0);
+
+  ts = ipcp_get_transformation_summary (node);
+  if (ts && vec_safe_length (ts->bits) > 0)
+    {
+      count = ts->bits->length ();
+      streamer_write_uhwi (ob, count);
+
+      for (unsigned i = 0; i < count; ++i)
+	{
+	  const ipa_bits& bits_jfunc = (*ts->bits)[i];
+	  struct bitpack_d bp = bitpack_create (ob->main_stream);
+	  bp_pack_value (&bp, bits_jfunc.known, 1);
+	  streamer_write_bitpack (&bp);
+	  if (bits_jfunc.known)
+	    {
+	      streamer_write_wi (ob, bits_jfunc.value);
+	      streamer_write_wi (ob, bits_jfunc.mask);
+	      streamer_write_enum (ob->main_stream, signop,
+				   UNSIGNED + 1, bits_jfunc.sgn);
+	      streamer_write_uhwi (ob, bits_jfunc.precision);
+	    }
+	}
+    }
+  else
+    streamer_write_uhwi (ob, 0);
 }
 
 /* Stream in the aggregate value replacement chain for NODE from IB.  */
@@ -5102,6 +5194,28 @@  read_ipcp_transformation_info (lto_input_block *ib, cgraph_node *node,
 	    }
 	}
     }
+
+  count = streamer_read_uhwi (ib);
+  if (count > 0)
+    {
+      ipcp_grow_transformations_if_necessary ();
+      ipcp_transformation_summary *ts = ipcp_get_transformation_summary (node);
+      vec_safe_grow_cleared (ts->bits, count);
+
+      for (i = 0; i < count; i++)
+	{
+	  ipa_bits& bits_jfunc = (*ts->bits)[i];
+	  struct bitpack_d bp = streamer_read_bitpack (ib);
+	  bits_jfunc.known = bp_unpack_value (&bp, 1);
+	  if (bits_jfunc.known)
+	    {
+	      bits_jfunc.value = streamer_read_wi (ib);
+	      bits_jfunc.mask = streamer_read_wi (ib);
+	      bits_jfunc.sgn = streamer_read_enum (ib, signop, UNSIGNED + 1);
+	      bits_jfunc.precision = streamer_read_uhwi (ib);
+	    }
+	}
+    }
 }
 
 /* Write all aggregate replacement for nodes in set.  */
@@ -5404,6 +5518,55 @@  ipcp_update_alignments (struct cgraph_node *node)
     }
 }
 
+/* Update bits info of formal parameters as described in
+   ipcp_transformation_summary.  */
+
+static void
+ipcp_update_bits (struct cgraph_node *node)
+{
+  tree parm = DECL_ARGUMENTS (node->decl);
+  tree next_parm = parm;
+  ipcp_transformation_summary *ts = ipcp_get_transformation_summary (node);
+
+  if (!ts || vec_safe_length (ts->bits) == 0)
+    return;
+
+  vec<ipa_bits, va_gc> &bits = *ts->bits;
+  unsigned count = bits.length ();
+
+  for (unsigned i = 0; i < count; ++i, parm = next_parm)
+    {
+      if (node->clone.combined_args_to_skip
+	  && bitmap_bit_p (node->clone.combined_args_to_skip, i))
+	continue;
+
+      gcc_checking_assert (parm);
+      next_parm = DECL_CHAIN (parm);
+
+      if (!bits[i].known
+	  || !INTEGRAL_TYPE_P (TREE_TYPE (parm))
+	  || !is_gimple_reg (parm))
+	continue;
+
+      tree ddef = ssa_default_def (DECL_STRUCT_FUNCTION (node->decl), parm);
+      if (!ddef)
+	continue;
+
+      if (dump_file)
+	{
+	  fprintf (dump_file, "Adjusting mask for param %u to ", i); 
+	  print_hex (bits[i].mask, dump_file);
+	  fprintf (dump_file, "\n");
+	}
+
+      unsigned prec = TYPE_PRECISION (TREE_TYPE (ddef));
+      wide_int nonzero_bits = wide_int::from (bits[i].mask, prec, UNSIGNED)
+			      | wide_int::from (bits[i].value, prec, bits[i].sgn);
+      set_nonzero_bits (ddef, nonzero_bits);
+      DECL_SET_BY_IPA (parm) = 1;
+    }
+}
+
 /* IPCP transformation phase doing propagation of aggregate values.  */
 
 unsigned int
@@ -5423,6 +5586,7 @@  ipcp_transform_function (struct cgraph_node *node)
 	     node->name (), node->order);
 
   ipcp_update_alignments (node);
+  ipcp_update_bits (node);
   aggval = ipa_get_agg_replacements_for_node (node);
   if (!aggval)
       return 0;
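
For reference, the encoding ipcp_update_bits consumes above is the usual
bit-CCP one: mask has 1s in the positions whose value is unknown, value
carries the bits known to be set, so the nonzero-bits estimate is simply
their union.  A minimal sketch of that composition, with plain unsigned
standing in for the wide_int arithmetic (illustration only, not part of
the patch):

unsigned
nonzero_bits_of (unsigned value, unsigned mask)
{
  /* A bit may be nonzero if it is unknown (set in MASK)
     or known to be 1 (set in VALUE).  */
  return mask | value;
}

e.g. value == 0x10 with mask == 0x0f yields 0x1f, i.e. bits 5 and above
are known zero.
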
diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index e32d078..d69a071 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -154,6 +154,16 @@  struct GTY(()) ipa_alignment
   unsigned misalign;
 };
 
+/* Information about zero/non-zero bits.  */
+struct GTY(()) ipa_bits
+{
+  bool known;
+  widest_int value;
+  widest_int mask;
+  enum signop sgn;
+  unsigned precision;
+};
+
 /* A jump function for a callsite represents the values passed as actual
    arguments of the callsite. See enum jump_func_type for the various
    types of jump functions supported.  */
@@ -166,6 +176,9 @@  struct GTY (()) ipa_jump_func
   /* Information about alignment of pointers. */
   struct ipa_alignment alignment;
 
+  /* Information about zero/non-zero bits.  */
+  struct ipa_bits bits;
+
   enum jump_func_type type;
   /* Represents a value of a jump function.  pass_through is used only in jump
      function context.  constant represents the actual constant in constant jump
@@ -482,6 +495,8 @@  struct GTY(()) ipcp_transformation_summary
   ipa_agg_replacement_value *agg_values;
   /* Alignment information for pointers.  */
   vec<ipa_alignment, va_gc> *alignments;
+  /* Known bits information.  */
+  vec<ipa_bits, va_gc> *bits;
 };
 
 void ipa_set_node_agg_value_chain (struct cgraph_node *node,
diff --git a/gcc/lto-streamer-in.c b/gcc/lto-streamer-in.c
index 1d56d21..01462e2 100644
--- a/gcc/lto-streamer-in.c
+++ b/gcc/lto-streamer-in.c
@@ -712,7 +712,7 @@  make_new_block (struct function *fn, unsigned int index)
 
 /* Read a wide-int.  */
 
-static widest_int
+widest_int
 streamer_read_wi (struct lto_input_block *ib)
 {
   HOST_WIDE_INT a[WIDE_INT_MAX_ELTS];
diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index aa6b589..8fbd882 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -1830,7 +1830,7 @@  output_ssa_names (struct output_block *ob, struct function *fn)
 
 /* Output a wide-int.  */
 
-static void
+void
 streamer_write_wi (struct output_block *ob,
 		   const widest_int &w)
 {
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index ecc1e5d..4da89d0 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -1225,4 +1225,7 @@  DEFINE_DECL_STREAM_FUNCS (TYPE_DECL, type_decl)
 DEFINE_DECL_STREAM_FUNCS (NAMESPACE_DECL, namespace_decl)
 DEFINE_DECL_STREAM_FUNCS (LABEL_DECL, label_decl)
 
+widest_int streamer_read_wi (struct lto_input_block *);
+void streamer_write_wi (struct output_block *, const widest_int &);
+
 #endif /* GCC_LTO_STREAMER_H  */
diff --git a/gcc/testsuite/gcc.dg/ipa/prop-bits-1.c b/gcc/testsuite/gcc.dg/ipa/prop-bits-1.c
new file mode 100644
index 0000000..2389d9f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/prop-bits-1.c
@@ -0,0 +1,33 @@ 
+/* Propagate 0xff from main to f3 to f2; both get mask 0xff (dump-times 2).  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-early-inlining -fdump-ipa-cp -fdump-tree-optimized" } */
+
+int pass_test(void);
+int fail_test(void);
+
+__attribute__((noinline, noclone))
+static int f2(int x)
+{
+  if (x > 300)
+    return fail_test();
+  else
+    return pass_test();
+}
+
+__attribute__((noinline, noclone))
+static int f3(int y)
+{
+  int k = f2(y);
+  return k;
+}
+
+int main(int argc, char **argv)
+{
+  int k = argc & 0xff;
+  int a = f3(k);
+  return a;
+}
+
+/* { dg-final { scan-ipa-dump-times "Adjusting mask for param 0 to 0xff" 2 "cp" } } */
+/* { dg-final { scan-tree-dump-not "fail_test" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/ipa/prop-bits-2.c b/gcc/testsuite/gcc.dg/ipa/prop-bits-2.c
new file mode 100644
index 0000000..8704cce
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/prop-bits-2.c
@@ -0,0 +1,37 @@ 
+/* x's mask should be meet (0xc, 0x3) == 0xf.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-early-inlining -fdump-ipa-cp" } */
+
+__attribute__((noinline))
+static int f1(int x)
+{
+  if (x > 300)
+    return 1;
+  else
+    return 2;
+}
+
+__attribute__((noinline))
+static int f2(int y)
+{
+  return f1(y & 0x03);
+}
+
+__attribute__((noinline))
+static int f3(int z)
+{
+  return f1(z & 0xc);
+}
+
+extern int a;
+extern int b;
+
+int main(void)
+{
+  int k = f2(a);
+  int l = f3(b);
+  return k + l;
+}
+
+/* { dg-final { scan-ipa-dump "Adjusting mask for param 0 to 0xf" "cp" } } */
diff --git a/gcc/testsuite/gcc.dg/ipa/prop-bits-3.c b/gcc/testsuite/gcc.dg/ipa/prop-bits-3.c
new file mode 100644
index 0000000..1d0a2ab
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/prop-bits-3.c
@@ -0,0 +1,24 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-early-inlining -fdump-ipa-cp" } */
+
+__attribute__((noinline))
+static int f(int x)
+{
+  int f2(int);
+
+  if (x > 300)
+    {
+      int z = f(x + 1);
+      return f2 (z);
+    }
+  else
+    return 2;
+}
+
+int main(int argc, char **argv)
+{
+  int k = f(argc & 0xff);
+  return k;
+}
+
+/* { dg-final { scan-ipa-dump-not "Adjusting mask for" "cp" } } */
diff --git a/gcc/testsuite/gcc.dg/ipa/prop-bits-4.c b/gcc/testsuite/gcc.dg/ipa/prop-bits-4.c
new file mode 100644
index 0000000..135cde9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/prop-bits-4.c
@@ -0,0 +1,21 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-early-inlining -fdump-ipa-cp" } */
+
+__attribute__((noinline))
+static int f(int x)
+{
+  if (x > 300)
+    return 1;
+  else
+    return 2;
+}
+
+int main(void)
+{
+  int a = f(1);
+  int b = f(2);
+  int c = f(4);
+  return a + b + c;
+}
+
+/* { dg-final { scan-ipa-dump "Adjusting mask for param 0 to 0x7" "cp" } } */
diff --git a/gcc/testsuite/gcc.dg/ipa/prop-bits-5.c b/gcc/testsuite/gcc.dg/ipa/prop-bits-5.c
new file mode 100644
index 0000000..731f307
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/prop-bits-5.c
@@ -0,0 +1,26 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-early-inlining -fdump-ipa-cp" } */
+
+__attribute__((noinline))
+static int f1(int x)
+{
+  if (x > 20)
+    return 1;
+  else
+    return 2;
+}
+
+__attribute__((noinline))
+static int f2(int y)
+{
+  return f1 (y & 0x3);
+}
+
+int main(int argc, char **argv)
+{
+  int z = f2 (argc & 0xff);
+  int k = f1 (argc & 0xc);
+  return z + k;
+}
+
+/* { dg-final { scan-ipa-dump "Adjusting mask for param 0 to 0xf" "cp" } } */
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 6e8595c..9369cf7 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1558,7 +1558,8 @@  struct GTY(()) tree_decl_common {
   /* DECL_ALIGN.  It should have the same size as TYPE_ALIGN.  */
   unsigned int align : 6;
 
-  /* 20 bits unused.  */
+  unsigned set_by_ipa : 1;
+  /* 19 bits unused.  */
 
   /* UID for points-to sets, stable over copying from inlining.  */
   unsigned int pt_uid;
diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
index ae120a8..c75e9ce 100644
--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -142,7 +142,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "stor-layout.h"
 #include "optabs-query.h"
-
+#include "tree-ssa-ccp.h"
 
 /* Possible lattice values.  */
 typedef enum
@@ -287,7 +287,11 @@  get_default_value (tree var)
 		{
 		  val.lattice_val = CONSTANT;
 		  val.value = build_zero_cst (TREE_TYPE (var));
-		  val.mask = extend_mask (nonzero_bits);
+		  if (SSA_NAME_VAR (var) && TREE_CODE (SSA_NAME_VAR (var)) == PARM_DECL
+		      && DECL_SET_BY_IPA (SSA_NAME_VAR (var)))
+		    val.mask = widest_int::from (nonzero_bits, TYPE_SIGN (TREE_TYPE (SSA_NAME_VAR (var))));
+		  else
+		    val.mask = extend_mask (nonzero_bits);
 		}
 	    }
 	}
@@ -537,9 +541,9 @@  set_lattice_value (tree var, ccp_prop_value_t *new_val)
 
 static ccp_prop_value_t get_value_for_expr (tree, bool);
 static ccp_prop_value_t bit_value_binop (enum tree_code, tree, tree, tree);
-static void bit_value_binop_1 (enum tree_code, tree, widest_int *, widest_int *,
-			       tree, const widest_int &, const widest_int &,
-			       tree, const widest_int &, const widest_int &);
+void bit_value_binop_1 (enum tree_code, signop, unsigned, widest_int *, widest_int *,
+			signop, unsigned, const widest_int &, const widest_int &,
+			signop, unsigned, const widest_int &, const widest_int &);
 
 /* Return a widest_int that can be used for bitwise simplifications
    from VAL.  */
@@ -895,7 +899,7 @@  do_dbg_cnt (void)
    Return TRUE when something was optimized.  */
 
 static bool
-ccp_finalize (bool nonzero_p)
+ccp_finalize (bool nonzero_p ATTRIBUTE_UNUSED)
 {
   bool something_changed;
   unsigned i;
@@ -913,10 +917,7 @@  ccp_finalize (bool nonzero_p)
 
       if (!name
 	  || (!POINTER_TYPE_P (TREE_TYPE (name))
-	      && (!INTEGRAL_TYPE_P (TREE_TYPE (name))
-		  /* Don't record nonzero bits before IPA to avoid
-		     using too much memory.  */
-		  || !nonzero_p)))
+	      && (!INTEGRAL_TYPE_P (TREE_TYPE (name)))))
 	continue;
 
       val = get_value (name);
@@ -1225,10 +1226,11 @@  ccp_fold (gimple *stmt)
    RVAL and RMASK representing a value of type RTYPE and set
    the value, mask pair *VAL and *MASK to the result.  */
 
-static void
-bit_value_unop_1 (enum tree_code code, tree type,
+void
+bit_value_unop_1 (enum tree_code code, signop type_sgn, unsigned type_precision,
 		  widest_int *val, widest_int *mask,
-		  tree rtype, const widest_int &rval, const widest_int &rmask)
+		  signop rtype_sgn, unsigned rtype_precision,
+		  const widest_int &rval, const widest_int &rmask)
 {
   switch (code)
     {
@@ -1241,25 +1243,23 @@  bit_value_unop_1 (enum tree_code code, tree type,
       {
 	widest_int temv, temm;
 	/* Return ~rval + 1.  */
-	bit_value_unop_1 (BIT_NOT_EXPR, type, &temv, &temm, type, rval, rmask);
-	bit_value_binop_1 (PLUS_EXPR, type, val, mask,
-			   type, temv, temm, type, 1, 0);
+	bit_value_unop_1 (BIT_NOT_EXPR, type_sgn, type_precision, &temv, &temm,
+			  type_sgn, type_precision, rval, rmask);
+	bit_value_binop_1 (PLUS_EXPR, type_sgn, type_precision, val, mask,
+			   type_sgn, type_precision, temv, temm,
+			   type_sgn, type_precision, 1, 0);
 	break;
       }
 
     CASE_CONVERT:
       {
-	signop sgn;
-
 	/* First extend mask and value according to the original type.  */
-	sgn = TYPE_SIGN (rtype);
-	*mask = wi::ext (rmask, TYPE_PRECISION (rtype), sgn);
-	*val = wi::ext (rval, TYPE_PRECISION (rtype), sgn);
+	*mask = wi::ext (rmask, rtype_precision, rtype_sgn);
+	*val = wi::ext (rval, rtype_precision, rtype_sgn);
 
 	/* Then extend mask and value according to the target type.  */
-	sgn = TYPE_SIGN (type);
-	*mask = wi::ext (*mask, TYPE_PRECISION (type), sgn);
-	*val = wi::ext (*val, TYPE_PRECISION (type), sgn);
+	*mask = wi::ext (*mask, type_precision, type_sgn);
+	*val = wi::ext (*val, type_precision, type_sgn);
 	break;
       }
 
@@ -1273,15 +1273,16 @@  bit_value_unop_1 (enum tree_code code, tree type,
    R1VAL, R1MASK and R2VAL, R2MASK representing a values of type R1TYPE
    and R2TYPE and set the value, mask pair *VAL and *MASK to the result.  */
 
-static void
-bit_value_binop_1 (enum tree_code code, tree type,
+void
+bit_value_binop_1 (enum tree_code code, signop type_sgn, unsigned type_precision,
 		   widest_int *val, widest_int *mask,
-		   tree r1type, const widest_int &r1val,
-		   const widest_int &r1mask, tree r2type,
+		   signop r1type_sgn, unsigned r1type_precision,
+		   const widest_int &r1val, const widest_int &r1mask,
+		   signop r2type_sgn, unsigned r2type_precision,
 		   const widest_int &r2val, const widest_int &r2mask)
 {
-  signop sgn = TYPE_SIGN (type);
-  int width = TYPE_PRECISION (type);
+  signop sgn = type_sgn;
+  int width = (int) type_precision;
   bool swap_p = false;
 
   /* Assume we'll get a constant result.  Use an initial non varying
@@ -1407,11 +1408,11 @@  bit_value_binop_1 (enum tree_code code, tree type,
     case MINUS_EXPR:
       {
 	widest_int temv, temm;
-	bit_value_unop_1 (NEGATE_EXPR, r2type, &temv, &temm,
-			  r2type, r2val, r2mask);
-	bit_value_binop_1 (PLUS_EXPR, type, val, mask,
-			   r1type, r1val, r1mask,
-			   r2type, temv, temm);
+	bit_value_unop_1 (NEGATE_EXPR, r2type_sgn, r2type_precision, &temv, &temm,
+			  r2type_sgn, r2type_precision, r2val, r2mask);
+	bit_value_binop_1 (PLUS_EXPR, type_sgn, type_precision, val, mask,
+			   r1type_sgn, r1type_precision, r1val, r1mask,
+			   r2type_sgn, r2type_precision, temv, temm);
 	break;
       }
 
@@ -1473,7 +1474,7 @@  bit_value_binop_1 (enum tree_code code, tree type,
 	  break;
 
 	/* For comparisons the signedness is in the comparison operands.  */
-	sgn = TYPE_SIGN (r1type);
+	sgn = r1type_sgn;
 
 	/* If we know the most significant bits we know the values
 	   value ranges by means of treating varying bits as zero
@@ -1526,8 +1527,9 @@  bit_value_unop (enum tree_code code, tree type, tree rhs)
   gcc_assert ((rval.lattice_val == CONSTANT
 	       && TREE_CODE (rval.value) == INTEGER_CST)
 	      || wi::sext (rval.mask, TYPE_PRECISION (TREE_TYPE (rhs))) == -1);
-  bit_value_unop_1 (code, type, &value, &mask,
-		    TREE_TYPE (rhs), value_to_wide_int (rval), rval.mask);
+  bit_value_unop_1 (code, TYPE_SIGN (type), TYPE_PRECISION (type), &value, &mask,
+		    TYPE_SIGN (TREE_TYPE (rhs)), TYPE_PRECISION (TREE_TYPE (rhs)),
+		    value_to_wide_int (rval), rval.mask);
   if (wi::sext (mask, TYPE_PRECISION (type)) != -1)
     {
       val.lattice_val = CONSTANT;
@@ -1572,9 +1574,11 @@  bit_value_binop (enum tree_code code, tree type, tree rhs1, tree rhs2)
 	       && TREE_CODE (r2val.value) == INTEGER_CST)
 	      || wi::sext (r2val.mask,
 			   TYPE_PRECISION (TREE_TYPE (rhs2))) == -1);
-  bit_value_binop_1 (code, type, &value, &mask,
-		     TREE_TYPE (rhs1), value_to_wide_int (r1val), r1val.mask,
-		     TREE_TYPE (rhs2), value_to_wide_int (r2val), r2val.mask);
+  bit_value_binop_1 (code, TYPE_SIGN (type), TYPE_PRECISION (type), &value, &mask,
+		     TYPE_SIGN (TREE_TYPE (rhs1)), TYPE_PRECISION (TREE_TYPE (rhs1)),
+		     value_to_wide_int (r1val), r1val.mask,
+		     TYPE_SIGN (TREE_TYPE (rhs2)), TYPE_PRECISION (TREE_TYPE (rhs2)),
+		     value_to_wide_int (r2val), r2val.mask);
   if (wi::sext (mask, TYPE_PRECISION (type)) != -1)
     {
       val.lattice_val = CONSTANT;
@@ -1673,9 +1677,9 @@  bit_value_assume_aligned (gimple *stmt, tree attr, ccp_prop_value_t ptrval,
 
   align = build_int_cst_type (type, -aligni);
   alignval = get_value_for_expr (align, true);
-  bit_value_binop_1 (BIT_AND_EXPR, type, &value, &mask,
-		     type, value_to_wide_int (ptrval), ptrval.mask,
-		     type, value_to_wide_int (alignval), alignval.mask);
+  bit_value_binop_1 (BIT_AND_EXPR, TYPE_SIGN (type), TYPE_PRECISION (type), &value, &mask,
+		     TYPE_SIGN (type), TYPE_PRECISION (type), value_to_wide_int (ptrval), ptrval.mask,
+		     TYPE_SIGN (type), TYPE_PRECISION (type), value_to_wide_int (alignval), alignval.mask);
   if (wi::sext (mask, TYPE_PRECISION (type)) != -1)
     {
       val.lattice_val = CONSTANT;
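
With the (signop, precision) interface above, code that has no tree type
handy -- such as the ipa-cp bits lattice, whose ipa_bits entries store
only value, mask, sgn and precision -- can call into the CCP bit
machinery without fabricating a type.  A hypothetical call, with
x_value/x_mask and y_value/y_mask standing in for the operand lattice
contents of a 32-bit signed parameter:

widest_int value, mask;
/* Fold x & y on the bits lattice.  */
bit_value_binop_1 (BIT_AND_EXPR, SIGNED, 32, &value, &mask,
		   SIGNED, 32, x_value, x_mask,
		   SIGNED, 32, y_value, y_mask);

On return, mask has 1s for the bits that are still unknown and value
holds the known ones.
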
diff --git a/gcc/tree-ssa-ccp.h b/gcc/tree-ssa-ccp.h
new file mode 100644
index 0000000..b76a834
--- /dev/null
+++ b/gcc/tree-ssa-ccp.h
@@ -0,0 +1,29 @@
+/* Copyright (C) 2016 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef TREE_SSA_CCP_H
+#define TREE_SSA_CCP_H
+
+void bit_value_binop_1 (enum tree_code, signop, unsigned, widest_int *, widest_int *,
+			signop, unsigned, const widest_int &, const widest_int &,
+			signop, unsigned, const widest_int &, const widest_int &);
+
+void bit_value_unop_1 (enum tree_code, signop, unsigned, widest_int *, widest_int *,
+		       signop, unsigned, const widest_int &, const widest_int &);
+
+#endif
diff --git a/gcc/tree.h b/gcc/tree.h
index fff65d6..e21b31d 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -2346,6 +2346,9 @@  extern machine_mode element_mode (const_tree t);
 #define DECL_IGNORED_P(NODE) \
   (DECL_COMMON_CHECK (NODE)->decl_common.ignored_flag)
 
+#define DECL_SET_BY_IPA(NODE) \
+  (DECL_COMMON_CHECK (NODE)->decl_common.set_by_ipa)
+
 /* Nonzero for a given ..._DECL node means that this node represents an
    "abstract instance" of the given declaration (e.g. in the original
    declaration of an inline function).  When generating symbolic debugging