Patchwork new sign/zero extension elimination pass

login
register
mail settings
Submitter Tom de Vries
Date July 11, 2012, 10:30 a.m.
Message ID <4FFD55B4.6060500@codesourcery.com>
Download mbox | patch
Permalink /patch/170420/
State New
Headers show

Comments

Tom de Vries - July 11, 2012, 10:30 a.m.
On 13/11/10 10:50, Eric Botcazou wrote:
>> I profiled the pass on spec2000:
>>
>>                     -mabi=32     -mabi=64
>> ee-pass (usr time):     0.70         1.16
>> total   (usr time):   919.30       879.26
>> ee-pass        (%):     0.08         0.13
>>
>> The pass takes 0.13% or less of the total usr runtime.
> 
> For how many hits?  What are the numbers with --param ee-max-propagate=0?
> 
>> Is it necessary to improve the runtime of this pass?
> 
> I've already given my opinion about the implementation.  The other passes in 
> the compiler try hard not to rescan everything when a single bit changes; as 
> currently written, yours doesn't.
> 

Eric,

I've done the following:
- refactored the pass such that it now scans at most twice over all
  instructions.
- updated the patch to be applicable to current trunk
- updated the motivating example to a more applicable one (as discussed in
  this thread), and added that one as test-case.
- added a part in the header comment illustrating the working of the pass
  on the motivating example.

bootstrapped and reg-tested on x86_64 and i686.

build and reg-tested on mips, mips64, and arm.

OK for trunk?

Thanks,
- Tom

2012-07-10  Tom de Vries  <tom@codesourcery.com>

	* ee.c: New file.
	* tree-pass.h (pass_ee): Declare.
	* opts.c ( default_options_table): Set flag_ee at -O2.
	* timevar.def (TV_EE): New timevar.
	* common.opt (fextension-elimination): New option.
	* Makefile.in (ee.o): New rule.
	* passes.c (pass_ee): Add it.

	* gcc.dg/extend-1.c: New test.
	* gcc.dg/extend-2.c: Same.
	* gcc.dg/extend-2-64.c: Same.
	* gcc.dg/extend-3.c: Same.
	* gcc.dg/extend-4.c: Same.
	* gcc.dg/extend-5.c: Same.
	* gcc.target/mips/octeon-bbit-2.c: Make test more robust.
Jakub Jelinek - July 11, 2012, 11:41 a.m.
On Wed, Jul 11, 2012 at 12:30:12PM +0200, Tom de Vries wrote:
> I've done the following:
> - refactored the pass such that it now scans at most twice over all
>   instructions.
> - updated the patch to be applicable to current trunk
> - updated the motivating example to a more applicable one (as discussed in
>   this thread), and added that one as test-case.
> - added a part in the header comment illustrating the working of the pass
>   on the motivating example.
> 
> bootstrapped and reg-tested on x86_64 and i686.
> 
> build and reg-tested on mips, mips64, and arm.

How does this relate to the ree.c pass we already have?
Why is not REE sufficient for you?  Having two different zero/sign extension
elimination passes would be really wierd.

	Jakub
Tom de Vries - July 11, 2012, 1:01 p.m.
On 11/07/12 13:41, Jakub Jelinek wrote:
> On Wed, Jul 11, 2012 at 12:30:12PM +0200, Tom de Vries wrote:
>> I've done the following:
>> - refactored the pass such that it now scans at most twice over all
>>   instructions.
>> - updated the patch to be applicable to current trunk
>> - updated the motivating example to a more applicable one (as discussed in
>>   this thread), and added that one as test-case.
>> - added a part in the header comment illustrating the working of the pass
>>   on the motivating example.
>>
>> bootstrapped and reg-tested on x86_64 and i686.
>>
>> build and reg-tested on mips, mips64, and arm.
> 
> How does this relate to the ree.c pass we already have?
> Why is not REE sufficient for you?  Having two different zero/sign extension
> elimination passes would be really wierd.
> 

Jakub,

pass_ree eliminates extensions by merging an extension with the definitions of
the register it extends. So pass_ree can perhaps be described as an inter-bb
combiner that is targeted at extensions.

AFAIU there is no analysis in pass_ree that concludes that an extension is
redundant. If it manages to combine an extension with all the insns feeding into
the extension, the extension has been made redundant.
Both redundant and non-redundant extensions can be eliminated by pass_ree.

pass_ee does an analysis of what parts of registers are used, and concludes
based on that analysis that an extension is redundant, meaning it can be
replaced by a regcopy without changing the semantics of the program.

We currently describe pass_ree as 'redundant extension elimination',
and pass_ee as 'extension elimination'.
Perhaps 'redundant extension elimination' is a more appropriate name for pass_ee
and pass_ree is better described as 'extension combiner'.

In the motivating example of the pass, there are 2 extensions. Those extensions
cannot be combined into the insns feeding into them (or into the insns they feed
into). They are redundant however, something which is analyzed by pass_ee. It
replaces the 2 extensions with regcopies, and in the resulting assembly the 2
redundant extensions are removed.

Thanks,
- Tom

> 	Jakub
>
Kenneth Zadeck - July 12, 2012, 1:51 a.m.
Tom,

I have a problem with the approach that you have taken here.   I believe 
that this could be a very useful addition to gcc so I am in general very 
supportive, but i think you are missing an important case.

My problem is that it the pass does not actually look at the target and 
make any decisions based on that target.

for instance, we have a llp64 target.   As with many targets, the target 
has a rich set of compare and branch instructions.  In particular, it 
can do both 32 and 64 bit comparisons.    We see that many of the 
upstream optimizations that take int (SI mode) index variables generate 
extension operations before doing 64 bit compare and branch 
instructions, even though there are 32 bit comparison and branches on 
the machine.     There are a lot of machines that can do more than one 
size of comparison.

This optimization pass, as it is currently written will not remove those 
extensions because it believes that the length of the destination is the 
"final answer" unless it is wrapped in an explicit truncation.    
Instead it needs to ask the port if there is a shorted compare and 
branch instruction that does not cost more. in that case, those 
instructions should be rewritten to use the shorted compare and branch.

There are many operations other than compare and branch where the pass 
should be asking "can i shorten the target for free and therefore get 
rid of the extension?"   right shifts, rotates, and stores are not in 
this class, but left shifts are as are all comparisons, compare and 
branches, conditional moves.   There may even be machines that have this 
for divide, but i do not know of any off the top of my head.

What i am suggesting moves this pass into the target specific set of 
optimizations rather than target independent set, but at where this pass 
is to be put this is completely appropriate.    Any dest instruction 
where all of the operands have been extended should be checked to see if 
it was really necessary to use the longer form before doing the 
propagation pass.

kenny


On 07/11/2012 06:30 AM, Tom de Vries wrote:
> On 13/11/10 10:50, Eric Botcazou wrote:
>>> I profiled the pass on spec2000:
>>>
>>>                     -mabi=32     -mabi=64
>>> ee-pass (usr time):     0.70         1.16
>>> total   (usr time):   919.30       879.26
>>> ee-pass        (%):     0.08         0.13
>>>
>>> The pass takes 0.13% or less of the total usr runtime.
>> For how many hits?  What are the numbers with --param ee-max-propagate=0?
>>
>>> Is it necessary to improve the runtime of this pass?
>> I've already given my opinion about the implementation.  The other passes in
>> the compiler try hard not to rescan everything when a single bit changes; as
>> currently written, yours doesn't.
>>
> Eric,
>
> I've done the following:
> - refactored the pass such that it now scans at most twice over all
>   instructions.
> - updated the patch to be applicable to current trunk
> - updated the motivating example to a more applicable one (as discussed in
>   this thread), and added that one as test-case.
> - added a part in the header comment illustrating the working of the pass
>   on the motivating example.
>
> bootstrapped and reg-tested on x86_64 and i686.
>
> build and reg-tested on mips, mips64, and arm.
>
> OK for trunk?
>
> Thanks,
> - Tom
>
> 2012-07-10  Tom de Vries  <tom@codesourcery.com>
>
> 	* ee.c: New file.
> 	* tree-pass.h (pass_ee): Declare.
> 	* opts.c ( default_options_table): Set flag_ee at -O2.
> 	* timevar.def (TV_EE): New timevar.
> 	* common.opt (fextension-elimination): New option.
> 	* Makefile.in (ee.o): New rule.
> 	* passes.c (pass_ee): Add it.
>
> 	* gcc.dg/extend-1.c: New test.
> 	* gcc.dg/extend-2.c: Same.
> 	* gcc.dg/extend-2-64.c: Same.
> 	* gcc.dg/extend-3.c: Same.
> 	* gcc.dg/extend-4.c: Same.
> 	* gcc.dg/extend-5.c: Same.
> 	* gcc.target/mips/octeon-bbit-2.c: Make test more robust.
> Index: gcc/tree-pass.h
> ===================================================================
> --- gcc/tree-pass.h (revision 189409)
> +++ gcc/tree-pass.h (working copy)
> @@ -483,6 +483,7 @@ extern struct gimple_opt_pass pass_fixup
>
> extern struct rtl_opt_pass pass_expand;
> extern struct rtl_opt_pass pass_instantiate_virtual_regs;
> +extern struct rtl_opt_pass pass_ee;
> extern struct rtl_opt_pass pass_rtl_fwprop;
> extern struct rtl_opt_pass pass_rtl_fwprop_addr;
> extern struct rtl_opt_pass pass_jump;
> Index: gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
> ===================================================================
> --- gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (revision 189409)
> +++ gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (working copy)
> @@ -5,19 +5,19 @@
> /* { dg-final { scan-assembler "\tbnel\t" } } */
> /* { dg-final { scan-assembler-not "\tbne\t" } } */
>
> -NOMIPS16 int
> -f (int n, int i)
> +NOMIPS16 long int
> +f (long int n, long int i)
> {
> -  int s = 0;
> +  long int s = 0;
>    for (; i & 1; i++)
>      s += i;
>    return s;
> }
>
> -NOMIPS16 int
> -g (int n, int i)
> +NOMIPS16 long int
> +g (long int n, long int i)
> {
> -  int s = 0;
> +  long int s = 0;
>    for (i = 0; i < n; i++)
>      s += i;
>    return s;
> Index: gcc/testsuite/gcc.dg/extend-4.c
> ===================================================================
> --- /dev/null (new file)
> +++ gcc/testsuite/gcc.dg/extend-4.c (revision 0)
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-ee" } */
> +
> +unsigned char f(unsigned int a, int c)
> +{
> +  unsigned int b = a;
> +  if (c)
> +    b = a & 0x10ff;
> +  return b;
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "_extend:" 1 "ee" { target mips*-*-* } } } */
> +/* { dg-final { scan-rtl-dump-times "and:" 0 "ee" { target mips*-*-* } } } */
> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ removed" "ee" { target mips*-*-* } } } */
> +/* { dg-final { cleanup-rtl-dump "ee" } } */
> +
> Index: gcc/testsuite/gcc.dg/extend-1.c
> ===================================================================
> --- /dev/null (new file)
> +++ gcc/testsuite/gcc.dg/extend-1.c (revision 0)
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-ee" } */
> +
> +void f(unsigned char * p, short s, int c, int *z)
> +{
> +  if (c)
> +    *z = 0;
> +  *p ^= (unsigned char)s;
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 0 "ee" { target mips*-*-* } } } */
> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 1 "ee" { target mips*-*-* } } } */
> +/* { dg-final { cleanup-rtl-dump "ee" } } */
> Index: gcc/testsuite/gcc.dg/extend-5.c
> ===================================================================
> --- /dev/null (new file)
> +++ gcc/testsuite/gcc.dg/extend-5.c (revision 0)
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-ee" } */
> +
> +void f (short d[2][2])
> +{
> +  int d0 = d[0][0] + d[0][1];
> +  int d1 = d[1][0] + d[1][1];
> +  d[0][0] = d0 + d1;
> +      d[0][1] = d0 - d1;
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
> +/* { dg-final { cleanup-rtl-dump "ee" } } */
> Index: gcc/testsuite/gcc.dg/extend-2.c
> ===================================================================
> --- /dev/null (new file)
> +++ gcc/testsuite/gcc.dg/extend-2.c (revision 0)
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-ee" } */
> +/* { dg-require-effective-target ilp32 } */
> +
> +void f(unsigned char * p, short *s, int c)
> +{
> +  short or = 0;
> +  while (c)
> +    {
> +      or = or | s[c];
> +      c --;
> +    }
> +  *p = (unsigned char)or;
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "zero_extend" 0 "ee" { target mips*-*-* } } } */
> +/* { dg-final { scan-rtl-dump-times "sign_extend" 0 "ee" { target mips*-*-* } } } */
> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
> +/* { dg-final { cleanup-rtl-dump "ee" } } */
> +
> Index: gcc/testsuite/gcc.dg/extend-2-64.c
> ===================================================================
> --- /dev/null (new file)
> +++ gcc/testsuite/gcc.dg/extend-2-64.c (revision 0)
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
> +/* { dg-require-effective-target mips64 } */
> +
> +void f(unsigned char * p, short *s, int c)
> +{
> +  short or = 0;
> +  while (c)
> +    {
> +      or = or | s[c];
> +      c --;
> +    }
> +  *p = (unsigned char)or;
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 1 "ee" { target mips*-*-* } } } */
> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
> +/* { dg-final { cleanup-rtl-dump "ee" } } */
> +
> Index: gcc/testsuite/gcc.dg/extend-3.c
> ===================================================================
> --- /dev/null (new file)
> +++ gcc/testsuite/gcc.dg/extend-3.c (revision 0)
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
> +/* { dg-require-effective-target mips64 } */
> +
> +unsigned int f(unsigned char byte)
> +{
> +  return byte << 25;
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ replaced" "ee" } } */
> +/* { dg-final { cleanup-rtl-dump "ee" } } */
> +
> Index: gcc/opts.c
> ===================================================================
> --- gcc/opts.c (revision 189409)
> +++ gcc/opts.c (working copy)
> @@ -490,6 +490,7 @@ static const struct default_options defa
>      { OPT_LEVELS_2_PLUS, OPT_ftree_tail_merge, NULL, 1 },
>      { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_foptimize_strlen, NULL, 1 },
>      { OPT_LEVELS_2_PLUS, OPT_fhoist_adjacent_loads, NULL, 1 },
> +    { OPT_LEVELS_2_PLUS, OPT_fextension_elimination, NULL, 1 },
>
>      /* -O3 optimizations.  */
>      { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
> Index: gcc/timevar.def
> ===================================================================
> --- gcc/timevar.def (revision 189409)
> +++ gcc/timevar.def (working copy)
> @@ -201,6 +201,7 @@ DEFTIMEVAR (TV_POST_EXPAND	     , "post
> DEFTIMEVAR (TV_VARCONST              , "varconst")
> DEFTIMEVAR (TV_LOWER_SUBREG	     , "lower subreg")
> DEFTIMEVAR (TV_JUMP                  , "jump")
> +DEFTIMEVAR (TV_EE                    , "extension elimination")
> DEFTIMEVAR (TV_FWPROP                , "forward prop")
> DEFTIMEVAR (TV_CSE                   , "CSE")
> DEFTIMEVAR (TV_DCE                   , "dead code elimination")
> Index: gcc/ee.c
> ===================================================================
> --- /dev/null (new file)
> +++ gcc/ee.c (revision 0)
> @@ -0,0 +1,1190 @@
> +/* Redundant extension elimination.
> +   Copyright (C) 2010, 2011, 2012 Free Software Foundation, Inc.
> +   Contributed by Tom de Vries (tom@codesourcery.com)
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +/*
> +
> +  MOTIVATING EXAMPLE
> +
> +  The motivating example for this pass is the example from PR 40893:
> +
> +    void f (short d[2][2])
> +    {
> +      int d0 = d[0][0] + d[0][1];
> +      int d1 = d[1][0] + d[1][1];
> +      d[0][0] = d0 + d1;
> +      d[0][1] = d0 - d1;
> +    }
> +
> +  For MIPS, compilation results in the following insns.
> +
> +    (set (reg:SI 204)
> +         (zero_extend:SI (subreg:HI (reg:SI 213) 2)))
> +
> +    (set (reg:SI 205)
> +         (zero_extend:SI (subreg:HI (reg:SI 216 [ d1 ]) 2)))
> +
> +    (set (reg:SI 217)
> +         (plus:SI (reg:SI 205)
> +                  (reg:SI 204)))
> +
> +    (set (reg:SI 218)
> +         (minus:SI (reg:SI 204)
> +                   (reg:SI 205)))
> +
> +    (set (mem:HI (reg/v/f:SI 210))
> +         (subreg:HI (reg:SI 217) 2))
> +
> +    (set (mem:HI (plus:SI (reg/v/f:SI 210)
> +                 (const_int 2 [0x2])))
> +         (subreg:HI (reg:SI 218) 2))
> +
> +
> +  The pseudos 217 and 218 only use the lower half of pseudos 217 and 218, and
> +  are the only uses.  And the plus and minus operators belong to the class of
> +  operators where a bit in the result is only influenced by same-or-less
> +  significant bitss in the operands, so the plus and minus insns only use the
> +  lower halves of pseudos 204 and 205.  Those are also the only uses of pseudos
> +  204 and 205, so the zero_extends are redundant.
> +
> +
> +  INTENDED EFFECT
> +
> +  This pass works by removing sign/zero-extensions, or replacing them with
> +  regcopies.  The idea there is that the regcopy might be eliminated by a later
> +  pass.  In case the regcopy cannot be eliminated, it might at least be cheaper
> +  than the extension.
> +
> +
> +  IMPLEMENTATION
> +
> +  The pass scans at most two times over all instructions.
> +
> +  The first scan collects all extensions.  If there are no extensions, we're
> +  done.
> +
> +  The second scan registers all uses of a reg in the biggest_use array.
> +  Additionally, it registers how the use size of a pseudo is propagated to the
> +  operands of the insns defining the pseudo.
> +
> +  The biggest_use array now contains the size in bits of the biggest use
> +  of each reg, which allows us to find redundant extensions.
> +
> +  If there are still non-redundant extensions left, we use the propagation
> +  information in an iterative fashion to improve the biggest_use array, after
> +  which we may find more redundant extensions.
> +
> +  Finally, redundant extensions are deleted or replaced.
> +
> +  In case that the src and dest reg of the replacement are not of the same size,
> +  we do not replace with a normal regcopy, but with a truncate or with the copy
> +  of a paradoxical subreg instead.
> +
> +
> +  ILLUSTRATION OF PASS
> +
> +  The dump of the pass shows us how the pass works on the motivating example.
> +
> +  We find the 2 extensions:
> +    found extension with preserved size 16 defining reg 204
> +    found extension with preserved size 16 defining reg 205
> +
> +  We calculate the biggests uses of a register:
> +    biggest_use
> +    reg 204: size 32
> +    reg 205: size 32
> +    reg 217: size 16
> +    reg 218: size 16
> +
> +  We propagate the biggest uses where possible:
> +    propagations
> +    205: 32 -> 16
> +    204: 32 -> 16
> +    214: 32 -> 16
> +    215: 32 -> 16
> +
> +  We conclude that the extensions are redundant:
> +    found redundant extension with preserved size 16 defining reg 205
> +    found redundant extension with preserved size 16 defining reg 204
> +
> +  And we replace them with regcopies:
> +    (set (reg:SI 204)
> +        (reg:SI 213))
> +
> +    (set (reg:SI 205)
> +        (reg:SI 216))
> +
> +
> +  LIMITATIONS
> +
> +  The scope of the analysis is limited to an extension and its uses.  The other
> +  type of analysis (related to the defs of the operand of an extension) is not
> +  done.
> +
> +  Furthermore, we do the analysis of biggest use per reg.  So when determining
> +  whether an extension is redundant, we take all uses of a dest reg into
> +  account, also the ones that are not uses of the extension.
> +  The consideration is that using use-def chains will give a more precise
> +  analysis, but is much more expensive in terms of runtime.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tm.h"
> +#include "rtl.h"
> +#include "tree.h"
> +#include "tm_p.h"
> +#include "flags.h"
> +#include "regs.h"
> +#include "hard-reg-set.h"
> +#include "basic-block.h"
> +#include "insn-config.h"
> +#include "function.h"
> +#include "expr.h"
> +#include "insn-attr.h"
> +#include "recog.h"
> +#include "toplev.h"
> +#include "target.h"
> +#include "timevar.h"
> +#include "optabs.h"
> +#include "insn-codes.h"
> +#include "rtlhooks-def.h"
> +#include "output.h"
> +#include "params.h"
> +#include "timevar.h"
> +#include "tree-pass.h"
> +#include "cgraph.h"
> +#include "vec.h"
> +
> +#define SKIP_REG (-1)
> +#define NONE (-1)
> +
> +/* Number of registers at start of pass.  */
> +
> +static int n_regs;
> +
> +/* Array to register the biggest use of a reg, in bits.  */
> +
> +static int *biggest_use;
> +
> +/* Array to register the promoted subregs.  */
> +
> +static VEC (rtx,heap) **promoted_subreg;
> +
> +/* Array to register for a reg what the last propagated size is.  */
> +
> +static int *propagated_size;
> +
> +typedef struct use
> +{
> +  int regno;
> +  int size;
> +  int offset;
> +  rtx *use;
> +} use_type;
> +
> +DEF_VEC_O(use_type);
> +DEF_VEC_ALLOC_O(use_type,heap);
> +
> +/* Vector to register the uses.  */
> +
> +static VEC (use_type,heap) **uses;
> +
> +typedef struct prop
> +{
> +  rtx set;
> +  int uses_regno;
> +  int uses_index;
> +} prop_type;
> +
> +DEF_VEC_O(prop_type);
> +DEF_VEC_ALLOC_O(prop_type,heap);
> +
> +/* Vector to register the propagations.  */
> +
> +static VEC (prop_type,heap) **props;
> +
> +/* Work list for propragation.  */
> +
> +static VEC (int,heap) *wl;
> +
> +/* Array to register what regs are in the work list.  */
> +
> +static bool *in_wl;
> +
> +/* Vector that contains the extensions in the function.  */
> +
> +static VEC (rtx,heap) *extensions;
> +
> +/* Vector that contains the extensions in the function that are going to be
> +   removed or replaced.  */
> +
> +static VEC (rtx,heap) *redundant_extensions;
> +
> +/* Forward declaration.  */
> +
> +static void note_use (rtx *x, void *data);
> +static bool skip_reg_p (int regno);
> +static void register_prop (rtx set, use_type *use);
> +
> +/* Check whether SUBREG is a promoted subreg.  */
> +
> +static bool
> +promoted_subreg_p (rtx subreg)
> +{
> +  return (GET_CODE (subreg) == SUBREG
> +	  && SUBREG_PROMOTED_VAR_P (subreg));
> +}
> +
> +/* Check whether SUBREG is a promoted subreg for which we cannot reset the
> +   promotion.  */
> +
> +static bool
> +fixed_promoted_subreg_p (rtx subreg)
> +{
> +  int mre;
> +
> +  if (!promoted_subreg_p (subreg))
> +    return false;
> +
> +  mre = targetm.mode_rep_extended (GET_MODE (subreg),
> +				   GET_MODE (SUBREG_REG (subreg)));
> +  return mre != UNKNOWN;
> +}
> +
> +/* Attempt to return the size, reg number and offset of USE in SIZE, REGNO and
> +   OFFSET.  Return true if successful.  */
> +
> +static bool
> +reg_use_p (rtx use, int *size, unsigned int *regno, int *offset)
> +{
> +  rtx reg;
> +
> +  if (REG_P (use))
> +    {
> +      *regno = REGNO (use);
> +      *offset = 0;
> +      *size = GET_MODE_BITSIZE (GET_MODE (use));
> +      return true;
> +    }
> +  else if (GET_CODE (use) == SUBREG)
> +    {
> +      reg = SUBREG_REG (use);
> +
> +      if (!REG_P (reg))
> +	return false;
> +
> +      *regno = REGNO (reg);
> +
> +      if (paradoxical_subreg_p (use) || fixed_promoted_subreg_p (use))
> +	{
> +	  *offset = 0;
> +	  *size = GET_MODE_BITSIZE (GET_MODE (reg));
> +	}
> +      else
> +	{
> +	  *offset = subreg_lsb (use);
> +	  *size = *offset + GET_MODE_BITSIZE (GET_MODE (use));
> +	}
> +
> +      return true;
> +    }
> +
> +  return false;
> +}
> +
> +/* Create a new empty entry in the uses[REGNO] vector.  */
> +
> +static use_type *
> +new_use (unsigned int regno)
> +{
> +  if (uses[regno] == NULL)
> +    uses[regno] = VEC_alloc (use_type, heap, 4);
> +
> +  VEC_safe_push (use_type, heap, uses[regno], NULL);
> +
> +  return VEC_last (use_type, uses[regno]);
> +}
> +
> +/* Register a USE of reg REGNO with SIZE and OFFSET.  */
> +
> +static use_type *
> +register_use (int size, unsigned int regno, int offset, rtx *use)
> +{
> +  int *current;
> +  use_type *p;
> +
> +  gcc_assert (size >= 0);
> +  gcc_assert (regno < (unsigned int)n_regs);
> +
> +  if (skip_reg_p (regno))
> +    return NULL;
> +
> +  p = new_use (regno);
> +  p->regno = regno;
> +  p->size = size;
> +  p->offset = offset;
> +  p->use = use;
> +
> +  /* Update the bigest use.  */
> +  current = &biggest_use[regno];
> +  *current = MAX (*current, size);
> +
> +  return p;
> +}
> +
> +/* Handle embedded uses in USE, which is a part of PATTERN.  */
> +
> +static void
> +note_embedded_uses (rtx use, rtx pattern)
> +{
> +  const char *format_ptr;
> +  int i, j;
> +
> +  format_ptr = GET_RTX_FORMAT (GET_CODE (use));
> +  for (i = 0; i < GET_RTX_LENGTH (GET_CODE (use)); i++)
> +    if (format_ptr[i] == 'e')
> +      note_use (&XEXP (use, i), pattern);
> +    else if (format_ptr[i] == 'E')
> +      for (j = 0; j < XVECLEN (use, i); j++)
> +	note_use (&XVECEXP (use, i, j), pattern);
> +}
> +
> +/* Get the set in PATTERN that has USE as its src operand.  */
> +
> +static rtx
> +get_set (rtx use, rtx pattern)
> +{
> +  rtx sub;
> +  int i;
> +
> +  if (GET_CODE (pattern) == SET && SET_SRC (pattern) == use)
> +    return pattern;
> +
> +  if (GET_CODE (pattern) == PARALLEL)
> +    for (i = 0; i < XVECLEN (pattern, 0); ++i)
> +      {
> +	sub = XVECEXP (pattern, 0, i);
> +	if (GET_CODE (sub) == SET && SET_SRC (sub) == use)
> +	  return sub;
> +      }
> +
> +  return NULL_RTX;
> +}
> +
> +/* Handle a restricted op USE with NR_OPERANDS.  USE is a part of SET, which is
> +   a part of PATTERN.  In this context restricted means that a bit in
> +   an operand influences only the same bit or more significant bits in the
> +   result.  The bitwise ops are a subclass, but PLUS is one as well.  */
> +
> +static void
> +note_restricted_op_use (rtx set, rtx use, unsigned int nr_operands, rtx pattern)
> +{
> +  unsigned int i, smallest;
> +  int operand_size[2];
> +  int operand_offset[2];
> +  int used_size;
> +  unsigned int operand_regno[2];
> +  bool operand_reg[2];
> +  bool operand_ignore[2];
> +  use_type *p;
> +
> +  /* Init operand_reg, operand_size, operand_regno and operand_ignore.  */
> +  for (i = 0; i < nr_operands; ++i)
> +    {
> +      operand_reg[i] = reg_use_p (XEXP (use, i), &operand_size[i],
> +				  &operand_regno[i], &operand_offset[i]);
> +      operand_ignore[i] = false;
> +    }
> +
> +  /* Handle case of reg and-masked with const.  */
> +  if (GET_CODE (use) == AND && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
> +    {
> +      used_size =
> +	HOST_BITS_PER_WIDE_INT - clz_hwi (UINTVAL (XEXP (use, 1)));
> +      operand_size[0] = MIN (operand_size[0], used_size);
> +    }
> +
> +  /* Handle case of reg or-masked with const.  */
> +  if (GET_CODE (use) == IOR && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
> +    {
> +      used_size =
> +	HOST_BITS_PER_WIDE_INT - clz_hwi (~UINTVAL (XEXP (use, 1)));
> +      operand_size[0] = MIN (operand_size[0], used_size);
> +    }
> +
> +  /* Ignore the use of a in 'a = a + b'.  */
> +  /* TODO: handle GET_CODE ((SET_DEST (set))) == SUBREG.  */
> +  if (set != NULL_RTX && REG_P (SET_DEST (set)))
> +    for (i = 0; i < nr_operands; ++i)
> +      operand_ignore[i] = (operand_reg[i]
> +			   && (REGNO (SET_DEST (set)) == operand_regno[i]));
> +
> +  /* Handle the case a reg is combined with don't care bits.  */
> +  if (nr_operands == 2 && operand_reg[0] && operand_reg[1]
> +      && operand_size[0] != operand_size[1])
> +    {
> +      smallest = operand_size[0] > operand_size[1];
> +
> +      if (paradoxical_subreg_p (XEXP (use, smallest)))
> +	operand_size[1 - smallest] = operand_size[smallest];
> +    }
> +
> +  /* Register the operand use, if necessary.  */
> +  for (i = 0; i < nr_operands; ++i)
> +    if (!operand_reg[i])
> +      note_use (&XEXP (use, i), pattern);
> +    else if (!operand_ignore[i])
> +      {
> +	p = register_use (operand_size[i], operand_regno[i], operand_offset[i],
> +			  &XEXP (use, i));
> +	register_prop (set, p);
> +      }
> +}
> +
> +/* Register promoted SUBREG in promoted_subreg.  */
> +
> +static void
> +register_promoted_subreg (rtx subreg)
> +{
> +  int index = REGNO (SUBREG_REG (subreg));
> +
> +  if (promoted_subreg[index] == NULL)
> +    promoted_subreg[index] = VEC_alloc (rtx, heap, 10);
> +
> +  VEC_safe_push (rtx, heap, promoted_subreg[index], subreg);
> +}
> +
> +/* Note promoted subregs in X.  */
> +
> +static int
> +note_promoted_subreg (rtx *x, void *y ATTRIBUTE_UNUSED)
> +{
> +  rtx subreg = *x;
> +
> +  if (promoted_subreg_p (subreg) && !fixed_promoted_subreg_p (subreg)
> +      && REG_P (SUBREG_REG (subreg)))
> +    register_promoted_subreg (subreg);
> +
> +  return 0;
> +}
> +
> +/* Handle use X in pattern DATA noted by note_uses.  */
> +
> +static void
> +note_use (rtx *x, void *data)
> +{
> +  rtx use = *x;
> +  rtx pattern = (rtx)data;
> +  int use_size, use_offset;
> +  unsigned int use_regno;
> +  rtx set;
> +  use_type *p;
> +
> +  for_each_rtx (x, note_promoted_subreg, NULL);
> +
> +  set = get_set (use, pattern);
> +
> +  switch (GET_CODE (use))
> +    {
> +    case REG:
> +    case SUBREG:
> +      if (!reg_use_p (use, &use_size, &use_regno, &use_offset))
> +	{
> +	  note_embedded_uses (use, pattern);
> +	  return;
> +	}
> +      p = register_use (use_size, use_regno, use_offset, x);
> +      register_prop (set, p);
> +      return;
> +    case SIGN_EXTEND:
> +    case ZERO_EXTEND:
> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset))
> +	{
> +	  note_embedded_uses (use, pattern);
> +	  return;
> +	}
> +      p = register_use (use_size, use_regno, use_offset, x);
> +      register_prop (set, p);
> +      return;
> +    case IOR:
> +    case AND:
> +    case XOR:
> +    case PLUS:
> +    case MINUS:
> +      note_restricted_op_use (set, use, 2, pattern);
> +      return;
> +    case NOT:
> +    case NEG:
> +      note_restricted_op_use (set, use, 1, pattern);
> +      return;
> +    case ASHIFT:
> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset)
> +	  || !CONST_INT_P (XEXP (use, 1))
> +	  || INTVAL (XEXP (use, 1)) <= 0
> +	  || paradoxical_subreg_p (XEXP (use, 0)))
> +	{
> +	  note_embedded_uses (use, pattern);
> +	  return;
> +	}
> +      (void)register_use (use_size - INTVAL (XEXP (use, 1)), use_regno,
> +			  use_offset, x);
> +      return;
> +    default:
> +      note_embedded_uses (use, pattern);
> +      return;
> +    }
> +}
> +
> +/* Check whether reg REGNO is implicitly used.  */
> +
> +static bool
> +implicit_use_p (int regno ATTRIBUTE_UNUSED)
> +{
> +#ifdef EPILOGUE_USES
> +  if (EPILOGUE_USES (regno))
> +    return true;
> +#endif
> +
> +#ifdef EH_USES
> +  if (EH_USES (regno))
> +    return true;
> +#endif
> +
> +  return false;
> +}
> +
> +/* Check whether reg REGNO should be skipped in analysis.  */
> +
> +static bool
> +skip_reg_p (int regno)
> +{
> +  /* TODO: handle hard registers.  The problem with hard registers is that
> +     a DI use of r0 can mean a 64bit use of r0 and a 32 bit use of r1.
> +     We don't handle that properly.  */
> +  return implicit_use_p (regno) || HARD_REGISTER_NUM_P (regno);
> +}
> +
> +/* Note the uses of argument registers in call INSN.  */
> +
> +static void
> +note_call_uses (rtx insn)
> +{
> +  rtx link, link_expr;
> +
> +  if (!CALL_P (insn))
> +    return;
> +
> +  for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1))
> +    {
> +      link_expr = XEXP (link, 0);
> +
> +      if (GET_CODE (link_expr) == USE)
> +	note_use (&XEXP (link_expr, 0), link);
> +    }
> +}
> +
> +/* Dump the biggest uses found.  */
> +
> +static void
> +dump_biggest_use (void)
> +{
> +  int i;
> +
> +  if (!dump_file)
> +    return;
> +
> +  fprintf (dump_file, "biggest_use:\n");
> +
> +  for (i = 0; i < n_regs; i++)
> +    if (biggest_use[i] > 0)
> +      fprintf (dump_file, "reg %d: size %d\n", i, biggest_use[i]);
> +
> +  fprintf (dump_file, "\n");
> +}
> +
> +/* Calculate the biggest use mode for all regs.  */
> +
> +static void
> +calculate_biggest_use (void)
> +{
> +  basic_block bb;
> +  rtx insn;
> +
> +  /* For all insns, call note_use for each use in insn.  */
> +  FOR_EACH_BB (bb)
> +    FOR_BB_INSNS (bb, insn)
> +      {
> +	if (!NONDEBUG_INSN_P (insn))
> +	  continue;
> +
> +	note_uses (&PATTERN (insn), note_use, PATTERN (insn));
> +
> +	if (CALL_P (insn))
> +	  note_call_uses (insn);
> +      }
> +
> +  dump_biggest_use ();
> +}
> +
> +/* Register a propagation USE in SET in the props vector.  */
> +
> +static void
> +register_prop (rtx set, use_type *use)
> +{
> +  prop_type *p;
> +  int regno;
> +
> +  if (set == NULL_RTX || use == NULL)
> +    return;
> +
> +  if (!REG_P (SET_DEST (set)))
> +    return;
> +
> +  regno = REGNO (SET_DEST (set));
> +
> +  if (skip_reg_p (regno))
> +    return;
> +
> +  if (props[regno] == NULL)
> +    props[regno] = VEC_alloc (prop_type, heap, 4);
> +
> +  VEC_safe_push (prop_type, heap, props[regno], NULL);
> +  p = VEC_last (prop_type, props[regno]);
> +  p->set = set;
> +  p->uses_regno = use->regno;
> +  p->uses_index = VEC_length (use_type, uses[use->regno]) - 1;
> +}
> +
> +/* Add REGNO to the worklist.  */
> +
> +static void
> +add_to_wl (int regno)
> +{
> +  if (in_wl[regno])
> +    return;
> +
> +  if (biggest_use[regno] > 0
> +      && biggest_use[regno] == GET_MODE_BITSIZE (PSEUDO_REGNO_MODE (regno)))
> +    return;
> +
> +  if (VEC_empty (prop_type, props[regno]))
> +    return;
> +
> +  if (propagated_size[regno] != NONE
> +      && propagated_size[regno] == biggest_use[regno])
> +    return;
> +
> +  VEC_safe_push (int, heap, wl, regno);
> +  in_wl[regno] = true;
> +}
> +
> +/* Pop a reg from the worklist and return it.  */
> +
> +static int
> +pop_wl (void)
> +{
> +  int regno = VEC_pop (int, wl);
> +  in_wl[regno] = false;
> +  return regno;
> +}
> +
> +/* Propagate the use size DEST_SIZE of a reg to use P.  */
> +
> +static int
> +propagate_size (int dest_size, use_type *p)
> +{
> +  if (dest_size == 0)
> +    return 0;
> +
> +  return p->offset + MIN (p->size - p->offset, dest_size);
> +}
> +
> +/* Get the biggest use of REGNO from the uses vector.  */
> +
> +static int
> +get_biggest_use (unsigned int regno)
> +{
> +  int ix;
> +  use_type *p;
> +  int max = 0;
> +
> +  gcc_assert (uses[regno] != NULL);
> +
> +  FOR_EACH_VEC_ELT (use_type, uses[regno], ix, p)
> +    max = MAX (max, p->size);
> +
> +  return max;
> +}
> +
> +/* Propagate the use size DEST_SIZE of a reg to the uses in USE.  */
> +
> +static void
> +propagate_to_use (int dest_size, use_type *use)
> +{
> +  int new_use_size;
> +  int prev_biggest_use;
> +  int *current;
> +
> +  new_use_size = propagate_size (dest_size, use);
> +
> +  if (new_use_size >= use->size)
> +    return;
> +
> +  use->size = new_use_size;
> +
> +  current = &biggest_use[use->regno];
> +
> +  prev_biggest_use = *current;
> +  *current = get_biggest_use (use->regno);
> +
> +  if (*current >= prev_biggest_use)
> +    return;
> +
> +  add_to_wl (use->regno);
> +
> +  if (dump_file)
> +    fprintf (dump_file, "%d: %d -> %d\n", use->regno, prev_biggest_use,
> +	     *current);
> +
> +}
> +
> +/* Propagate the biggest use of a reg REGNO to all its uses, and note
> +   propagations in NR_PROPAGATIONS.  */
> +
> +static void
> +propagate_to_uses (int regno, int *nr_propagations)
> +{
> +  int ix;
> +  prop_type *p;
> +
> +  gcc_assert (!(propagated_size[regno] == NONE
> +		&& propagated_size[regno] == biggest_use[regno]));
> +
> +  FOR_EACH_VEC_ELT (prop_type, props[regno], ix, p)
> +    {
> +      use_type *use = VEC_index (use_type, uses[p->uses_regno], p->uses_index);
> +      propagate_to_use (biggest_use[regno], use);
> +      ++(*nr_propagations);
> +    }
> +
> +  propagated_size[regno] = biggest_use[regno];
> +}
> +
> +/* Improve biggest_use array iteratively.  */
> +
> +static void
> +propagate (void)
> +{
> +  int i;
> +  int nr_propagations = 0;
> +
> +  /* Initialize work list.  */
> +
> +  for (i = 0; i < n_regs; ++i)
> +    add_to_wl (i);
> +
> +  /* Work the work list.  */
> +
> +  if (dump_file)
> +    fprintf (dump_file, "propagations: \n");
> +  while (!VEC_empty (int, wl))
> +    propagate_to_uses (pop_wl (), &nr_propagations);
> +
> +  if (dump_file)
> +    fprintf (dump_file, "\nnr_propagations: %d\n\n", nr_propagations);
> +}
> +
> +/* Check whether this is a sign/zero extension.  */
> +
> +static bool
> +extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
> +{
> +  rtx src, op0;
> +
> +  /* Detect set of reg.  */
> +  if (GET_CODE (PATTERN (insn)) != SET)
> +    return false;
> +
> +  src = SET_SRC (PATTERN (insn));
> +  *dest = SET_DEST (PATTERN (insn));
> +
> +  if (!REG_P (*dest))
> +    return false;
> +
> +  /* Detect sign or zero extension.  */
> +  if (GET_CODE (src) == ZERO_EXTEND || GET_CODE (src) == SIGN_EXTEND
> +      || (GET_CODE (src) == AND && CONST_INT_P (XEXP (src, 1))))
> +    {
> +      op0 = XEXP (src, 0);
> +
> +      /* Determine amount of least significant bits preserved by operation.  */
> +      if (GET_CODE (src) == AND)
> +	*preserved_size = ctz_hwi (~UINTVAL (XEXP (src, 1)));
> +      else
> +	*preserved_size = GET_MODE_BITSIZE (GET_MODE (op0));
> +
> +      if (GET_CODE (op0) == SUBREG)
> +	{
> +	  if (subreg_lsb (op0) != 0)
> +	    return false;
> +
> +	  *inner = SUBREG_REG (op0);
> +
> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
> +	    return false;
> +
> +	  return true;
> +	}
> +      else if (REG_P (op0))
> +	{
> +	  *inner = op0;
> +
> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
> +	    return false;
> +
> +	  return true;
> +	}
> +      else if (GET_CODE (op0) == TRUNCATE)
> +	{
> +	  *inner = XEXP (op0, 0);
> +	  return true;
> +	}
> +    }
> +
> +  return false;
> +}
> +
> +/* Find extensions and store them in the extensions vector.  */
> +
> +static bool
> +find_extensions (void)
> +{
> +  basic_block bb;
> +  rtx insn, dest, inner;
> +  int preserved_size;
> +
> +  /* For all insns, call note_use for each use in insn.  */
> +  FOR_EACH_BB (bb)
> +    FOR_BB_INSNS (bb, insn)
> +      {
> +	if (!NONDEBUG_INSN_P (insn))
> +	  continue;
> +
> +	if (!extension_p (insn, &dest, &inner, &preserved_size))
> +	  continue;
> +
> +	VEC_safe_push (rtx, heap, extensions, insn);
> +
> +	if (dump_file)
> +	  fprintf (dump_file,
> +		   "found extension %u with preserved size %d defining"
> +		   " reg %d\n",
> +		   INSN_UID (insn), preserved_size, REGNO (dest));
> +      }
> +
> +  if (dump_file)
> +    {
> +      if (!VEC_empty (rtx, extensions))
> +	fprintf (dump_file, "\n");
> +      else
> +	fprintf (dump_file, "no extensions found.\n");
> +    }
> +
> +  return !VEC_empty (rtx, extensions);
> +}
> +
> +/* Check whether this is a redundant sign/zero extension.  */
> +
> +static bool
> +redundant_extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
> +{
> +  int biggest_dest_use;
> +
> +  if (!extension_p (insn, dest, inner, preserved_size))
> +    gcc_unreachable ();
> +
> +  biggest_dest_use = biggest_use[REGNO (*dest)];
> +
> +  if (biggest_dest_use == SKIP_REG)
> +    return false;
> +
> +  if (*preserved_size < biggest_dest_use)
> +    return false;
> +
> +  return true;
> +}
> +
> +/* Find the redundant extensions in the extensions vector and move them to the
> +   redundant_extensions vector.  */
> +
> +static void
> +find_redundant_extensions (void)
> +{
> +  rtx insn, dest, inner;
> +  int ix;
> +  bool found = false;
> +  int preserved_size;
> +
> +  FOR_EACH_VEC_ELT_REVERSE (rtx, extensions, ix, insn)
> +    if (redundant_extension_p (insn, &dest, &inner, &preserved_size))
> +      {
> +	VEC_safe_push (rtx, heap, redundant_extensions, insn);
> +	VEC_unordered_remove (rtx, extensions, ix);
> +
> +	if (dump_file)
> +	  fprintf (dump_file,
> +		   "found redundant extension %u with preserved size %d"
> +		   " defining reg %d\n",
> +		   INSN_UID (insn), preserved_size, REGNO (dest));
> +	found = true;
> +      }
> +
> +  if (dump_file && found)
> +    fprintf (dump_file, "\n");
> +}
> +
> +/* Reset promotion of subregs or REG.  */
> +
> +static void
> +reset_promoted_subreg (rtx reg)
> +{
> +  int ix;
> +  rtx subreg;
> +
> +  if (promoted_subreg[REGNO (reg)] == NULL)
> +    return;
> +
> +  FOR_EACH_VEC_ELT (rtx, promoted_subreg[REGNO (reg)], ix, subreg)
> +    {
> +      SUBREG_PROMOTED_UNSIGNED_SET (subreg, 0);
> +      SUBREG_PROMOTED_VAR_P (subreg) = 0;
> +    }
> +
> +  VEC_free (rtx, heap, promoted_subreg[REGNO (reg)]);
> +}
> +
> +/* Try to remove or replace the redundant extension INSN which extends INNER and
> +   writes to DEST.  */
> +
> +static void
> +try_remove_or_replace_extension (rtx insn, rtx dest, rtx inner)
> +{
> +  rtx cp_src, cp_dest, seq = NULL_RTX, one;
> +
> +  /* Check whether replacement is needed.  */
> +  if (dest != inner)
> +    {
> +      start_sequence ();
> +
> +      /* Determine the proper replacement operation.  */
> +      if (GET_MODE (dest) == GET_MODE (inner))
> +	{
> +	  cp_src = inner;
> +	  cp_dest = dest;
> +	}
> +      else if (GET_MODE_SIZE (GET_MODE (dest))
> +	       > GET_MODE_SIZE (GET_MODE (inner)))
> +	{
> +	  emit_clobber (dest);
> +	  cp_src = inner;
> +	  cp_dest = gen_lowpart_SUBREG (GET_MODE (inner), dest);
> +	}
> +      else
> +	{
> +	  cp_src = gen_lowpart_SUBREG (GET_MODE (dest), inner);
> +	  cp_dest = dest;
> +	}
> +
> +      emit_move_insn (cp_dest, cp_src);
> +
> +      seq = get_insns ();
> +      end_sequence ();
> +
> +      /* If the replacement is not supported, bail out.  */
> +      for (one = seq; one != NULL_RTX; one = NEXT_INSN (one))
> +	if (recog_memoized (one) < 0 && GET_CODE (PATTERN (one)) != CLOBBER)
> +	  return;
> +
> +      /* Insert the replacement.  */
> +      emit_insn_before (seq, insn);
> +    }
> +
> +  /* Note replacement/removal in the dump.  */
> +  if (dump_file)
> +    {
> +      fprintf (dump_file, "redundant extension %u ", INSN_UID (insn));
> +      if (dest != inner)
> +	fprintf (dump_file, "replaced by %u\n", INSN_UID (seq));
> +      else
> +	fprintf (dump_file, "removed\n");
> +    }
> +
> +  /* Remove the extension.  */
> +  delete_insn (insn);
> +
> +  reset_promoted_subreg (dest);
> +}
> +
> +/* Setup the variables at the start of the pass.  */
> +
> +static void
> +init_pass (void)
> +{
> +  int i;
> +
> +  biggest_use = XNEWVEC (int, n_regs);
> +  promoted_subreg = XCNEWVEC (VEC (rtx,heap) *, n_regs);
> +  propagated_size = XNEWVEC (int, n_regs);
> +
> +  /* Initialize biggest_use for all regs to 0.  If a reg is used implicitly, we
> +     handle that reg conservatively and set it to SKIP_REG instead.  */
> +  for (i = 0; i < n_regs; i++)
> +    {
> +      biggest_use[i] = skip_reg_p (i) ? SKIP_REG : 0;
> +      propagated_size[i] = NONE;
> +    }
> +
> +  extensions = VEC_alloc (rtx, heap, 10);
> +  redundant_extensions = VEC_alloc (rtx, heap, 10);
> +
> +  wl = VEC_alloc (int, heap, 50);
> +  in_wl = XNEWVEC (bool, n_regs);
> +
> +  uses = XNEWVEC (typeof (*uses), n_regs);
> +  props = XNEWVEC (typeof (*props), n_regs);
> +
> +  for (i = 0; i < n_regs; ++i)
> +    {
> +      uses[i] = NULL;
> +      props[i] = NULL;
> +      in_wl[i] = false;
> +    }
> +}
> +
> +/* Find redundant extensions and remove or replace them if possible.  */
> +
> +static void
> +remove_redundant_extensions (void)
> +{
> +  rtx insn, dest, inner;
> +  int preserved_size;
> +  int ix;
> +
> +  if (!find_extensions ())
> +    return;
> +
> +  calculate_biggest_use ();
> +
> +  find_redundant_extensions ();
> +
> +  if (!VEC_empty (rtx, extensions))
> +    {
> +      propagate ();
> +
> +      find_redundant_extensions ();
> +    }
> +
> +  gcc_checking_assert (n_regs == max_reg_num ());
> +
> +  FOR_EACH_VEC_ELT (rtx, redundant_extensions, ix, insn)
> +    {
> +      extension_p (insn, &dest, &inner, &preserved_size);
> +      try_remove_or_replace_extension (insn, dest, inner);
> +    }
> +
> +  if (dump_file)
> +    fprintf (dump_file, "\n");
> +}
> +
> +/* Free the variables at the end of the pass.  */
> +
> +static void
> +finish_pass (void)
> +{
> +  int i;
> +
> +  XDELETEVEC (propagated_size);
> +
> +  VEC_free (rtx, heap, extensions);
> +  VEC_free (rtx, heap, redundant_extensions);
> +
> +  VEC_free (int, heap, wl);
> +
> +  for (i = 0; i < n_regs; ++i)
> +    {
> +      if (uses[i] != NULL)
> +	VEC_free (use_type, heap, uses[i]);
> +
> +      if (props[i] != NULL)
> +	VEC_free (prop_type, heap, props[i]);
> +    }
> +
> +  XDELETEVEC (uses);
> +  XDELETEVEC (props);
> +  XDELETEVEC (biggest_use);
> +
> +  for (i = 0; i < n_regs; ++i)
> +    if (promoted_subreg[i] != NULL)
> +      VEC_free (rtx, heap, promoted_subreg[i]);
> +  XDELETEVEC (promoted_subreg);
> +}
> +
> +/* Remove redundant extensions.  */
> +
> +static unsigned int
> +rest_of_handle_ee (void)
> +{
> +  n_regs = max_reg_num ();
> +
> +  init_pass ();
> +  remove_redundant_extensions ();
> +  finish_pass ();
> +  return 0;
> +}
> +
> +/* Run ee pass when flag_ee is set at optimization level > 0.  */
> +
> +static bool
> +gate_handle_ee (void)
> +{
> +  return (optimize > 0 && flag_ee);
> +}
> +
> +struct rtl_opt_pass pass_ee =
> +{
> + {
> +  RTL_PASS,
> +  "ee",                                 /* name */
> +  gate_handle_ee,                       /* gate */
> +  rest_of_handle_ee,                    /* execute */
> +  NULL,                                 /* sub */
> +  NULL,                                 /* next */
> +  0,                                    /* static_pass_number */
> +  TV_EE,                                /* tv_id */
> +  0,                                    /* properties_required */
> +  0,                                    /* properties_provided */
> +  0,                                    /* properties_destroyed */
> +  0,                                    /* todo_flags_start */
> +  TODO_ggc_collect |
> +  TODO_verify_rtl_sharing,              /* todo_flags_finish */
> + }
> +};
> Index: gcc/common.opt
> ===================================================================
> --- gcc/common.opt (revision 189409)
> +++ gcc/common.opt (working copy)
> @@ -1067,6 +1067,10 @@ feliminate-dwarf2-dups
> Common Report Var(flag_eliminate_dwarf2_dups)
> Perform DWARF2 duplicate elimination
>
> +fextension-elimination
> +Common Report Var(flag_ee) Init(0) Optimization
> +Perform extension elimination
> +
> fipa-sra
> Common Report Var(flag_ipa_sra) Init(0) Optimization
> Perform interprocedural reduction of aggregates
> Index: gcc/Makefile.in
> ===================================================================
> --- gcc/Makefile.in (revision 189409)
> +++ gcc/Makefile.in (working copy)
> @@ -1218,6 +1218,7 @@ OBJS = \
> 	dwarf2asm.o \
> 	dwarf2cfi.o \
> 	dwarf2out.o \
> +	ee.o \
> 	ebitmap.o \
> 	emit-rtl.o \
> 	et-forest.o \
> @@ -2971,6 +2972,12 @@ cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H
>     $(TM_P_H) $(PARAMS_H) cselib.h $(EXCEPT_H) $(TREE_H) $(TIMEVAR_H) \
>     intl.h $(OBSTACK_H) $(TREE_PASS_H) $(DF_H) $(DBGCNT_H) $(TARGET_H) \
>     $(DF_H) $(CFGLOOP_H)
> +ee.o : ee.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
> +   hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h \
> +   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
> +   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(TOPLEV_H) \
> +   $(DIAGNOSTIC_CORE_H) $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h \
> +   $(PARAMS_H) $(CGRAPH_H)
> gcse.o : gcse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>     $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>     $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) toplev.h $(DIAGNOSTIC_CORE_H) \
> Index: gcc/passes.c
> ===================================================================
> --- gcc/passes.c (revision 189409)
> +++ gcc/passes.c (working copy)
> @@ -1552,6 +1552,7 @@ init_optimization_passes (void)
>        NEXT_PASS (pass_initialize_regs);
>        NEXT_PASS (pass_ud_rtl_dce);
>        NEXT_PASS (pass_combine);
> +      NEXT_PASS (pass_ee);
>        NEXT_PASS (pass_if_after_combine);
>        NEXT_PASS (pass_partition_blocks);
>        NEXT_PASS (pass_regmove);
Tom de Vries - July 12, 2012, 9:20 a.m.
Kenneth,

I see I replied to your original message that had the wrong CC, I'm now CC-ing
gcc-patches@gcc.gnu.org.

Thanks,
- Tom

On 12/07/12 11:05, Tom de Vries wrote:
> On 12/07/12 03:39, Kenneth Zadeck wrote:
>> Tom,
>>
>> I have a problem with the approach that you have taken here.   I believe 
>> that this could be a very useful addition to gcc so I am in general very 
>> supportive, but i think you are missing an important case.
>>
>> My problem is that it the pass does not actually look at the target and 
>> make any decisions based on that target.
>>
>> for instance, we have a llp64 target.   As with many targets, the target 
>> has a rich set of compare and branch instructions.  In particular, it 
>> can do both 32 and 64 bit comparisons.    We see that many of the 
>> upstream optimizations that take int (SI mode) index variables generate 
>> extension operations before doing 64 bit compare and branch 
>> instructions, even though there are 32 bit comparison and branches on 
>> the machine.     There are a lot of machines that can do more than one 
>> size of comparison.
>>
> 	 This optimization pass, as it is currently written will not remove those
>> extensions because it believes that the length of the destination is the 
>> "final answer" unless it is wrapped in an explicit truncation.
>> Instead it needs to ask the port if there is a shorted compare and 
>> branch instruction that does not cost more. in that case, those 
>> instructions should be rewritten to use the shorted compare and branch.
>>
>> There are many operations other than compare and branch where the pass 
>> should be asking "can i shorten the target for free and therefore get 
>> rid of the extension?"
> 
> Kenneth,
> 
> I'm not sure I understand the optimization you're talking about, in particular
> I'm confused about whether the branch range of the 32-bit and 64-bit comparison
> is the same.
> 
> Assuming it's the same, my understanding is that you're talking about an example
> like this:
> ...
>   (insn (set (reg:DI 5)
>              (zero_extend:DI (reg:SI 4))))
> 
>   (jump_insn (set (pc)
>                   (if_then_else (eq (reg:DI 5)
>                                     (const_int 0))
>                                 (label_ref:DI 62)
>                                 (pc))))
> 
>   ->
> 
>   (jump_insn (set (pc)
>                   (if_then_else (eq (reg:SI 4)
>                                     (const_int 0))
>                                 (label_ref:DI 62)
>                                 (pc))))
> 
> ...
> I would expect combine to optimize this.
> 
> In case I got the example all backwards or it is a too simple one, please
> provide an rtl example that illustrates the optimization.
> 
> Thanks,
> - Tom
> 
> 
>>  right shifts, rotates, and stores are not in 
>> this class, but left shifts are as are all comparisons, compare and 
>> branches, conditional moves.   There may even be machines that have this 
>> for divide, but i do not know of any off the top of my head.
>>
>> What i am suggesting moves this pass into the target specific set of 
>> optimizations rather than target independent set, but at where this pass 
>> is to be put this is completely appropriate.    Any dest instruction 
>> where all of the operands have been extended should be checked to see if 
>> it was really necessary to use the longer form before doing the 
>> propagation pass.
>>
>> kenny
>>
>>
>> On 07/11/2012 06:30 AM, Tom de Vries wrote:
>>> On 13/11/10 10:50, Eric Botcazou wrote:
>>>>> I profiled the pass on spec2000:
>>>>>
>>>>>                     -mabi=32     -mabi=64
>>>>> ee-pass (usr time):     0.70         1.16
>>>>> total   (usr time):   919.30       879.26
>>>>> ee-pass        (%):     0.08         0.13
>>>>>
>>>>> The pass takes 0.13% or less of the total usr runtime.
>>>> For how many hits?  What are the numbers with --param ee-max-propagate=0?
>>>>
>>>>> Is it necessary to improve the runtime of this pass?
>>>> I've already given my opinion about the implementation.  The other passes in
>>>> the compiler try hard not to rescan everything when a single bit changes; as
>>>> currently written, yours doesn't.
>>>>
>>> Eric,
>>>
>>> I've done the following:
>>> - refactored the pass such that it now scans at most twice over all
>>>   instructions.
>>> - updated the patch to be applicable to current trunk
>>> - updated the motivating example to a more applicable one (as discussed in
>>>   this thread), and added that one as test-case.
>>> - added a part in the header comment illustrating the working of the pass
>>>   on the motivating example.
>>>
>>> bootstrapped and reg-tested on x86_64 and i686.
>>>
>>> build and reg-tested on mips, mips64, and arm.
>>>
>>> OK for trunk?
>>>
>>> Thanks,
>>> - Tom
>>>
>>> 2012-07-10  Tom de Vries  <tom@codesourcery.com>
>>>
>>> 	* ee.c: New file.
>>> 	* tree-pass.h (pass_ee): Declare.
>>> 	* opts.c ( default_options_table): Set flag_ee at -O2.
>>> 	* timevar.def (TV_EE): New timevar.
>>> 	* common.opt (fextension-elimination): New option.
>>> 	* Makefile.in (ee.o): New rule.
>>> 	* passes.c (pass_ee): Add it.
>>>
>>> 	* gcc.dg/extend-1.c: New test.
>>> 	* gcc.dg/extend-2.c: Same.
>>> 	* gcc.dg/extend-2-64.c: Same.
>>> 	* gcc.dg/extend-3.c: Same.
>>> 	* gcc.dg/extend-4.c: Same.
>>> 	* gcc.dg/extend-5.c: Same.
>>> 	* gcc.target/mips/octeon-bbit-2.c: Make test more robust.
>>> Index: gcc/tree-pass.h
>>> ===================================================================
>>> --- gcc/tree-pass.h (revision 189409)
>>> +++ gcc/tree-pass.h (working copy)
>>> @@ -483,6 +483,7 @@ extern struct gimple_opt_pass pass_fixup
>>>
>>> extern struct rtl_opt_pass pass_expand;
>>> extern struct rtl_opt_pass pass_instantiate_virtual_regs;
>>> +extern struct rtl_opt_pass pass_ee;
>>> extern struct rtl_opt_pass pass_rtl_fwprop;
>>> extern struct rtl_opt_pass pass_rtl_fwprop_addr;
>>> extern struct rtl_opt_pass pass_jump;
>>> Index: gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
>>> ===================================================================
>>> --- gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (revision 189409)
>>> +++ gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (working copy)
>>> @@ -5,19 +5,19 @@
>>> /* { dg-final { scan-assembler "\tbnel\t" } } */
>>> /* { dg-final { scan-assembler-not "\tbne\t" } } */
>>>
>>> -NOMIPS16 int
>>> -f (int n, int i)
>>> +NOMIPS16 long int
>>> +f (long int n, long int i)
>>> {
>>> -  int s = 0;
>>> +  long int s = 0;
>>>    for (; i & 1; i++)
>>>      s += i;
>>>    return s;
>>> }
>>>
>>> -NOMIPS16 int
>>> -g (int n, int i)
>>> +NOMIPS16 long int
>>> +g (long int n, long int i)
>>> {
>>> -  int s = 0;
>>> +  long int s = 0;
>>>    for (i = 0; i < n; i++)
>>>      s += i;
>>>    return s;
>>> Index: gcc/testsuite/gcc.dg/extend-4.c
>>> ===================================================================
>>> --- /dev/null (new file)
>>> +++ gcc/testsuite/gcc.dg/extend-4.c (revision 0)
>>> @@ -0,0 +1,16 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>> +
>>> +unsigned char f(unsigned int a, int c)
>>> +{
>>> +  unsigned int b = a;
>>> +  if (c)
>>> +    b = a & 0x10ff;
>>> +  return b;
>>> +}
>>> +
>>> +/* { dg-final { scan-rtl-dump-times "_extend:" 1 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { scan-rtl-dump-times "and:" 0 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ removed" "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>> +
>>> Index: gcc/testsuite/gcc.dg/extend-1.c
>>> ===================================================================
>>> --- /dev/null (new file)
>>> +++ gcc/testsuite/gcc.dg/extend-1.c (revision 0)
>>> @@ -0,0 +1,13 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>> +
>>> +void f(unsigned char * p, short s, int c, int *z)
>>> +{
>>> +  if (c)
>>> +    *z = 0;
>>> +  *p ^= (unsigned char)s;
>>> +}
>>> +
>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 0 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 1 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>> Index: gcc/testsuite/gcc.dg/extend-5.c
>>> ===================================================================
>>> --- /dev/null (new file)
>>> +++ gcc/testsuite/gcc.dg/extend-5.c (revision 0)
>>> @@ -0,0 +1,13 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>> +
>>> +void f (short d[2][2])
>>> +{
>>> +  int d0 = d[0][0] + d[0][1];
>>> +  int d1 = d[1][0] + d[1][1];
>>> +  d[0][0] = d0 + d1;
>>> +      d[0][1] = d0 - d1;
>>> +}
>>> +
>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>> Index: gcc/testsuite/gcc.dg/extend-2.c
>>> ===================================================================
>>> --- /dev/null (new file)
>>> +++ gcc/testsuite/gcc.dg/extend-2.c (revision 0)
>>> @@ -0,0 +1,20 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>> +/* { dg-require-effective-target ilp32 } */
>>> +
>>> +void f(unsigned char * p, short *s, int c)
>>> +{
>>> +  short or = 0;
>>> +  while (c)
>>> +    {
>>> +      or = or | s[c];
>>> +      c --;
>>> +    }
>>> +  *p = (unsigned char)or;
>>> +}
>>> +
>>> +/* { dg-final { scan-rtl-dump-times "zero_extend" 0 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { scan-rtl-dump-times "sign_extend" 0 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>> +
>>> Index: gcc/testsuite/gcc.dg/extend-2-64.c
>>> ===================================================================
>>> --- /dev/null (new file)
>>> +++ gcc/testsuite/gcc.dg/extend-2-64.c (revision 0)
>>> @@ -0,0 +1,20 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>> +/* { dg-require-effective-target mips64 } */
>>> +
>>> +void f(unsigned char * p, short *s, int c)
>>> +{
>>> +  short or = 0;
>>> +  while (c)
>>> +    {
>>> +      or = or | s[c];
>>> +      c --;
>>> +    }
>>> +  *p = (unsigned char)or;
>>> +}
>>> +
>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 1 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>> +
>>> Index: gcc/testsuite/gcc.dg/extend-3.c
>>> ===================================================================
>>> --- /dev/null (new file)
>>> +++ gcc/testsuite/gcc.dg/extend-3.c (revision 0)
>>> @@ -0,0 +1,13 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>> +/* { dg-require-effective-target mips64 } */
>>> +
>>> +unsigned int f(unsigned char byte)
>>> +{
>>> +  return byte << 25;
>>> +}
>>> +
>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ replaced" "ee" } } */
>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>> +
>>> Index: gcc/opts.c
>>> ===================================================================
>>> --- gcc/opts.c (revision 189409)
>>> +++ gcc/opts.c (working copy)
>>> @@ -490,6 +490,7 @@ static const struct default_options defa
>>>      { OPT_LEVELS_2_PLUS, OPT_ftree_tail_merge, NULL, 1 },
>>>      { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_foptimize_strlen, NULL, 1 },
>>>      { OPT_LEVELS_2_PLUS, OPT_fhoist_adjacent_loads, NULL, 1 },
>>> +    { OPT_LEVELS_2_PLUS, OPT_fextension_elimination, NULL, 1 },
>>>
>>>      /* -O3 optimizations.  */
>>>      { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
>>> Index: gcc/timevar.def
>>> ===================================================================
>>> --- gcc/timevar.def (revision 189409)
>>> +++ gcc/timevar.def (working copy)
>>> @@ -201,6 +201,7 @@ DEFTIMEVAR (TV_POST_EXPAND	     , "post
>>> DEFTIMEVAR (TV_VARCONST              , "varconst")
>>> DEFTIMEVAR (TV_LOWER_SUBREG	     , "lower subreg")
>>> DEFTIMEVAR (TV_JUMP                  , "jump")
>>> +DEFTIMEVAR (TV_EE                    , "extension elimination")
>>> DEFTIMEVAR (TV_FWPROP                , "forward prop")
>>> DEFTIMEVAR (TV_CSE                   , "CSE")
>>> DEFTIMEVAR (TV_DCE                   , "dead code elimination")
>>> Index: gcc/ee.c
>>> ===================================================================
>>> --- /dev/null (new file)
>>> +++ gcc/ee.c (revision 0)
>>> @@ -0,0 +1,1190 @@
>>> +/* Redundant extension elimination.
>>> +   Copyright (C) 2010, 2011, 2012 Free Software Foundation, Inc.
>>> +   Contributed by Tom de Vries (tom@codesourcery.com)
>>> +
>>> +This file is part of GCC.
>>> +
>>> +GCC is free software; you can redistribute it and/or modify it under
>>> +the terms of the GNU General Public License as published by the Free
>>> +Software Foundation; either version 3, or (at your option) any later
>>> +version.
>>> +
>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>> +for more details.
>>> +
>>> +You should have received a copy of the GNU General Public License
>>> +along with GCC; see the file COPYING3.  If not see
>>> +<http://www.gnu.org/licenses/>.  */
>>> +
>>> +/*
>>> +
>>> +  MOTIVATING EXAMPLE
>>> +
>>> +  The motivating example for this pass is the example from PR 40893:
>>> +
>>> +    void f (short d[2][2])
>>> +    {
>>> +      int d0 = d[0][0] + d[0][1];
>>> +      int d1 = d[1][0] + d[1][1];
>>> +      d[0][0] = d0 + d1;
>>> +      d[0][1] = d0 - d1;
>>> +    }
>>> +
>>> +  For MIPS, compilation results in the following insns.
>>> +
>>> +    (set (reg:SI 204)
>>> +         (zero_extend:SI (subreg:HI (reg:SI 213) 2)))
>>> +
>>> +    (set (reg:SI 205)
>>> +         (zero_extend:SI (subreg:HI (reg:SI 216 [ d1 ]) 2)))
>>> +
>>> +    (set (reg:SI 217)
>>> +         (plus:SI (reg:SI 205)
>>> +                  (reg:SI 204)))
>>> +
>>> +    (set (reg:SI 218)
>>> +         (minus:SI (reg:SI 204)
>>> +                   (reg:SI 205)))
>>> +
>>> +    (set (mem:HI (reg/v/f:SI 210))
>>> +         (subreg:HI (reg:SI 217) 2))
>>> +
>>> +    (set (mem:HI (plus:SI (reg/v/f:SI 210)
>>> +                 (const_int 2 [0x2])))
>>> +         (subreg:HI (reg:SI 218) 2))
>>> +
>>> +
>>> +  The pseudos 217 and 218 only use the lower half of pseudos 217 and 218, and
>>> +  are the only uses.  And the plus and minus operators belong to the class of
>>> +  operators where a bit in the result is only influenced by same-or-less
>>> +  significant bitss in the operands, so the plus and minus insns only use the
>>> +  lower halves of pseudos 204 and 205.  Those are also the only uses of pseudos
>>> +  204 and 205, so the zero_extends are redundant.
>>> +
>>> +
>>> +  INTENDED EFFECT
>>> +
>>> +  This pass works by removing sign/zero-extensions, or replacing them with
>>> +  regcopies.  The idea there is that the regcopy might be eliminated by a later
>>> +  pass.  In case the regcopy cannot be eliminated, it might at least be cheaper
>>> +  than the extension.
>>> +
>>> +
>>> +  IMPLEMENTATION
>>> +
>>> +  The pass scans at most two times over all instructions.
>>> +
>>> +  The first scan collects all extensions.  If there are no extensions, we're
>>> +  done.
>>> +
>>> +  The second scan registers all uses of a reg in the biggest_use array.
>>> +  Additionally, it registers how the use size of a pseudo is propagated to the
>>> +  operands of the insns defining the pseudo.
>>> +
>>> +  The biggest_use array now contains the size in bits of the biggest use
>>> +  of each reg, which allows us to find redundant extensions.
>>> +
>>> +  If there are still non-redundant extensions left, we use the propagation
>>> +  information in an iterative fashion to improve the biggest_use array, after
>>> +  which we may find more redundant extensions.
>>> +
>>> +  Finally, redundant extensions are deleted or replaced.
>>> +
>>> +  In case that the src and dest reg of the replacement are not of the same size,
>>> +  we do not replace with a normal regcopy, but with a truncate or with the copy
>>> +  of a paradoxical subreg instead.
>>> +
>>> +
>>> +  ILLUSTRATION OF PASS
>>> +
>>> +  The dump of the pass shows us how the pass works on the motivating example.
>>> +
>>> +  We find the 2 extensions:
>>> +    found extension with preserved size 16 defining reg 204
>>> +    found extension with preserved size 16 defining reg 205
>>> +
>>> +  We calculate the biggests uses of a register:
>>> +    biggest_use
>>> +    reg 204: size 32
>>> +    reg 205: size 32
>>> +    reg 217: size 16
>>> +    reg 218: size 16
>>> +
>>> +  We propagate the biggest uses where possible:
>>> +    propagations
>>> +    205: 32 -> 16
>>> +    204: 32 -> 16
>>> +    214: 32 -> 16
>>> +    215: 32 -> 16
>>> +
>>> +  We conclude that the extensions are redundant:
>>> +    found redundant extension with preserved size 16 defining reg 205
>>> +    found redundant extension with preserved size 16 defining reg 204
>>> +
>>> +  And we replace them with regcopies:
>>> +    (set (reg:SI 204)
>>> +        (reg:SI 213))
>>> +
>>> +    (set (reg:SI 205)
>>> +        (reg:SI 216))
>>> +
>>> +
>>> +  LIMITATIONS
>>> +
>>> +  The scope of the analysis is limited to an extension and its uses.  The other
>>> +  type of analysis (related to the defs of the operand of an extension) is not
>>> +  done.
>>> +
>>> +  Furthermore, we do the analysis of biggest use per reg.  So when determining
>>> +  whether an extension is redundant, we take all uses of a dest reg into
>>> +  account, also the ones that are not uses of the extension.
>>> +  The consideration is that using use-def chains will give a more precise
>>> +  analysis, but is much more expensive in terms of runtime.  */
>>> +
>>> +#include "config.h"
>>> +#include "system.h"
>>> +#include "coretypes.h"
>>> +#include "tm.h"
>>> +#include "rtl.h"
>>> +#include "tree.h"
>>> +#include "tm_p.h"
>>> +#include "flags.h"
>>> +#include "regs.h"
>>> +#include "hard-reg-set.h"
>>> +#include "basic-block.h"
>>> +#include "insn-config.h"
>>> +#include "function.h"
>>> +#include "expr.h"
>>> +#include "insn-attr.h"
>>> +#include "recog.h"
>>> +#include "toplev.h"
>>> +#include "target.h"
>>> +#include "timevar.h"
>>> +#include "optabs.h"
>>> +#include "insn-codes.h"
>>> +#include "rtlhooks-def.h"
>>> +#include "output.h"
>>> +#include "params.h"
>>> +#include "timevar.h"
>>> +#include "tree-pass.h"
>>> +#include "cgraph.h"
>>> +#include "vec.h"
>>> +
>>> +#define SKIP_REG (-1)
>>> +#define NONE (-1)
>>> +
>>> +/* Number of registers at start of pass.  */
>>> +
>>> +static int n_regs;
>>> +
>>> +/* Array to register the biggest use of a reg, in bits.  */
>>> +
>>> +static int *biggest_use;
>>> +
>>> +/* Array to register the promoted subregs.  */
>>> +
>>> +static VEC (rtx,heap) **promoted_subreg;
>>> +
>>> +/* Array to register for a reg what the last propagated size is.  */
>>> +
>>> +static int *propagated_size;
>>> +
>>> +typedef struct use
>>> +{
>>> +  int regno;
>>> +  int size;
>>> +  int offset;
>>> +  rtx *use;
>>> +} use_type;
>>> +
>>> +DEF_VEC_O(use_type);
>>> +DEF_VEC_ALLOC_O(use_type,heap);
>>> +
>>> +/* Vector to register the uses.  */
>>> +
>>> +static VEC (use_type,heap) **uses;
>>> +
>>> +typedef struct prop
>>> +{
>>> +  rtx set;
>>> +  int uses_regno;
>>> +  int uses_index;
>>> +} prop_type;
>>> +
>>> +DEF_VEC_O(prop_type);
>>> +DEF_VEC_ALLOC_O(prop_type,heap);
>>> +
>>> +/* Vector to register the propagations.  */
>>> +
>>> +static VEC (prop_type,heap) **props;
>>> +
>>> +/* Work list for propragation.  */
>>> +
>>> +static VEC (int,heap) *wl;
>>> +
>>> +/* Array to register what regs are in the work list.  */
>>> +
>>> +static bool *in_wl;
>>> +
>>> +/* Vector that contains the extensions in the function.  */
>>> +
>>> +static VEC (rtx,heap) *extensions;
>>> +
>>> +/* Vector that contains the extensions in the function that are going to be
>>> +   removed or replaced.  */
>>> +
>>> +static VEC (rtx,heap) *redundant_extensions;
>>> +
>>> +/* Forward declaration.  */
>>> +
>>> +static void note_use (rtx *x, void *data);
>>> +static bool skip_reg_p (int regno);
>>> +static void register_prop (rtx set, use_type *use);
>>> +
>>> +/* Check whether SUBREG is a promoted subreg.  */
>>> +
>>> +static bool
>>> +promoted_subreg_p (rtx subreg)
>>> +{
>>> +  return (GET_CODE (subreg) == SUBREG
>>> +	  && SUBREG_PROMOTED_VAR_P (subreg));
>>> +}
>>> +
>>> +/* Check whether SUBREG is a promoted subreg for which we cannot reset the
>>> +   promotion.  */
>>> +
>>> +static bool
>>> +fixed_promoted_subreg_p (rtx subreg)
>>> +{
>>> +  int mre;
>>> +
>>> +  if (!promoted_subreg_p (subreg))
>>> +    return false;
>>> +
>>> +  mre = targetm.mode_rep_extended (GET_MODE (subreg),
>>> +				   GET_MODE (SUBREG_REG (subreg)));
>>> +  return mre != UNKNOWN;
>>> +}
>>> +
>>> +/* Attempt to return the size, reg number and offset of USE in SIZE, REGNO and
>>> +   OFFSET.  Return true if successful.  */
>>> +
>>> +static bool
>>> +reg_use_p (rtx use, int *size, unsigned int *regno, int *offset)
>>> +{
>>> +  rtx reg;
>>> +
>>> +  if (REG_P (use))
>>> +    {
>>> +      *regno = REGNO (use);
>>> +      *offset = 0;
>>> +      *size = GET_MODE_BITSIZE (GET_MODE (use));
>>> +      return true;
>>> +    }
>>> +  else if (GET_CODE (use) == SUBREG)
>>> +    {
>>> +      reg = SUBREG_REG (use);
>>> +
>>> +      if (!REG_P (reg))
>>> +	return false;
>>> +
>>> +      *regno = REGNO (reg);
>>> +
>>> +      if (paradoxical_subreg_p (use) || fixed_promoted_subreg_p (use))
>>> +	{
>>> +	  *offset = 0;
>>> +	  *size = GET_MODE_BITSIZE (GET_MODE (reg));
>>> +	}
>>> +      else
>>> +	{
>>> +	  *offset = subreg_lsb (use);
>>> +	  *size = *offset + GET_MODE_BITSIZE (GET_MODE (use));
>>> +	}
>>> +
>>> +      return true;
>>> +    }
>>> +
>>> +  return false;
>>> +}
>>> +
>>> +/* Create a new empty entry in the uses[REGNO] vector.  */
>>> +
>>> +static use_type *
>>> +new_use (unsigned int regno)
>>> +{
>>> +  if (uses[regno] == NULL)
>>> +    uses[regno] = VEC_alloc (use_type, heap, 4);
>>> +
>>> +  VEC_safe_push (use_type, heap, uses[regno], NULL);
>>> +
>>> +  return VEC_last (use_type, uses[regno]);
>>> +}
>>> +
>>> +/* Register a USE of reg REGNO with SIZE and OFFSET.  */
>>> +
>>> +static use_type *
>>> +register_use (int size, unsigned int regno, int offset, rtx *use)
>>> +{
>>> +  int *current;
>>> +  use_type *p;
>>> +
>>> +  gcc_assert (size >= 0);
>>> +  gcc_assert (regno < (unsigned int)n_regs);
>>> +
>>> +  if (skip_reg_p (regno))
>>> +    return NULL;
>>> +
>>> +  p = new_use (regno);
>>> +  p->regno = regno;
>>> +  p->size = size;
>>> +  p->offset = offset;
>>> +  p->use = use;
>>> +
>>> +  /* Update the bigest use.  */
>>> +  current = &biggest_use[regno];
>>> +  *current = MAX (*current, size);
>>> +
>>> +  return p;
>>> +}
>>> +
>>> +/* Handle embedded uses in USE, which is a part of PATTERN.  */
>>> +
>>> +static void
>>> +note_embedded_uses (rtx use, rtx pattern)
>>> +{
>>> +  const char *format_ptr;
>>> +  int i, j;
>>> +
>>> +  format_ptr = GET_RTX_FORMAT (GET_CODE (use));
>>> +  for (i = 0; i < GET_RTX_LENGTH (GET_CODE (use)); i++)
>>> +    if (format_ptr[i] == 'e')
>>> +      note_use (&XEXP (use, i), pattern);
>>> +    else if (format_ptr[i] == 'E')
>>> +      for (j = 0; j < XVECLEN (use, i); j++)
>>> +	note_use (&XVECEXP (use, i, j), pattern);
>>> +}
>>> +
>>> +/* Get the set in PATTERN that has USE as its src operand.  */
>>> +
>>> +static rtx
>>> +get_set (rtx use, rtx pattern)
>>> +{
>>> +  rtx sub;
>>> +  int i;
>>> +
>>> +  if (GET_CODE (pattern) == SET && SET_SRC (pattern) == use)
>>> +    return pattern;
>>> +
>>> +  if (GET_CODE (pattern) == PARALLEL)
>>> +    for (i = 0; i < XVECLEN (pattern, 0); ++i)
>>> +      {
>>> +	sub = XVECEXP (pattern, 0, i);
>>> +	if (GET_CODE (sub) == SET && SET_SRC (sub) == use)
>>> +	  return sub;
>>> +      }
>>> +
>>> +  return NULL_RTX;
>>> +}
>>> +
>>> +/* Handle a restricted op USE with NR_OPERANDS.  USE is a part of SET, which is
>>> +   a part of PATTERN.  In this context restricted means that a bit in
>>> +   an operand influences only the same bit or more significant bits in the
>>> +   result.  The bitwise ops are a subclass, but PLUS is one as well.  */
>>> +
>>> +static void
>>> +note_restricted_op_use (rtx set, rtx use, unsigned int nr_operands, rtx pattern)
>>> +{
>>> +  unsigned int i, smallest;
>>> +  int operand_size[2];
>>> +  int operand_offset[2];
>>> +  int used_size;
>>> +  unsigned int operand_regno[2];
>>> +  bool operand_reg[2];
>>> +  bool operand_ignore[2];
>>> +  use_type *p;
>>> +
>>> +  /* Init operand_reg, operand_size, operand_regno and operand_ignore.  */
>>> +  for (i = 0; i < nr_operands; ++i)
>>> +    {
>>> +      operand_reg[i] = reg_use_p (XEXP (use, i), &operand_size[i],
>>> +				  &operand_regno[i], &operand_offset[i]);
>>> +      operand_ignore[i] = false;
>>> +    }
>>> +
>>> +  /* Handle case of reg and-masked with const.  */
>>> +  if (GET_CODE (use) == AND && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>> +    {
>>> +      used_size =
>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (UINTVAL (XEXP (use, 1)));
>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>> +    }
>>> +
>>> +  /* Handle case of reg or-masked with const.  */
>>> +  if (GET_CODE (use) == IOR && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>> +    {
>>> +      used_size =
>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (~UINTVAL (XEXP (use, 1)));
>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>> +    }
>>> +
>>> +  /* Ignore the use of a in 'a = a + b'.  */
>>> +  /* TODO: handle GET_CODE ((SET_DEST (set))) == SUBREG.  */
>>> +  if (set != NULL_RTX && REG_P (SET_DEST (set)))
>>> +    for (i = 0; i < nr_operands; ++i)
>>> +      operand_ignore[i] = (operand_reg[i]
>>> +			   && (REGNO (SET_DEST (set)) == operand_regno[i]));
>>> +
>>> +  /* Handle the case a reg is combined with don't care bits.  */
>>> +  if (nr_operands == 2 && operand_reg[0] && operand_reg[1]
>>> +      && operand_size[0] != operand_size[1])
>>> +    {
>>> +      smallest = operand_size[0] > operand_size[1];
>>> +
>>> +      if (paradoxical_subreg_p (XEXP (use, smallest)))
>>> +	operand_size[1 - smallest] = operand_size[smallest];
>>> +    }
>>> +
>>> +  /* Register the operand use, if necessary.  */
>>> +  for (i = 0; i < nr_operands; ++i)
>>> +    if (!operand_reg[i])
>>> +      note_use (&XEXP (use, i), pattern);
>>> +    else if (!operand_ignore[i])
>>> +      {
>>> +	p = register_use (operand_size[i], operand_regno[i], operand_offset[i],
>>> +			  &XEXP (use, i));
>>> +	register_prop (set, p);
>>> +      }
>>> +}
>>> +
>>> +/* Register promoted SUBREG in promoted_subreg.  */
>>> +
>>> +static void
>>> +register_promoted_subreg (rtx subreg)
>>> +{
>>> +  int index = REGNO (SUBREG_REG (subreg));
>>> +
>>> +  if (promoted_subreg[index] == NULL)
>>> +    promoted_subreg[index] = VEC_alloc (rtx, heap, 10);
>>> +
>>> +  VEC_safe_push (rtx, heap, promoted_subreg[index], subreg);
>>> +}
>>> +
>>> +/* Note promoted subregs in X.  */
>>> +
>>> +static int
>>> +note_promoted_subreg (rtx *x, void *y ATTRIBUTE_UNUSED)
>>> +{
>>> +  rtx subreg = *x;
>>> +
>>> +  if (promoted_subreg_p (subreg) && !fixed_promoted_subreg_p (subreg)
>>> +      && REG_P (SUBREG_REG (subreg)))
>>> +    register_promoted_subreg (subreg);
>>> +
>>> +  return 0;
>>> +}
>>> +
>>> +/* Handle use X in pattern DATA noted by note_uses.  */
>>> +
>>> +static void
>>> +note_use (rtx *x, void *data)
>>> +{
>>> +  rtx use = *x;
>>> +  rtx pattern = (rtx)data;
>>> +  int use_size, use_offset;
>>> +  unsigned int use_regno;
>>> +  rtx set;
>>> +  use_type *p;
>>> +
>>> +  for_each_rtx (x, note_promoted_subreg, NULL);
>>> +
>>> +  set = get_set (use, pattern);
>>> +
>>> +  switch (GET_CODE (use))
>>> +    {
>>> +    case REG:
>>> +    case SUBREG:
>>> +      if (!reg_use_p (use, &use_size, &use_regno, &use_offset))
>>> +	{
>>> +	  note_embedded_uses (use, pattern);
>>> +	  return;
>>> +	}
>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>> +      register_prop (set, p);
>>> +      return;
>>> +    case SIGN_EXTEND:
>>> +    case ZERO_EXTEND:
>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset))
>>> +	{
>>> +	  note_embedded_uses (use, pattern);
>>> +	  return;
>>> +	}
>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>> +      register_prop (set, p);
>>> +      return;
>>> +    case IOR:
>>> +    case AND:
>>> +    case XOR:
>>> +    case PLUS:
>>> +    case MINUS:
>>> +      note_restricted_op_use (set, use, 2, pattern);
>>> +      return;
>>> +    case NOT:
>>> +    case NEG:
>>> +      note_restricted_op_use (set, use, 1, pattern);
>>> +      return;
>>> +    case ASHIFT:
>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset)
>>> +	  || !CONST_INT_P (XEXP (use, 1))
>>> +	  || INTVAL (XEXP (use, 1)) <= 0
>>> +	  || paradoxical_subreg_p (XEXP (use, 0)))
>>> +	{
>>> +	  note_embedded_uses (use, pattern);
>>> +	  return;
>>> +	}
>>> +      (void)register_use (use_size - INTVAL (XEXP (use, 1)), use_regno,
>>> +			  use_offset, x);
>>> +      return;
>>> +    default:
>>> +      note_embedded_uses (use, pattern);
>>> +      return;
>>> +    }
>>> +}
>>> +
>>> +/* Check whether reg REGNO is implicitly used.  */
>>> +
>>> +static bool
>>> +implicit_use_p (int regno ATTRIBUTE_UNUSED)
>>> +{
>>> +#ifdef EPILOGUE_USES
>>> +  if (EPILOGUE_USES (regno))
>>> +    return true;
>>> +#endif
>>> +
>>> +#ifdef EH_USES
>>> +  if (EH_USES (regno))
>>> +    return true;
>>> +#endif
>>> +
>>> +  return false;
>>> +}
>>> +
>>> +/* Check whether reg REGNO should be skipped in analysis.  */
>>> +
>>> +static bool
>>> +skip_reg_p (int regno)
>>> +{
>>> +  /* TODO: handle hard registers.  The problem with hard registers is that
>>> +     a DI use of r0 can mean a 64bit use of r0 and a 32 bit use of r1.
>>> +     We don't handle that properly.  */
>>> +  return implicit_use_p (regno) || HARD_REGISTER_NUM_P (regno);
>>> +}
>>> +
>>> +/* Note the uses of argument registers in call INSN.  */
>>> +
>>> +static void
>>> +note_call_uses (rtx insn)
>>> +{
>>> +  rtx link, link_expr;
>>> +
>>> +  if (!CALL_P (insn))
>>> +    return;
>>> +
>>> +  for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1))
>>> +    {
>>> +      link_expr = XEXP (link, 0);
>>> +
>>> +      if (GET_CODE (link_expr) == USE)
>>> +	note_use (&XEXP (link_expr, 0), link);
>>> +    }
>>> +}
>>> +
>>> +/* Dump the biggest uses found.  */
>>> +
>>> +static void
>>> +dump_biggest_use (void)
>>> +{
>>> +  int i;
>>> +
>>> +  if (!dump_file)
>>> +    return;
>>> +
>>> +  fprintf (dump_file, "biggest_use:\n");
>>> +
>>> +  for (i = 0; i < n_regs; i++)
>>> +    if (biggest_use[i] > 0)
>>> +      fprintf (dump_file, "reg %d: size %d\n", i, biggest_use[i]);
>>> +
>>> +  fprintf (dump_file, "\n");
>>> +}
>>> +
>>> +/* Calculate the biggest use mode for all regs.  */
>>> +
>>> +static void
>>> +calculate_biggest_use (void)
>>> +{
>>> +  basic_block bb;
>>> +  rtx insn;
>>> +
>>> +  /* For all insns, call note_use for each use in insn.  */
>>> +  FOR_EACH_BB (bb)
>>> +    FOR_BB_INSNS (bb, insn)
>>> +      {
>>> +	if (!NONDEBUG_INSN_P (insn))
>>> +	  continue;
>>> +
>>> +	note_uses (&PATTERN (insn), note_use, PATTERN (insn));
>>> +
>>> +	if (CALL_P (insn))
>>> +	  note_call_uses (insn);
>>> +      }
>>> +
>>> +  dump_biggest_use ();
>>> +}
>>> +
>>> +/* Register a propagation USE in SET in the props vector.  */
>>> +
>>> +static void
>>> +register_prop (rtx set, use_type *use)
>>> +{
>>> +  prop_type *p;
>>> +  int regno;
>>> +
>>> +  if (set == NULL_RTX || use == NULL)
>>> +    return;
>>> +
>>> +  if (!REG_P (SET_DEST (set)))
>>> +    return;
>>> +
>>> +  regno = REGNO (SET_DEST (set));
>>> +
>>> +  if (skip_reg_p (regno))
>>> +    return;
>>> +
>>> +  if (props[regno] == NULL)
>>> +    props[regno] = VEC_alloc (prop_type, heap, 4);
>>> +
>>> +  VEC_safe_push (prop_type, heap, props[regno], NULL);
>>> +  p = VEC_last (prop_type, props[regno]);
>>> +  p->set = set;
>>> +  p->uses_regno = use->regno;
>>> +  p->uses_index = VEC_length (use_type, uses[use->regno]) - 1;
>>> +}
>>> +
>>> +/* Add REGNO to the worklist.  */
>>> +
>>> +static void
>>> +add_to_wl (int regno)
>>> +{
>>> +  if (in_wl[regno])
>>> +    return;
>>> +
>>> +  if (biggest_use[regno] > 0
>>> +      && biggest_use[regno] == GET_MODE_BITSIZE (PSEUDO_REGNO_MODE (regno)))
>>> +    return;
>>> +
>>> +  if (VEC_empty (prop_type, props[regno]))
>>> +    return;
>>> +
>>> +  if (propagated_size[regno] != NONE
>>> +      && propagated_size[regno] == biggest_use[regno])
>>> +    return;
>>> +
>>> +  VEC_safe_push (int, heap, wl, regno);
>>> +  in_wl[regno] = true;
>>> +}
>>> +
>>> +/* Pop a reg from the worklist and return it.  */
>>> +
>>> +static int
>>> +pop_wl (void)
>>> +{
>>> +  int regno = VEC_pop (int, wl);
>>> +  in_wl[regno] = false;
>>> +  return regno;
>>> +}
>>> +
>>> +/* Propagate the use size DEST_SIZE of a reg to use P.  */
>>> +
>>> +static int
>>> +propagate_size (int dest_size, use_type *p)
>>> +{
>>> +  if (dest_size == 0)
>>> +    return 0;
>>> +
>>> +  return p->offset + MIN (p->size - p->offset, dest_size);
>>> +}
>>> +
>>> +/* Get the biggest use of REGNO from the uses vector.  */
>>> +
>>> +static int
>>> +get_biggest_use (unsigned int regno)
>>> +{
>>> +  int ix;
>>> +  use_type *p;
>>> +  int max = 0;
>>> +
>>> +  gcc_assert (uses[regno] != NULL);
>>> +
>>> +  FOR_EACH_VEC_ELT (use_type, uses[regno], ix, p)
>>> +    max = MAX (max, p->size);
>>> +
>>> +  return max;
>>> +}
>>> +
>>> +/* Propagate the use size DEST_SIZE of a reg to the uses in USE.  */
>>> +
>>> +static void
>>> +propagate_to_use (int dest_size, use_type *use)
>>> +{
>>> +  int new_use_size;
>>> +  int prev_biggest_use;
>>> +  int *current;
>>> +
>>> +  new_use_size = propagate_size (dest_size, use);
>>> +
>>> +  if (new_use_size >= use->size)
>>> +    return;
>>> +
>>> +  use->size = new_use_size;
>>> +
>>> +  current = &biggest_use[use->regno];
>>> +
>>> +  prev_biggest_use = *current;
>>> +  *current = get_biggest_use (use->regno);
>>> +
>>> +  if (*current >= prev_biggest_use)
>>> +    return;
>>> +
>>> +  add_to_wl (use->regno);
>>> +
>>> +  if (dump_file)
>>> +    fprintf (dump_file, "%d: %d -> %d\n", use->regno, prev_biggest_use,
>>> +	     *current);
>>> +
>>> +}
>>> +
>>> +/* Propagate the biggest use of a reg REGNO to all its uses, and note
>>> +   propagations in NR_PROPAGATIONS.  */
>>> +
>>> +static void
>>> +propagate_to_uses (int regno, int *nr_propagations)
>>> +{
>>> +  int ix;
>>> +  prop_type *p;
>>> +
>>> +  gcc_assert (!(propagated_size[regno] == NONE
>>> +		&& propagated_size[regno] == biggest_use[regno]));
>>> +
>>> +  FOR_EACH_VEC_ELT (prop_type, props[regno], ix, p)
>>> +    {
>>> +      use_type *use = VEC_index (use_type, uses[p->uses_regno], p->uses_index);
>>> +      propagate_to_use (biggest_use[regno], use);
>>> +      ++(*nr_propagations);
>>> +    }
>>> +
>>> +  propagated_size[regno] = biggest_use[regno];
>>> +}
>>> +
>>> +/* Improve biggest_use array iteratively.  */
>>> +
>>> +static void
>>> +propagate (void)
>>> +{
>>> +  int i;
>>> +  int nr_propagations = 0;
>>> +
>>> +  /* Initialize work list.  */
>>> +
>>> +  for (i = 0; i < n_regs; ++i)
>>> +    add_to_wl (i);
>>> +
>>> +  /* Work the work list.  */
>>> +
>>> +  if (dump_file)
>>> +    fprintf (dump_file, "propagations: \n");
>>> +  while (!VEC_empty (int, wl))
>>> +    propagate_to_uses (pop_wl (), &nr_propagations);
>>> +
>>> +  if (dump_file)
>>> +    fprintf (dump_file, "\nnr_propagations: %d\n\n", nr_propagations);
>>> +}
>>> +
>>> +/* Check whether this is a sign/zero extension.  */
>>> +
>>> +static bool
>>> +extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>> +{
>>> +  rtx src, op0;
>>> +
>>> +  /* Detect set of reg.  */
>>> +  if (GET_CODE (PATTERN (insn)) != SET)
>>> +    return false;
>>> +
>>> +  src = SET_SRC (PATTERN (insn));
>>> +  *dest = SET_DEST (PATTERN (insn));
>>> +
>>> +  if (!REG_P (*dest))
>>> +    return false;
>>> +
>>> +  /* Detect sign or zero extension.  */
>>> +  if (GET_CODE (src) == ZERO_EXTEND || GET_CODE (src) == SIGN_EXTEND
>>> +      || (GET_CODE (src) == AND && CONST_INT_P (XEXP (src, 1))))
>>> +    {
>>> +      op0 = XEXP (src, 0);
>>> +
>>> +      /* Determine amount of least significant bits preserved by operation.  */
>>> +      if (GET_CODE (src) == AND)
>>> +	*preserved_size = ctz_hwi (~UINTVAL (XEXP (src, 1)));
>>> +      else
>>> +	*preserved_size = GET_MODE_BITSIZE (GET_MODE (op0));
>>> +
>>> +      if (GET_CODE (op0) == SUBREG)
>>> +	{
>>> +	  if (subreg_lsb (op0) != 0)
>>> +	    return false;
>>> +
>>> +	  *inner = SUBREG_REG (op0);
>>> +
>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>> +	    return false;
>>> +
>>> +	  return true;
>>> +	}
>>> +      else if (REG_P (op0))
>>> +	{
>>> +	  *inner = op0;
>>> +
>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>> +	    return false;
>>> +
>>> +	  return true;
>>> +	}
>>> +      else if (GET_CODE (op0) == TRUNCATE)
>>> +	{
>>> +	  *inner = XEXP (op0, 0);
>>> +	  return true;
>>> +	}
>>> +    }
>>> +
>>> +  return false;
>>> +}
>>> +
>>> +/* Find extensions and store them in the extensions vector.  */
>>> +
>>> +static bool
>>> +find_extensions (void)
>>> +{
>>> +  basic_block bb;
>>> +  rtx insn, dest, inner;
>>> +  int preserved_size;
>>> +
>>> +  /* For all insns, call note_use for each use in insn.  */
>>> +  FOR_EACH_BB (bb)
>>> +    FOR_BB_INSNS (bb, insn)
>>> +      {
>>> +	if (!NONDEBUG_INSN_P (insn))
>>> +	  continue;
>>> +
>>> +	if (!extension_p (insn, &dest, &inner, &preserved_size))
>>> +	  continue;
>>> +
>>> +	VEC_safe_push (rtx, heap, extensions, insn);
>>> +
>>> +	if (dump_file)
>>> +	  fprintf (dump_file,
>>> +		   "found extension %u with preserved size %d defining"
>>> +		   " reg %d\n",
>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>> +      }
>>> +
>>> +  if (dump_file)
>>> +    {
>>> +      if (!VEC_empty (rtx, extensions))
>>> +	fprintf (dump_file, "\n");
>>> +      else
>>> +	fprintf (dump_file, "no extensions found.\n");
>>> +    }
>>> +
>>> +  return !VEC_empty (rtx, extensions);
>>> +}
>>> +
>>> +/* Check whether this is a redundant sign/zero extension.  */
>>> +
>>> +static bool
>>> +redundant_extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>> +{
>>> +  int biggest_dest_use;
>>> +
>>> +  if (!extension_p (insn, dest, inner, preserved_size))
>>> +    gcc_unreachable ();
>>> +
>>> +  biggest_dest_use = biggest_use[REGNO (*dest)];
>>> +
>>> +  if (biggest_dest_use == SKIP_REG)
>>> +    return false;
>>> +
>>> +  if (*preserved_size < biggest_dest_use)
>>> +    return false;
>>> +
>>> +  return true;
>>> +}
>>> +
>>> +/* Find the redundant extensions in the extensions vector and move them to the
>>> +   redundant_extensions vector.  */
>>> +
>>> +static void
>>> +find_redundant_extensions (void)
>>> +{
>>> +  rtx insn, dest, inner;
>>> +  int ix;
>>> +  bool found = false;
>>> +  int preserved_size;
>>> +
>>> +  FOR_EACH_VEC_ELT_REVERSE (rtx, extensions, ix, insn)
>>> +    if (redundant_extension_p (insn, &dest, &inner, &preserved_size))
>>> +      {
>>> +	VEC_safe_push (rtx, heap, redundant_extensions, insn);
>>> +	VEC_unordered_remove (rtx, extensions, ix);
>>> +
>>> +	if (dump_file)
>>> +	  fprintf (dump_file,
>>> +		   "found redundant extension %u with preserved size %d"
>>> +		   " defining reg %d\n",
>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>> +	found = true;
>>> +      }
>>> +
>>> +  if (dump_file && found)
>>> +    fprintf (dump_file, "\n");
>>> +}
>>> +
>>> +/* Reset promotion of subregs or REG.  */
>>> +
>>> +static void
>>> +reset_promoted_subreg (rtx reg)
>>> +{
>>> +  int ix;
>>> +  rtx subreg;
>>> +
>>> +  if (promoted_subreg[REGNO (reg)] == NULL)
>>> +    return;
>>> +
>>> +  FOR_EACH_VEC_ELT (rtx, promoted_subreg[REGNO (reg)], ix, subreg)
>>> +    {
>>> +      SUBREG_PROMOTED_UNSIGNED_SET (subreg, 0);
>>> +      SUBREG_PROMOTED_VAR_P (subreg) = 0;
>>> +    }
>>> +
>>> +  VEC_free (rtx, heap, promoted_subreg[REGNO (reg)]);
>>> +}
>>> +
>>> +/* Try to remove or replace the redundant extension INSN which extends INNER and
>>> +   writes to DEST.  */
>>> +
>>> +static void
>>> +try_remove_or_replace_extension (rtx insn, rtx dest, rtx inner)
>>> +{
>>> +  rtx cp_src, cp_dest, seq = NULL_RTX, one;
>>> +
>>> +  /* Check whether replacement is needed.  */
>>> +  if (dest != inner)
>>> +    {
>>> +      start_sequence ();
>>> +
>>> +      /* Determine the proper replacement operation.  */
>>> +      if (GET_MODE (dest) == GET_MODE (inner))
>>> +	{
>>> +	  cp_src = inner;
>>> +	  cp_dest = dest;
>>> +	}
>>> +      else if (GET_MODE_SIZE (GET_MODE (dest))
>>> +	       > GET_MODE_SIZE (GET_MODE (inner)))
>>> +	{
>>> +	  emit_clobber (dest);
>>> +	  cp_src = inner;
>>> +	  cp_dest = gen_lowpart_SUBREG (GET_MODE (inner), dest);
>>> +	}
>>> +      else
>>> +	{
>>> +	  cp_src = gen_lowpart_SUBREG (GET_MODE (dest), inner);
>>> +	  cp_dest = dest;
>>> +	}
>>> +
>>> +      emit_move_insn (cp_dest, cp_src);
>>> +
>>> +      seq = get_insns ();
>>> +      end_sequence ();
>>> +
>>> +      /* If the replacement is not supported, bail out.  */
>>> +      for (one = seq; one != NULL_RTX; one = NEXT_INSN (one))
>>> +	if (recog_memoized (one) < 0 && GET_CODE (PATTERN (one)) != CLOBBER)
>>> +	  return;
>>> +
>>> +      /* Insert the replacement.  */
>>> +      emit_insn_before (seq, insn);
>>> +    }
>>> +
>>> +  /* Note replacement/removal in the dump.  */
>>> +  if (dump_file)
>>> +    {
>>> +      fprintf (dump_file, "redundant extension %u ", INSN_UID (insn));
>>> +      if (dest != inner)
>>> +	fprintf (dump_file, "replaced by %u\n", INSN_UID (seq));
>>> +      else
>>> +	fprintf (dump_file, "removed\n");
>>> +    }
>>> +
>>> +  /* Remove the extension.  */
>>> +  delete_insn (insn);
>>> +
>>> +  reset_promoted_subreg (dest);
>>> +}
>>> +
>>> +/* Setup the variables at the start of the pass.  */
>>> +
>>> +static void
>>> +init_pass (void)
>>> +{
>>> +  int i;
>>> +
>>> +  biggest_use = XNEWVEC (int, n_regs);
>>> +  promoted_subreg = XCNEWVEC (VEC (rtx,heap) *, n_regs);
>>> +  propagated_size = XNEWVEC (int, n_regs);
>>> +
>>> +  /* Initialize biggest_use for all regs to 0.  If a reg is used implicitly, we
>>> +     handle that reg conservatively and set it to SKIP_REG instead.  */
>>> +  for (i = 0; i < n_regs; i++)
>>> +    {
>>> +      biggest_use[i] = skip_reg_p (i) ? SKIP_REG : 0;
>>> +      propagated_size[i] = NONE;
>>> +    }
>>> +
>>> +  extensions = VEC_alloc (rtx, heap, 10);
>>> +  redundant_extensions = VEC_alloc (rtx, heap, 10);
>>> +
>>> +  wl = VEC_alloc (int, heap, 50);
>>> +  in_wl = XNEWVEC (bool, n_regs);
>>> +
>>> +  uses = XNEWVEC (typeof (*uses), n_regs);
>>> +  props = XNEWVEC (typeof (*props), n_regs);
>>> +
>>> +  for (i = 0; i < n_regs; ++i)
>>> +    {
>>> +      uses[i] = NULL;
>>> +      props[i] = NULL;
>>> +      in_wl[i] = false;
>>> +    }
>>> +}
>>> +
>>> +/* Find redundant extensions and remove or replace them if possible.  */
>>> +
>>> +static void
>>> +remove_redundant_extensions (void)
>>> +{
>>> +  rtx insn, dest, inner;
>>> +  int preserved_size;
>>> +  int ix;
>>> +
>>> +  if (!find_extensions ())
>>> +    return;
>>> +
>>> +  calculate_biggest_use ();
>>> +
>>> +  find_redundant_extensions ();
>>> +
>>> +  if (!VEC_empty (rtx, extensions))
>>> +    {
>>> +      propagate ();
>>> +
>>> +      find_redundant_extensions ();
>>> +    }
>>> +
>>> +  gcc_checking_assert (n_regs == max_reg_num ());
>>> +
>>> +  FOR_EACH_VEC_ELT (rtx, redundant_extensions, ix, insn)
>>> +    {
>>> +      extension_p (insn, &dest, &inner, &preserved_size);
>>> +      try_remove_or_replace_extension (insn, dest, inner);
>>> +    }
>>> +
>>> +  if (dump_file)
>>> +    fprintf (dump_file, "\n");
>>> +}
>>> +
>>> +/* Free the variables at the end of the pass.  */
>>> +
>>> +static void
>>> +finish_pass (void)
>>> +{
>>> +  int i;
>>> +
>>> +  XDELETEVEC (propagated_size);
>>> +
>>> +  VEC_free (rtx, heap, extensions);
>>> +  VEC_free (rtx, heap, redundant_extensions);
>>> +
>>> +  VEC_free (int, heap, wl);
>>> +
>>> +  for (i = 0; i < n_regs; ++i)
>>> +    {
>>> +      if (uses[i] != NULL)
>>> +	VEC_free (use_type, heap, uses[i]);
>>> +
>>> +      if (props[i] != NULL)
>>> +	VEC_free (prop_type, heap, props[i]);
>>> +    }
>>> +
>>> +  XDELETEVEC (uses);
>>> +  XDELETEVEC (props);
>>> +  XDELETEVEC (biggest_use);
>>> +
>>> +  for (i = 0; i < n_regs; ++i)
>>> +    if (promoted_subreg[i] != NULL)
>>> +      VEC_free (rtx, heap, promoted_subreg[i]);
>>> +  XDELETEVEC (promoted_subreg);
>>> +}
>>> +
>>> +/* Remove redundant extensions.  */
>>> +
>>> +static unsigned int
>>> +rest_of_handle_ee (void)
>>> +{
>>> +  n_regs = max_reg_num ();
>>> +
>>> +  init_pass ();
>>> +  remove_redundant_extensions ();
>>> +  finish_pass ();
>>> +  return 0;
>>> +}
>>> +
>>> +/* Run ee pass when flag_ee is set at optimization level > 0.  */
>>> +
>>> +static bool
>>> +gate_handle_ee (void)
>>> +{
>>> +  return (optimize > 0 && flag_ee);
>>> +}
>>> +
>>> +struct rtl_opt_pass pass_ee =
>>> +{
>>> + {
>>> +  RTL_PASS,
>>> +  "ee",                                 /* name */
>>> +  gate_handle_ee,                       /* gate */
>>> +  rest_of_handle_ee,                    /* execute */
>>> +  NULL,                                 /* sub */
>>> +  NULL,                                 /* next */
>>> +  0,                                    /* static_pass_number */
>>> +  TV_EE,                                /* tv_id */
>>> +  0,                                    /* properties_required */
>>> +  0,                                    /* properties_provided */
>>> +  0,                                    /* properties_destroyed */
>>> +  0,                                    /* todo_flags_start */
>>> +  TODO_ggc_collect |
>>> +  TODO_verify_rtl_sharing,              /* todo_flags_finish */
>>> + }
>>> +};
>>> Index: gcc/common.opt
>>> ===================================================================
>>> --- gcc/common.opt (revision 189409)
>>> +++ gcc/common.opt (working copy)
>>> @@ -1067,6 +1067,10 @@ feliminate-dwarf2-dups
>>> Common Report Var(flag_eliminate_dwarf2_dups)
>>> Perform DWARF2 duplicate elimination
>>>
>>> +fextension-elimination
>>> +Common Report Var(flag_ee) Init(0) Optimization
>>> +Perform extension elimination
>>> +
>>> fipa-sra
>>> Common Report Var(flag_ipa_sra) Init(0) Optimization
>>> Perform interprocedural reduction of aggregates
>>> Index: gcc/Makefile.in
>>> ===================================================================
>>> --- gcc/Makefile.in (revision 189409)
>>> +++ gcc/Makefile.in (working copy)
>>> @@ -1218,6 +1218,7 @@ OBJS = \
>>> 	dwarf2asm.o \
>>> 	dwarf2cfi.o \
>>> 	dwarf2out.o \
>>> +	ee.o \
>>> 	ebitmap.o \
>>> 	emit-rtl.o \
>>> 	et-forest.o \
>>> @@ -2971,6 +2972,12 @@ cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H
>>>     $(TM_P_H) $(PARAMS_H) cselib.h $(EXCEPT_H) $(TREE_H) $(TIMEVAR_H) \
>>>     intl.h $(OBSTACK_H) $(TREE_PASS_H) $(DF_H) $(DBGCNT_H) $(TARGET_H) \
>>>     $(DF_H) $(CFGLOOP_H)
>>> +ee.o : ee.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>> +   hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h \
>>> +   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
>>> +   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(TOPLEV_H) \
>>> +   $(DIAGNOSTIC_CORE_H) $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h \
>>> +   $(PARAMS_H) $(CGRAPH_H)
>>> gcse.o : gcse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>     $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>>>     $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) toplev.h $(DIAGNOSTIC_CORE_H) \
>>> Index: gcc/passes.c
>>> ===================================================================
>>> --- gcc/passes.c (revision 189409)
>>> +++ gcc/passes.c (working copy)
>>> @@ -1552,6 +1552,7 @@ init_optimization_passes (void)
>>>        NEXT_PASS (pass_initialize_regs);
>>>        NEXT_PASS (pass_ud_rtl_dce);
>>>        NEXT_PASS (pass_combine);
>>> +      NEXT_PASS (pass_ee);
>>>        NEXT_PASS (pass_if_after_combine);
>>>        NEXT_PASS (pass_partition_blocks);
>>>        NEXT_PASS (pass_regmove);
>>
>>
> 
>
Kenneth Zadeck - July 12, 2012, 12:04 p.m.
sorry about the two messages. i mis spelled the gcc-patches on the first 
try.

you are on the right track with the example but combine will not get 
this unless everything is in the same bb.
the whole point of having a separate pass for doing extension 
elimination is that it needs to be done over the entire function.

my example is also a little more complex because, since we are talking 
about induction vars, you have an initial assignment outside of a loop, 
and increment inside the loop and the test you describe at the bottom of 
the loop.

I would point out that with respect to speed optimizations, the case i 
am describing is in fact very important because getting code out of 
loops is were the important gains are.   I believe that the ppc has a 
some significant performance issues because of this kind of thing.

kenny


On 07/12/2012 05:20 AM, Tom de Vries wrote:
> Kenneth,
>
> I see I replied to your original message that had the wrong CC, I'm now CC-ing
> gcc-patches@gcc.gnu.org.
>
> Thanks,
> - Tom
>
> On 12/07/12 11:05, Tom de Vries wrote:
>> On 12/07/12 03:39, Kenneth Zadeck wrote:
>>> Tom,
>>>
>>> I have a problem with the approach that you have taken here.   I believe
>>> that this could be a very useful addition to gcc so I am in general very
>>> supportive, but i think you are missing an important case.
>>>
>>> My problem is that it the pass does not actually look at the target and
>>> make any decisions based on that target.
>>>
>>> for instance, we have a llp64 target.   As with many targets, the target
>>> has a rich set of compare and branch instructions.  In particular, it
>>> can do both 32 and 64 bit comparisons.    We see that many of the
>>> upstream optimizations that take int (SI mode) index variables generate
>>> extension operations before doing 64 bit compare and branch
>>> instructions, even though there are 32 bit comparison and branches on
>>> the machine.     There are a lot of machines that can do more than one
>>> size of comparison.
>>>
>> 	 This optimization pass, as it is currently written will not remove those
>>> extensions because it believes that the length of the destination is the
>>> "final answer" unless it is wrapped in an explicit truncation.
>>> Instead it needs to ask the port if there is a shorted compare and
>>> branch instruction that does not cost more. in that case, those
>>> instructions should be rewritten to use the shorted compare and branch.
>>>
>>> There are many operations other than compare and branch where the pass
>>> should be asking "can i shorten the target for free and therefore get
>>> rid of the extension?"
>> Kenneth,
>>
>> I'm not sure I understand the optimization you're talking about, in particular
>> I'm confused about whether the branch range of the 32-bit and 64-bit comparison
>> is the same.
>>
>> Assuming it's the same, my understanding is that you're talking about an example
>> like this:
>> ...
>>    (insn (set (reg:DI 5)
>>               (zero_extend:DI (reg:SI 4))))
>>
>>    (jump_insn (set (pc)
>>                    (if_then_else (eq (reg:DI 5)
>>                                      (const_int 0))
>>                                  (label_ref:DI 62)
>>                                  (pc))))
>>
>>    ->
>>
>>    (jump_insn (set (pc)
>>                    (if_then_else (eq (reg:SI 4)
>>                                      (const_int 0))
>>                                  (label_ref:DI 62)
>>                                  (pc))))
>>
>> ...
>> I would expect combine to optimize this.
>>
>> In case I got the example all backwards or it is a too simple one, please
>> provide an rtl example that illustrates the optimization.
>>
>> Thanks,
>> - Tom
>>
>>
>>>   right shifts, rotates, and stores are not in
>>> this class, but left shifts are as are all comparisons, compare and
>>> branches, conditional moves.   There may even be machines that have this
>>> for divide, but i do not know of any off the top of my head.
>>>
>>> What i am suggesting moves this pass into the target specific set of
>>> optimizations rather than target independent set, but at where this pass
>>> is to be put this is completely appropriate.    Any dest instruction
>>> where all of the operands have been extended should be checked to see if
>>> it was really necessary to use the longer form before doing the
>>> propagation pass.
>>>
>>> kenny
>>>
>>>
>>> On 07/11/2012 06:30 AM, Tom de Vries wrote:
>>>> On 13/11/10 10:50, Eric Botcazou wrote:
>>>>>> I profiled the pass on spec2000:
>>>>>>
>>>>>>                      -mabi=32     -mabi=64
>>>>>> ee-pass (usr time):     0.70         1.16
>>>>>> total   (usr time):   919.30       879.26
>>>>>> ee-pass        (%):     0.08         0.13
>>>>>>
>>>>>> The pass takes 0.13% or less of the total usr runtime.
>>>>> For how many hits?  What are the numbers with --param ee-max-propagate=0?
>>>>>
>>>>>> Is it necessary to improve the runtime of this pass?
>>>>> I've already given my opinion about the implementation.  The other passes in
>>>>> the compiler try hard not to rescan everything when a single bit changes; as
>>>>> currently written, yours doesn't.
>>>>>
>>>> Eric,
>>>>
>>>> I've done the following:
>>>> - refactored the pass such that it now scans at most twice over all
>>>>    instructions.
>>>> - updated the patch to be applicable to current trunk
>>>> - updated the motivating example to a more applicable one (as discussed in
>>>>    this thread), and added that one as test-case.
>>>> - added a part in the header comment illustrating the working of the pass
>>>>    on the motivating example.
>>>>
>>>> bootstrapped and reg-tested on x86_64 and i686.
>>>>
>>>> build and reg-tested on mips, mips64, and arm.
>>>>
>>>> OK for trunk?
>>>>
>>>> Thanks,
>>>> - Tom
>>>>
>>>> 2012-07-10  Tom de Vries  <tom@codesourcery.com>
>>>>
>>>> 	* ee.c: New file.
>>>> 	* tree-pass.h (pass_ee): Declare.
>>>> 	* opts.c ( default_options_table): Set flag_ee at -O2.
>>>> 	* timevar.def (TV_EE): New timevar.
>>>> 	* common.opt (fextension-elimination): New option.
>>>> 	* Makefile.in (ee.o): New rule.
>>>> 	* passes.c (pass_ee): Add it.
>>>>
>>>> 	* gcc.dg/extend-1.c: New test.
>>>> 	* gcc.dg/extend-2.c: Same.
>>>> 	* gcc.dg/extend-2-64.c: Same.
>>>> 	* gcc.dg/extend-3.c: Same.
>>>> 	* gcc.dg/extend-4.c: Same.
>>>> 	* gcc.dg/extend-5.c: Same.
>>>> 	* gcc.target/mips/octeon-bbit-2.c: Make test more robust.
>>>> Index: gcc/tree-pass.h
>>>> ===================================================================
>>>> --- gcc/tree-pass.h (revision 189409)
>>>> +++ gcc/tree-pass.h (working copy)
>>>> @@ -483,6 +483,7 @@ extern struct gimple_opt_pass pass_fixup
>>>>
>>>> extern struct rtl_opt_pass pass_expand;
>>>> extern struct rtl_opt_pass pass_instantiate_virtual_regs;
>>>> +extern struct rtl_opt_pass pass_ee;
>>>> extern struct rtl_opt_pass pass_rtl_fwprop;
>>>> extern struct rtl_opt_pass pass_rtl_fwprop_addr;
>>>> extern struct rtl_opt_pass pass_jump;
>>>> Index: gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
>>>> ===================================================================
>>>> --- gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (revision 189409)
>>>> +++ gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (working copy)
>>>> @@ -5,19 +5,19 @@
>>>> /* { dg-final { scan-assembler "\tbnel\t" } } */
>>>> /* { dg-final { scan-assembler-not "\tbne\t" } } */
>>>>
>>>> -NOMIPS16 int
>>>> -f (int n, int i)
>>>> +NOMIPS16 long int
>>>> +f (long int n, long int i)
>>>> {
>>>> -  int s = 0;
>>>> +  long int s = 0;
>>>>     for (; i & 1; i++)
>>>>       s += i;
>>>>     return s;
>>>> }
>>>>
>>>> -NOMIPS16 int
>>>> -g (int n, int i)
>>>> +NOMIPS16 long int
>>>> +g (long int n, long int i)
>>>> {
>>>> -  int s = 0;
>>>> +  long int s = 0;
>>>>     for (i = 0; i < n; i++)
>>>>       s += i;
>>>>     return s;
>>>> Index: gcc/testsuite/gcc.dg/extend-4.c
>>>> ===================================================================
>>>> --- /dev/null (new file)
>>>> +++ gcc/testsuite/gcc.dg/extend-4.c (revision 0)
>>>> @@ -0,0 +1,16 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>> +
>>>> +unsigned char f(unsigned int a, int c)
>>>> +{
>>>> +  unsigned int b = a;
>>>> +  if (c)
>>>> +    b = a & 0x10ff;
>>>> +  return b;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-rtl-dump-times "_extend:" 1 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump-times "and:" 0 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ removed" "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>> +
>>>> Index: gcc/testsuite/gcc.dg/extend-1.c
>>>> ===================================================================
>>>> --- /dev/null (new file)
>>>> +++ gcc/testsuite/gcc.dg/extend-1.c (revision 0)
>>>> @@ -0,0 +1,13 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>> +
>>>> +void f(unsigned char * p, short s, int c, int *z)
>>>> +{
>>>> +  if (c)
>>>> +    *z = 0;
>>>> +  *p ^= (unsigned char)s;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 0 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 1 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>> Index: gcc/testsuite/gcc.dg/extend-5.c
>>>> ===================================================================
>>>> --- /dev/null (new file)
>>>> +++ gcc/testsuite/gcc.dg/extend-5.c (revision 0)
>>>> @@ -0,0 +1,13 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>> +
>>>> +void f (short d[2][2])
>>>> +{
>>>> +  int d0 = d[0][0] + d[0][1];
>>>> +  int d1 = d[1][0] + d[1][1];
>>>> +  d[0][0] = d0 + d1;
>>>> +      d[0][1] = d0 - d1;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>> Index: gcc/testsuite/gcc.dg/extend-2.c
>>>> ===================================================================
>>>> --- /dev/null (new file)
>>>> +++ gcc/testsuite/gcc.dg/extend-2.c (revision 0)
>>>> @@ -0,0 +1,20 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>> +/* { dg-require-effective-target ilp32 } */
>>>> +
>>>> +void f(unsigned char * p, short *s, int c)
>>>> +{
>>>> +  short or = 0;
>>>> +  while (c)
>>>> +    {
>>>> +      or = or | s[c];
>>>> +      c --;
>>>> +    }
>>>> +  *p = (unsigned char)or;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend" 0 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend" 0 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>> +
>>>> Index: gcc/testsuite/gcc.dg/extend-2-64.c
>>>> ===================================================================
>>>> --- /dev/null (new file)
>>>> +++ gcc/testsuite/gcc.dg/extend-2-64.c (revision 0)
>>>> @@ -0,0 +1,20 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>> +/* { dg-require-effective-target mips64 } */
>>>> +
>>>> +void f(unsigned char * p, short *s, int c)
>>>> +{
>>>> +  short or = 0;
>>>> +  while (c)
>>>> +    {
>>>> +      or = or | s[c];
>>>> +      c --;
>>>> +    }
>>>> +  *p = (unsigned char)or;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 1 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>> +
>>>> Index: gcc/testsuite/gcc.dg/extend-3.c
>>>> ===================================================================
>>>> --- /dev/null (new file)
>>>> +++ gcc/testsuite/gcc.dg/extend-3.c (revision 0)
>>>> @@ -0,0 +1,13 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>> +/* { dg-require-effective-target mips64 } */
>>>> +
>>>> +unsigned int f(unsigned char byte)
>>>> +{
>>>> +  return byte << 25;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ replaced" "ee" } } */
>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>> +
>>>> Index: gcc/opts.c
>>>> ===================================================================
>>>> --- gcc/opts.c (revision 189409)
>>>> +++ gcc/opts.c (working copy)
>>>> @@ -490,6 +490,7 @@ static const struct default_options defa
>>>>       { OPT_LEVELS_2_PLUS, OPT_ftree_tail_merge, NULL, 1 },
>>>>       { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_foptimize_strlen, NULL, 1 },
>>>>       { OPT_LEVELS_2_PLUS, OPT_fhoist_adjacent_loads, NULL, 1 },
>>>> +    { OPT_LEVELS_2_PLUS, OPT_fextension_elimination, NULL, 1 },
>>>>
>>>>       /* -O3 optimizations.  */
>>>>       { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
>>>> Index: gcc/timevar.def
>>>> ===================================================================
>>>> --- gcc/timevar.def (revision 189409)
>>>> +++ gcc/timevar.def (working copy)
>>>> @@ -201,6 +201,7 @@ DEFTIMEVAR (TV_POST_EXPAND	     , "post
>>>> DEFTIMEVAR (TV_VARCONST              , "varconst")
>>>> DEFTIMEVAR (TV_LOWER_SUBREG	     , "lower subreg")
>>>> DEFTIMEVAR (TV_JUMP                  , "jump")
>>>> +DEFTIMEVAR (TV_EE                    , "extension elimination")
>>>> DEFTIMEVAR (TV_FWPROP                , "forward prop")
>>>> DEFTIMEVAR (TV_CSE                   , "CSE")
>>>> DEFTIMEVAR (TV_DCE                   , "dead code elimination")
>>>> Index: gcc/ee.c
>>>> ===================================================================
>>>> --- /dev/null (new file)
>>>> +++ gcc/ee.c (revision 0)
>>>> @@ -0,0 +1,1190 @@
>>>> +/* Redundant extension elimination.
>>>> +   Copyright (C) 2010, 2011, 2012 Free Software Foundation, Inc.
>>>> +   Contributed by Tom de Vries (tom@codesourcery.com)
>>>> +
>>>> +This file is part of GCC.
>>>> +
>>>> +GCC is free software; you can redistribute it and/or modify it under
>>>> +the terms of the GNU General Public License as published by the Free
>>>> +Software Foundation; either version 3, or (at your option) any later
>>>> +version.
>>>> +
>>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>>> +for more details.
>>>> +
>>>> +You should have received a copy of the GNU General Public License
>>>> +along with GCC; see the file COPYING3.  If not see
>>>> +<http://www.gnu.org/licenses/>.  */
>>>> +
>>>> +/*
>>>> +
>>>> +  MOTIVATING EXAMPLE
>>>> +
>>>> +  The motivating example for this pass is the example from PR 40893:
>>>> +
>>>> +    void f (short d[2][2])
>>>> +    {
>>>> +      int d0 = d[0][0] + d[0][1];
>>>> +      int d1 = d[1][0] + d[1][1];
>>>> +      d[0][0] = d0 + d1;
>>>> +      d[0][1] = d0 - d1;
>>>> +    }
>>>> +
>>>> +  For MIPS, compilation results in the following insns.
>>>> +
>>>> +    (set (reg:SI 204)
>>>> +         (zero_extend:SI (subreg:HI (reg:SI 213) 2)))
>>>> +
>>>> +    (set (reg:SI 205)
>>>> +         (zero_extend:SI (subreg:HI (reg:SI 216 [ d1 ]) 2)))
>>>> +
>>>> +    (set (reg:SI 217)
>>>> +         (plus:SI (reg:SI 205)
>>>> +                  (reg:SI 204)))
>>>> +
>>>> +    (set (reg:SI 218)
>>>> +         (minus:SI (reg:SI 204)
>>>> +                   (reg:SI 205)))
>>>> +
>>>> +    (set (mem:HI (reg/v/f:SI 210))
>>>> +         (subreg:HI (reg:SI 217) 2))
>>>> +
>>>> +    (set (mem:HI (plus:SI (reg/v/f:SI 210)
>>>> +                 (const_int 2 [0x2])))
>>>> +         (subreg:HI (reg:SI 218) 2))
>>>> +
>>>> +
>>>> +  The pseudos 217 and 218 only use the lower half of pseudos 217 and 218, and
>>>> +  are the only uses.  And the plus and minus operators belong to the class of
>>>> +  operators where a bit in the result is only influenced by same-or-less
>>>> +  significant bitss in the operands, so the plus and minus insns only use the
>>>> +  lower halves of pseudos 204 and 205.  Those are also the only uses of pseudos
>>>> +  204 and 205, so the zero_extends are redundant.
>>>> +
>>>> +
>>>> +  INTENDED EFFECT
>>>> +
>>>> +  This pass works by removing sign/zero-extensions, or replacing them with
>>>> +  regcopies.  The idea there is that the regcopy might be eliminated by a later
>>>> +  pass.  In case the regcopy cannot be eliminated, it might at least be cheaper
>>>> +  than the extension.
>>>> +
>>>> +
>>>> +  IMPLEMENTATION
>>>> +
>>>> +  The pass scans at most two times over all instructions.
>>>> +
>>>> +  The first scan collects all extensions.  If there are no extensions, we're
>>>> +  done.
>>>> +
>>>> +  The second scan registers all uses of a reg in the biggest_use array.
>>>> +  Additionally, it registers how the use size of a pseudo is propagated to the
>>>> +  operands of the insns defining the pseudo.
>>>> +
>>>> +  The biggest_use array now contains the size in bits of the biggest use
>>>> +  of each reg, which allows us to find redundant extensions.
>>>> +
>>>> +  If there are still non-redundant extensions left, we use the propagation
>>>> +  information in an iterative fashion to improve the biggest_use array, after
>>>> +  which we may find more redundant extensions.
>>>> +
>>>> +  Finally, redundant extensions are deleted or replaced.
>>>> +
>>>> +  In case that the src and dest reg of the replacement are not of the same size,
>>>> +  we do not replace with a normal regcopy, but with a truncate or with the copy
>>>> +  of a paradoxical subreg instead.
>>>> +
>>>> +
>>>> +  ILLUSTRATION OF PASS
>>>> +
>>>> +  The dump of the pass shows us how the pass works on the motivating example.
>>>> +
>>>> +  We find the 2 extensions:
>>>> +    found extension with preserved size 16 defining reg 204
>>>> +    found extension with preserved size 16 defining reg 205
>>>> +
>>>> +  We calculate the biggests uses of a register:
>>>> +    biggest_use
>>>> +    reg 204: size 32
>>>> +    reg 205: size 32
>>>> +    reg 217: size 16
>>>> +    reg 218: size 16
>>>> +
>>>> +  We propagate the biggest uses where possible:
>>>> +    propagations
>>>> +    205: 32 -> 16
>>>> +    204: 32 -> 16
>>>> +    214: 32 -> 16
>>>> +    215: 32 -> 16
>>>> +
>>>> +  We conclude that the extensions are redundant:
>>>> +    found redundant extension with preserved size 16 defining reg 205
>>>> +    found redundant extension with preserved size 16 defining reg 204
>>>> +
>>>> +  And we replace them with regcopies:
>>>> +    (set (reg:SI 204)
>>>> +        (reg:SI 213))
>>>> +
>>>> +    (set (reg:SI 205)
>>>> +        (reg:SI 216))
>>>> +
>>>> +
>>>> +  LIMITATIONS
>>>> +
>>>> +  The scope of the analysis is limited to an extension and its uses.  The other
>>>> +  type of analysis (related to the defs of the operand of an extension) is not
>>>> +  done.
>>>> +
>>>> +  Furthermore, we do the analysis of biggest use per reg.  So when determining
>>>> +  whether an extension is redundant, we take all uses of a dest reg into
>>>> +  account, also the ones that are not uses of the extension.
>>>> +  The consideration is that using use-def chains will give a more precise
>>>> +  analysis, but is much more expensive in terms of runtime.  */
>>>> +
>>>> +#include "config.h"
>>>> +#include "system.h"
>>>> +#include "coretypes.h"
>>>> +#include "tm.h"
>>>> +#include "rtl.h"
>>>> +#include "tree.h"
>>>> +#include "tm_p.h"
>>>> +#include "flags.h"
>>>> +#include "regs.h"
>>>> +#include "hard-reg-set.h"
>>>> +#include "basic-block.h"
>>>> +#include "insn-config.h"
>>>> +#include "function.h"
>>>> +#include "expr.h"
>>>> +#include "insn-attr.h"
>>>> +#include "recog.h"
>>>> +#include "toplev.h"
>>>> +#include "target.h"
>>>> +#include "timevar.h"
>>>> +#include "optabs.h"
>>>> +#include "insn-codes.h"
>>>> +#include "rtlhooks-def.h"
>>>> +#include "output.h"
>>>> +#include "params.h"
>>>> +#include "timevar.h"
>>>> +#include "tree-pass.h"
>>>> +#include "cgraph.h"
>>>> +#include "vec.h"
>>>> +
>>>> +#define SKIP_REG (-1)
>>>> +#define NONE (-1)
>>>> +
>>>> +/* Number of registers at start of pass.  */
>>>> +
>>>> +static int n_regs;
>>>> +
>>>> +/* Array to register the biggest use of a reg, in bits.  */
>>>> +
>>>> +static int *biggest_use;
>>>> +
>>>> +/* Array to register the promoted subregs.  */
>>>> +
>>>> +static VEC (rtx,heap) **promoted_subreg;
>>>> +
>>>> +/* Array to register for a reg what the last propagated size is.  */
>>>> +
>>>> +static int *propagated_size;
>>>> +
>>>> +typedef struct use
>>>> +{
>>>> +  int regno;
>>>> +  int size;
>>>> +  int offset;
>>>> +  rtx *use;
>>>> +} use_type;
>>>> +
>>>> +DEF_VEC_O(use_type);
>>>> +DEF_VEC_ALLOC_O(use_type,heap);
>>>> +
>>>> +/* Vector to register the uses.  */
>>>> +
>>>> +static VEC (use_type,heap) **uses;
>>>> +
>>>> +typedef struct prop
>>>> +{
>>>> +  rtx set;
>>>> +  int uses_regno;
>>>> +  int uses_index;
>>>> +} prop_type;
>>>> +
>>>> +DEF_VEC_O(prop_type);
>>>> +DEF_VEC_ALLOC_O(prop_type,heap);
>>>> +
>>>> +/* Vector to register the propagations.  */
>>>> +
>>>> +static VEC (prop_type,heap) **props;
>>>> +
>>>> +/* Work list for propragation.  */
>>>> +
>>>> +static VEC (int,heap) *wl;
>>>> +
>>>> +/* Array to register what regs are in the work list.  */
>>>> +
>>>> +static bool *in_wl;
>>>> +
>>>> +/* Vector that contains the extensions in the function.  */
>>>> +
>>>> +static VEC (rtx,heap) *extensions;
>>>> +
>>>> +/* Vector that contains the extensions in the function that are going to be
>>>> +   removed or replaced.  */
>>>> +
>>>> +static VEC (rtx,heap) *redundant_extensions;
>>>> +
>>>> +/* Forward declaration.  */
>>>> +
>>>> +static void note_use (rtx *x, void *data);
>>>> +static bool skip_reg_p (int regno);
>>>> +static void register_prop (rtx set, use_type *use);
>>>> +
>>>> +/* Check whether SUBREG is a promoted subreg.  */
>>>> +
>>>> +static bool
>>>> +promoted_subreg_p (rtx subreg)
>>>> +{
>>>> +  return (GET_CODE (subreg) == SUBREG
>>>> +	  && SUBREG_PROMOTED_VAR_P (subreg));
>>>> +}
>>>> +
>>>> +/* Check whether SUBREG is a promoted subreg for which we cannot reset the
>>>> +   promotion.  */
>>>> +
>>>> +static bool
>>>> +fixed_promoted_subreg_p (rtx subreg)
>>>> +{
>>>> +  int mre;
>>>> +
>>>> +  if (!promoted_subreg_p (subreg))
>>>> +    return false;
>>>> +
>>>> +  mre = targetm.mode_rep_extended (GET_MODE (subreg),
>>>> +				   GET_MODE (SUBREG_REG (subreg)));
>>>> +  return mre != UNKNOWN;
>>>> +}
>>>> +
>>>> +/* Attempt to return the size, reg number and offset of USE in SIZE, REGNO and
>>>> +   OFFSET.  Return true if successful.  */
>>>> +
>>>> +static bool
>>>> +reg_use_p (rtx use, int *size, unsigned int *regno, int *offset)
>>>> +{
>>>> +  rtx reg;
>>>> +
>>>> +  if (REG_P (use))
>>>> +    {
>>>> +      *regno = REGNO (use);
>>>> +      *offset = 0;
>>>> +      *size = GET_MODE_BITSIZE (GET_MODE (use));
>>>> +      return true;
>>>> +    }
>>>> +  else if (GET_CODE (use) == SUBREG)
>>>> +    {
>>>> +      reg = SUBREG_REG (use);
>>>> +
>>>> +      if (!REG_P (reg))
>>>> +	return false;
>>>> +
>>>> +      *regno = REGNO (reg);
>>>> +
>>>> +      if (paradoxical_subreg_p (use) || fixed_promoted_subreg_p (use))
>>>> +	{
>>>> +	  *offset = 0;
>>>> +	  *size = GET_MODE_BITSIZE (GET_MODE (reg));
>>>> +	}
>>>> +      else
>>>> +	{
>>>> +	  *offset = subreg_lsb (use);
>>>> +	  *size = *offset + GET_MODE_BITSIZE (GET_MODE (use));
>>>> +	}
>>>> +
>>>> +      return true;
>>>> +    }
>>>> +
>>>> +  return false;
>>>> +}
>>>> +
>>>> +/* Create a new empty entry in the uses[REGNO] vector.  */
>>>> +
>>>> +static use_type *
>>>> +new_use (unsigned int regno)
>>>> +{
>>>> +  if (uses[regno] == NULL)
>>>> +    uses[regno] = VEC_alloc (use_type, heap, 4);
>>>> +
>>>> +  VEC_safe_push (use_type, heap, uses[regno], NULL);
>>>> +
>>>> +  return VEC_last (use_type, uses[regno]);
>>>> +}
>>>> +
>>>> +/* Register a USE of reg REGNO with SIZE and OFFSET.  */
>>>> +
>>>> +static use_type *
>>>> +register_use (int size, unsigned int regno, int offset, rtx *use)
>>>> +{
>>>> +  int *current;
>>>> +  use_type *p;
>>>> +
>>>> +  gcc_assert (size >= 0);
>>>> +  gcc_assert (regno < (unsigned int)n_regs);
>>>> +
>>>> +  if (skip_reg_p (regno))
>>>> +    return NULL;
>>>> +
>>>> +  p = new_use (regno);
>>>> +  p->regno = regno;
>>>> +  p->size = size;
>>>> +  p->offset = offset;
>>>> +  p->use = use;
>>>> +
>>>> +  /* Update the bigest use.  */
>>>> +  current = &biggest_use[regno];
>>>> +  *current = MAX (*current, size);
>>>> +
>>>> +  return p;
>>>> +}
>>>> +
>>>> +/* Handle embedded uses in USE, which is a part of PATTERN.  */
>>>> +
>>>> +static void
>>>> +note_embedded_uses (rtx use, rtx pattern)
>>>> +{
>>>> +  const char *format_ptr;
>>>> +  int i, j;
>>>> +
>>>> +  format_ptr = GET_RTX_FORMAT (GET_CODE (use));
>>>> +  for (i = 0; i < GET_RTX_LENGTH (GET_CODE (use)); i++)
>>>> +    if (format_ptr[i] == 'e')
>>>> +      note_use (&XEXP (use, i), pattern);
>>>> +    else if (format_ptr[i] == 'E')
>>>> +      for (j = 0; j < XVECLEN (use, i); j++)
>>>> +	note_use (&XVECEXP (use, i, j), pattern);
>>>> +}
>>>> +
>>>> +/* Get the set in PATTERN that has USE as its src operand.  */
>>>> +
>>>> +static rtx
>>>> +get_set (rtx use, rtx pattern)
>>>> +{
>>>> +  rtx sub;
>>>> +  int i;
>>>> +
>>>> +  if (GET_CODE (pattern) == SET && SET_SRC (pattern) == use)
>>>> +    return pattern;
>>>> +
>>>> +  if (GET_CODE (pattern) == PARALLEL)
>>>> +    for (i = 0; i < XVECLEN (pattern, 0); ++i)
>>>> +      {
>>>> +	sub = XVECEXP (pattern, 0, i);
>>>> +	if (GET_CODE (sub) == SET && SET_SRC (sub) == use)
>>>> +	  return sub;
>>>> +      }
>>>> +
>>>> +  return NULL_RTX;
>>>> +}
>>>> +
>>>> +/* Handle a restricted op USE with NR_OPERANDS.  USE is a part of SET, which is
>>>> +   a part of PATTERN.  In this context restricted means that a bit in
>>>> +   an operand influences only the same bit or more significant bits in the
>>>> +   result.  The bitwise ops are a subclass, but PLUS is one as well.  */
>>>> +
>>>> +static void
>>>> +note_restricted_op_use (rtx set, rtx use, unsigned int nr_operands, rtx pattern)
>>>> +{
>>>> +  unsigned int i, smallest;
>>>> +  int operand_size[2];
>>>> +  int operand_offset[2];
>>>> +  int used_size;
>>>> +  unsigned int operand_regno[2];
>>>> +  bool operand_reg[2];
>>>> +  bool operand_ignore[2];
>>>> +  use_type *p;
>>>> +
>>>> +  /* Init operand_reg, operand_size, operand_regno and operand_ignore.  */
>>>> +  for (i = 0; i < nr_operands; ++i)
>>>> +    {
>>>> +      operand_reg[i] = reg_use_p (XEXP (use, i), &operand_size[i],
>>>> +				  &operand_regno[i], &operand_offset[i]);
>>>> +      operand_ignore[i] = false;
>>>> +    }
>>>> +
>>>> +  /* Handle case of reg and-masked with const.  */
>>>> +  if (GET_CODE (use) == AND && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>> +    {
>>>> +      used_size =
>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (UINTVAL (XEXP (use, 1)));
>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>> +    }
>>>> +
>>>> +  /* Handle case of reg or-masked with const.  */
>>>> +  if (GET_CODE (use) == IOR && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>> +    {
>>>> +      used_size =
>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (~UINTVAL (XEXP (use, 1)));
>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>> +    }
>>>> +
>>>> +  /* Ignore the use of a in 'a = a + b'.  */
>>>> +  /* TODO: handle GET_CODE ((SET_DEST (set))) == SUBREG.  */
>>>> +  if (set != NULL_RTX && REG_P (SET_DEST (set)))
>>>> +    for (i = 0; i < nr_operands; ++i)
>>>> +      operand_ignore[i] = (operand_reg[i]
>>>> +			   && (REGNO (SET_DEST (set)) == operand_regno[i]));
>>>> +
>>>> +  /* Handle the case a reg is combined with don't care bits.  */
>>>> +  if (nr_operands == 2 && operand_reg[0] && operand_reg[1]
>>>> +      && operand_size[0] != operand_size[1])
>>>> +    {
>>>> +      smallest = operand_size[0] > operand_size[1];
>>>> +
>>>> +      if (paradoxical_subreg_p (XEXP (use, smallest)))
>>>> +	operand_size[1 - smallest] = operand_size[smallest];
>>>> +    }
>>>> +
>>>> +  /* Register the operand use, if necessary.  */
>>>> +  for (i = 0; i < nr_operands; ++i)
>>>> +    if (!operand_reg[i])
>>>> +      note_use (&XEXP (use, i), pattern);
>>>> +    else if (!operand_ignore[i])
>>>> +      {
>>>> +	p = register_use (operand_size[i], operand_regno[i], operand_offset[i],
>>>> +			  &XEXP (use, i));
>>>> +	register_prop (set, p);
>>>> +      }
>>>> +}
>>>> +
>>>> +/* Register promoted SUBREG in promoted_subreg.  */
>>>> +
>>>> +static void
>>>> +register_promoted_subreg (rtx subreg)
>>>> +{
>>>> +  int index = REGNO (SUBREG_REG (subreg));
>>>> +
>>>> +  if (promoted_subreg[index] == NULL)
>>>> +    promoted_subreg[index] = VEC_alloc (rtx, heap, 10);
>>>> +
>>>> +  VEC_safe_push (rtx, heap, promoted_subreg[index], subreg);
>>>> +}
>>>> +
>>>> +/* Note promoted subregs in X.  */
>>>> +
>>>> +static int
>>>> +note_promoted_subreg (rtx *x, void *y ATTRIBUTE_UNUSED)
>>>> +{
>>>> +  rtx subreg = *x;
>>>> +
>>>> +  if (promoted_subreg_p (subreg) && !fixed_promoted_subreg_p (subreg)
>>>> +      && REG_P (SUBREG_REG (subreg)))
>>>> +    register_promoted_subreg (subreg);
>>>> +
>>>> +  return 0;
>>>> +}
>>>> +
>>>> +/* Handle use X in pattern DATA noted by note_uses.  */
>>>> +
>>>> +static void
>>>> +note_use (rtx *x, void *data)
>>>> +{
>>>> +  rtx use = *x;
>>>> +  rtx pattern = (rtx)data;
>>>> +  int use_size, use_offset;
>>>> +  unsigned int use_regno;
>>>> +  rtx set;
>>>> +  use_type *p;
>>>> +
>>>> +  for_each_rtx (x, note_promoted_subreg, NULL);
>>>> +
>>>> +  set = get_set (use, pattern);
>>>> +
>>>> +  switch (GET_CODE (use))
>>>> +    {
>>>> +    case REG:
>>>> +    case SUBREG:
>>>> +      if (!reg_use_p (use, &use_size, &use_regno, &use_offset))
>>>> +	{
>>>> +	  note_embedded_uses (use, pattern);
>>>> +	  return;
>>>> +	}
>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>> +      register_prop (set, p);
>>>> +      return;
>>>> +    case SIGN_EXTEND:
>>>> +    case ZERO_EXTEND:
>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset))
>>>> +	{
>>>> +	  note_embedded_uses (use, pattern);
>>>> +	  return;
>>>> +	}
>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>> +      register_prop (set, p);
>>>> +      return;
>>>> +    case IOR:
>>>> +    case AND:
>>>> +    case XOR:
>>>> +    case PLUS:
>>>> +    case MINUS:
>>>> +      note_restricted_op_use (set, use, 2, pattern);
>>>> +      return;
>>>> +    case NOT:
>>>> +    case NEG:
>>>> +      note_restricted_op_use (set, use, 1, pattern);
>>>> +      return;
>>>> +    case ASHIFT:
>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset)
>>>> +	  || !CONST_INT_P (XEXP (use, 1))
>>>> +	  || INTVAL (XEXP (use, 1)) <= 0
>>>> +	  || paradoxical_subreg_p (XEXP (use, 0)))
>>>> +	{
>>>> +	  note_embedded_uses (use, pattern);
>>>> +	  return;
>>>> +	}
>>>> +      (void)register_use (use_size - INTVAL (XEXP (use, 1)), use_regno,
>>>> +			  use_offset, x);
>>>> +      return;
>>>> +    default:
>>>> +      note_embedded_uses (use, pattern);
>>>> +      return;
>>>> +    }
>>>> +}
>>>> +
>>>> +/* Check whether reg REGNO is implicitly used.  */
>>>> +
>>>> +static bool
>>>> +implicit_use_p (int regno ATTRIBUTE_UNUSED)
>>>> +{
>>>> +#ifdef EPILOGUE_USES
>>>> +  if (EPILOGUE_USES (regno))
>>>> +    return true;
>>>> +#endif
>>>> +
>>>> +#ifdef EH_USES
>>>> +  if (EH_USES (regno))
>>>> +    return true;
>>>> +#endif
>>>> +
>>>> +  return false;
>>>> +}
>>>> +
>>>> +/* Check whether reg REGNO should be skipped in analysis.  */
>>>> +
>>>> +static bool
>>>> +skip_reg_p (int regno)
>>>> +{
>>>> +  /* TODO: handle hard registers.  The problem with hard registers is that
>>>> +     a DI use of r0 can mean a 64bit use of r0 and a 32 bit use of r1.
>>>> +     We don't handle that properly.  */
>>>> +  return implicit_use_p (regno) || HARD_REGISTER_NUM_P (regno);
>>>> +}
>>>> +
>>>> +/* Note the uses of argument registers in call INSN.  */
>>>> +
>>>> +static void
>>>> +note_call_uses (rtx insn)
>>>> +{
>>>> +  rtx link, link_expr;
>>>> +
>>>> +  if (!CALL_P (insn))
>>>> +    return;
>>>> +
>>>> +  for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1))
>>>> +    {
>>>> +      link_expr = XEXP (link, 0);
>>>> +
>>>> +      if (GET_CODE (link_expr) == USE)
>>>> +	note_use (&XEXP (link_expr, 0), link);
>>>> +    }
>>>> +}
>>>> +
>>>> +/* Dump the biggest uses found.  */
>>>> +
>>>> +static void
>>>> +dump_biggest_use (void)
>>>> +{
>>>> +  int i;
>>>> +
>>>> +  if (!dump_file)
>>>> +    return;
>>>> +
>>>> +  fprintf (dump_file, "biggest_use:\n");
>>>> +
>>>> +  for (i = 0; i < n_regs; i++)
>>>> +    if (biggest_use[i] > 0)
>>>> +      fprintf (dump_file, "reg %d: size %d\n", i, biggest_use[i]);
>>>> +
>>>> +  fprintf (dump_file, "\n");
>>>> +}
>>>> +
>>>> +/* Calculate the biggest use mode for all regs.  */
>>>> +
>>>> +static void
>>>> +calculate_biggest_use (void)
>>>> +{
>>>> +  basic_block bb;
>>>> +  rtx insn;
>>>> +
>>>> +  /* For all insns, call note_use for each use in insn.  */
>>>> +  FOR_EACH_BB (bb)
>>>> +    FOR_BB_INSNS (bb, insn)
>>>> +      {
>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>> +	  continue;
>>>> +
>>>> +	note_uses (&PATTERN (insn), note_use, PATTERN (insn));
>>>> +
>>>> +	if (CALL_P (insn))
>>>> +	  note_call_uses (insn);
>>>> +      }
>>>> +
>>>> +  dump_biggest_use ();
>>>> +}
>>>> +
>>>> +/* Register a propagation USE in SET in the props vector.  */
>>>> +
>>>> +static void
>>>> +register_prop (rtx set, use_type *use)
>>>> +{
>>>> +  prop_type *p;
>>>> +  int regno;
>>>> +
>>>> +  if (set == NULL_RTX || use == NULL)
>>>> +    return;
>>>> +
>>>> +  if (!REG_P (SET_DEST (set)))
>>>> +    return;
>>>> +
>>>> +  regno = REGNO (SET_DEST (set));
>>>> +
>>>> +  if (skip_reg_p (regno))
>>>> +    return;
>>>> +
>>>> +  if (props[regno] == NULL)
>>>> +    props[regno] = VEC_alloc (prop_type, heap, 4);
>>>> +
>>>> +  VEC_safe_push (prop_type, heap, props[regno], NULL);
>>>> +  p = VEC_last (prop_type, props[regno]);
>>>> +  p->set = set;
>>>> +  p->uses_regno = use->regno;
>>>> +  p->uses_index = VEC_length (use_type, uses[use->regno]) - 1;
>>>> +}
>>>> +
>>>> +/* Add REGNO to the worklist.  */
>>>> +
>>>> +static void
>>>> +add_to_wl (int regno)
>>>> +{
>>>> +  if (in_wl[regno])
>>>> +    return;
>>>> +
>>>> +  if (biggest_use[regno] > 0
>>>> +      && biggest_use[regno] == GET_MODE_BITSIZE (PSEUDO_REGNO_MODE (regno)))
>>>> +    return;
>>>> +
>>>> +  if (VEC_empty (prop_type, props[regno]))
>>>> +    return;
>>>> +
>>>> +  if (propagated_size[regno] != NONE
>>>> +      && propagated_size[regno] == biggest_use[regno])
>>>> +    return;
>>>> +
>>>> +  VEC_safe_push (int, heap, wl, regno);
>>>> +  in_wl[regno] = true;
>>>> +}
>>>> +
>>>> +/* Pop a reg from the worklist and return it.  */
>>>> +
>>>> +static int
>>>> +pop_wl (void)
>>>> +{
>>>> +  int regno = VEC_pop (int, wl);
>>>> +  in_wl[regno] = false;
>>>> +  return regno;
>>>> +}
>>>> +
>>>> +/* Propagate the use size DEST_SIZE of a reg to use P.  */
>>>> +
>>>> +static int
>>>> +propagate_size (int dest_size, use_type *p)
>>>> +{
>>>> +  if (dest_size == 0)
>>>> +    return 0;
>>>> +
>>>> +  return p->offset + MIN (p->size - p->offset, dest_size);
>>>> +}
>>>> +
>>>> +/* Get the biggest use of REGNO from the uses vector.  */
>>>> +
>>>> +static int
>>>> +get_biggest_use (unsigned int regno)
>>>> +{
>>>> +  int ix;
>>>> +  use_type *p;
>>>> +  int max = 0;
>>>> +
>>>> +  gcc_assert (uses[regno] != NULL);
>>>> +
>>>> +  FOR_EACH_VEC_ELT (use_type, uses[regno], ix, p)
>>>> +    max = MAX (max, p->size);
>>>> +
>>>> +  return max;
>>>> +}
>>>> +
>>>> +/* Propagate the use size DEST_SIZE of a reg to the uses in USE.  */
>>>> +
>>>> +static void
>>>> +propagate_to_use (int dest_size, use_type *use)
>>>> +{
>>>> +  int new_use_size;
>>>> +  int prev_biggest_use;
>>>> +  int *current;
>>>> +
>>>> +  new_use_size = propagate_size (dest_size, use);
>>>> +
>>>> +  if (new_use_size >= use->size)
>>>> +    return;
>>>> +
>>>> +  use->size = new_use_size;
>>>> +
>>>> +  current = &biggest_use[use->regno];
>>>> +
>>>> +  prev_biggest_use = *current;
>>>> +  *current = get_biggest_use (use->regno);
>>>> +
>>>> +  if (*current >= prev_biggest_use)
>>>> +    return;
>>>> +
>>>> +  add_to_wl (use->regno);
>>>> +
>>>> +  if (dump_file)
>>>> +    fprintf (dump_file, "%d: %d -> %d\n", use->regno, prev_biggest_use,
>>>> +	     *current);
>>>> +
>>>> +}
>>>> +
>>>> +/* Propagate the biggest use of a reg REGNO to all its uses, and note
>>>> +   propagations in NR_PROPAGATIONS.  */
>>>> +
>>>> +static void
>>>> +propagate_to_uses (int regno, int *nr_propagations)
>>>> +{
>>>> +  int ix;
>>>> +  prop_type *p;
>>>> +
>>>> +  gcc_assert (!(propagated_size[regno] == NONE
>>>> +		&& propagated_size[regno] == biggest_use[regno]));
>>>> +
>>>> +  FOR_EACH_VEC_ELT (prop_type, props[regno], ix, p)
>>>> +    {
>>>> +      use_type *use = VEC_index (use_type, uses[p->uses_regno], p->uses_index);
>>>> +      propagate_to_use (biggest_use[regno], use);
>>>> +      ++(*nr_propagations);
>>>> +    }
>>>> +
>>>> +  propagated_size[regno] = biggest_use[regno];
>>>> +}
>>>> +
>>>> +/* Improve biggest_use array iteratively.  */
>>>> +
>>>> +static void
>>>> +propagate (void)
>>>> +{
>>>> +  int i;
>>>> +  int nr_propagations = 0;
>>>> +
>>>> +  /* Initialize work list.  */
>>>> +
>>>> +  for (i = 0; i < n_regs; ++i)
>>>> +    add_to_wl (i);
>>>> +
>>>> +  /* Work the work list.  */
>>>> +
>>>> +  if (dump_file)
>>>> +    fprintf (dump_file, "propagations: \n");
>>>> +  while (!VEC_empty (int, wl))
>>>> +    propagate_to_uses (pop_wl (), &nr_propagations);
>>>> +
>>>> +  if (dump_file)
>>>> +    fprintf (dump_file, "\nnr_propagations: %d\n\n", nr_propagations);
>>>> +}
>>>> +
>>>> +/* Check whether this is a sign/zero extension.  */
>>>> +
>>>> +static bool
>>>> +extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>> +{
>>>> +  rtx src, op0;
>>>> +
>>>> +  /* Detect set of reg.  */
>>>> +  if (GET_CODE (PATTERN (insn)) != SET)
>>>> +    return false;
>>>> +
>>>> +  src = SET_SRC (PATTERN (insn));
>>>> +  *dest = SET_DEST (PATTERN (insn));
>>>> +
>>>> +  if (!REG_P (*dest))
>>>> +    return false;
>>>> +
>>>> +  /* Detect sign or zero extension.  */
>>>> +  if (GET_CODE (src) == ZERO_EXTEND || GET_CODE (src) == SIGN_EXTEND
>>>> +      || (GET_CODE (src) == AND && CONST_INT_P (XEXP (src, 1))))
>>>> +    {
>>>> +      op0 = XEXP (src, 0);
>>>> +
>>>> +      /* Determine amount of least significant bits preserved by operation.  */
>>>> +      if (GET_CODE (src) == AND)
>>>> +	*preserved_size = ctz_hwi (~UINTVAL (XEXP (src, 1)));
>>>> +      else
>>>> +	*preserved_size = GET_MODE_BITSIZE (GET_MODE (op0));
>>>> +
>>>> +      if (GET_CODE (op0) == SUBREG)
>>>> +	{
>>>> +	  if (subreg_lsb (op0) != 0)
>>>> +	    return false;
>>>> +
>>>> +	  *inner = SUBREG_REG (op0);
>>>> +
>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>> +	    return false;
>>>> +
>>>> +	  return true;
>>>> +	}
>>>> +      else if (REG_P (op0))
>>>> +	{
>>>> +	  *inner = op0;
>>>> +
>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>> +	    return false;
>>>> +
>>>> +	  return true;
>>>> +	}
>>>> +      else if (GET_CODE (op0) == TRUNCATE)
>>>> +	{
>>>> +	  *inner = XEXP (op0, 0);
>>>> +	  return true;
>>>> +	}
>>>> +    }
>>>> +
>>>> +  return false;
>>>> +}
>>>> +
>>>> +/* Find extensions and store them in the extensions vector.  */
>>>> +
>>>> +static bool
>>>> +find_extensions (void)
>>>> +{
>>>> +  basic_block bb;
>>>> +  rtx insn, dest, inner;
>>>> +  int preserved_size;
>>>> +
>>>> +  /* For all insns, call note_use for each use in insn.  */
>>>> +  FOR_EACH_BB (bb)
>>>> +    FOR_BB_INSNS (bb, insn)
>>>> +      {
>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>> +	  continue;
>>>> +
>>>> +	if (!extension_p (insn, &dest, &inner, &preserved_size))
>>>> +	  continue;
>>>> +
>>>> +	VEC_safe_push (rtx, heap, extensions, insn);
>>>> +
>>>> +	if (dump_file)
>>>> +	  fprintf (dump_file,
>>>> +		   "found extension %u with preserved size %d defining"
>>>> +		   " reg %d\n",
>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>> +      }
>>>> +
>>>> +  if (dump_file)
>>>> +    {
>>>> +      if (!VEC_empty (rtx, extensions))
>>>> +	fprintf (dump_file, "\n");
>>>> +      else
>>>> +	fprintf (dump_file, "no extensions found.\n");
>>>> +    }
>>>> +
>>>> +  return !VEC_empty (rtx, extensions);
>>>> +}
>>>> +
>>>> +/* Check whether this is a redundant sign/zero extension.  */
>>>> +
>>>> +static bool
>>>> +redundant_extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>> +{
>>>> +  int biggest_dest_use;
>>>> +
>>>> +  if (!extension_p (insn, dest, inner, preserved_size))
>>>> +    gcc_unreachable ();
>>>> +
>>>> +  biggest_dest_use = biggest_use[REGNO (*dest)];
>>>> +
>>>> +  if (biggest_dest_use == SKIP_REG)
>>>> +    return false;
>>>> +
>>>> +  if (*preserved_size < biggest_dest_use)
>>>> +    return false;
>>>> +
>>>> +  return true;
>>>> +}
>>>> +
>>>> +/* Find the redundant extensions in the extensions vector and move them to the
>>>> +   redundant_extensions vector.  */
>>>> +
>>>> +static void
>>>> +find_redundant_extensions (void)
>>>> +{
>>>> +  rtx insn, dest, inner;
>>>> +  int ix;
>>>> +  bool found = false;
>>>> +  int preserved_size;
>>>> +
>>>> +  FOR_EACH_VEC_ELT_REVERSE (rtx, extensions, ix, insn)
>>>> +    if (redundant_extension_p (insn, &dest, &inner, &preserved_size))
>>>> +      {
>>>> +	VEC_safe_push (rtx, heap, redundant_extensions, insn);
>>>> +	VEC_unordered_remove (rtx, extensions, ix);
>>>> +
>>>> +	if (dump_file)
>>>> +	  fprintf (dump_file,
>>>> +		   "found redundant extension %u with preserved size %d"
>>>> +		   " defining reg %d\n",
>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>> +	found = true;
>>>> +      }
>>>> +
>>>> +  if (dump_file && found)
>>>> +    fprintf (dump_file, "\n");
>>>> +}
>>>> +
>>>> +/* Reset promotion of subregs or REG.  */
>>>> +
>>>> +static void
>>>> +reset_promoted_subreg (rtx reg)
>>>> +{
>>>> +  int ix;
>>>> +  rtx subreg;
>>>> +
>>>> +  if (promoted_subreg[REGNO (reg)] == NULL)
>>>> +    return;
>>>> +
>>>> +  FOR_EACH_VEC_ELT (rtx, promoted_subreg[REGNO (reg)], ix, subreg)
>>>> +    {
>>>> +      SUBREG_PROMOTED_UNSIGNED_SET (subreg, 0);
>>>> +      SUBREG_PROMOTED_VAR_P (subreg) = 0;
>>>> +    }
>>>> +
>>>> +  VEC_free (rtx, heap, promoted_subreg[REGNO (reg)]);
>>>> +}
>>>> +
>>>> +/* Try to remove or replace the redundant extension INSN which extends INNER and
>>>> +   writes to DEST.  */
>>>> +
>>>> +static void
>>>> +try_remove_or_replace_extension (rtx insn, rtx dest, rtx inner)
>>>> +{
>>>> +  rtx cp_src, cp_dest, seq = NULL_RTX, one;
>>>> +
>>>> +  /* Check whether replacement is needed.  */
>>>> +  if (dest != inner)
>>>> +    {
>>>> +      start_sequence ();
>>>> +
>>>> +      /* Determine the proper replacement operation.  */
>>>> +      if (GET_MODE (dest) == GET_MODE (inner))
>>>> +	{
>>>> +	  cp_src = inner;
>>>> +	  cp_dest = dest;
>>>> +	}
>>>> +      else if (GET_MODE_SIZE (GET_MODE (dest))
>>>> +	       > GET_MODE_SIZE (GET_MODE (inner)))
>>>> +	{
>>>> +	  emit_clobber (dest);
>>>> +	  cp_src = inner;
>>>> +	  cp_dest = gen_lowpart_SUBREG (GET_MODE (inner), dest);
>>>> +	}
>>>> +      else
>>>> +	{
>>>> +	  cp_src = gen_lowpart_SUBREG (GET_MODE (dest), inner);
>>>> +	  cp_dest = dest;
>>>> +	}
>>>> +
>>>> +      emit_move_insn (cp_dest, cp_src);
>>>> +
>>>> +      seq = get_insns ();
>>>> +      end_sequence ();
>>>> +
>>>> +      /* If the replacement is not supported, bail out.  */
>>>> +      for (one = seq; one != NULL_RTX; one = NEXT_INSN (one))
>>>> +	if (recog_memoized (one) < 0 && GET_CODE (PATTERN (one)) != CLOBBER)
>>>> +	  return;
>>>> +
>>>> +      /* Insert the replacement.  */
>>>> +      emit_insn_before (seq, insn);
>>>> +    }
>>>> +
>>>> +  /* Note replacement/removal in the dump.  */
>>>> +  if (dump_file)
>>>> +    {
>>>> +      fprintf (dump_file, "redundant extension %u ", INSN_UID (insn));
>>>> +      if (dest != inner)
>>>> +	fprintf (dump_file, "replaced by %u\n", INSN_UID (seq));
>>>> +      else
>>>> +	fprintf (dump_file, "removed\n");
>>>> +    }
>>>> +
>>>> +  /* Remove the extension.  */
>>>> +  delete_insn (insn);
>>>> +
>>>> +  reset_promoted_subreg (dest);
>>>> +}
>>>> +
>>>> +/* Setup the variables at the start of the pass.  */
>>>> +
>>>> +static void
>>>> +init_pass (void)
>>>> +{
>>>> +  int i;
>>>> +
>>>> +  biggest_use = XNEWVEC (int, n_regs);
>>>> +  promoted_subreg = XCNEWVEC (VEC (rtx,heap) *, n_regs);
>>>> +  propagated_size = XNEWVEC (int, n_regs);
>>>> +
>>>> +  /* Initialize biggest_use for all regs to 0.  If a reg is used implicitly, we
>>>> +     handle that reg conservatively and set it to SKIP_REG instead.  */
>>>> +  for (i = 0; i < n_regs; i++)
>>>> +    {
>>>> +      biggest_use[i] = skip_reg_p (i) ? SKIP_REG : 0;
>>>> +      propagated_size[i] = NONE;
>>>> +    }
>>>> +
>>>> +  extensions = VEC_alloc (rtx, heap, 10);
>>>> +  redundant_extensions = VEC_alloc (rtx, heap, 10);
>>>> +
>>>> +  wl = VEC_alloc (int, heap, 50);
>>>> +  in_wl = XNEWVEC (bool, n_regs);
>>>> +
>>>> +  uses = XNEWVEC (typeof (*uses), n_regs);
>>>> +  props = XNEWVEC (typeof (*props), n_regs);
>>>> +
>>>> +  for (i = 0; i < n_regs; ++i)
>>>> +    {
>>>> +      uses[i] = NULL;
>>>> +      props[i] = NULL;
>>>> +      in_wl[i] = false;
>>>> +    }
>>>> +}
>>>> +
>>>> +/* Find redundant extensions and remove or replace them if possible.  */
>>>> +
>>>> +static void
>>>> +remove_redundant_extensions (void)
>>>> +{
>>>> +  rtx insn, dest, inner;
>>>> +  int preserved_size;
>>>> +  int ix;
>>>> +
>>>> +  if (!find_extensions ())
>>>> +    return;
>>>> +
>>>> +  calculate_biggest_use ();
>>>> +
>>>> +  find_redundant_extensions ();
>>>> +
>>>> +  if (!VEC_empty (rtx, extensions))
>>>> +    {
>>>> +      propagate ();
>>>> +
>>>> +      find_redundant_extensions ();
>>>> +    }
>>>> +
>>>> +  gcc_checking_assert (n_regs == max_reg_num ());
>>>> +
>>>> +  FOR_EACH_VEC_ELT (rtx, redundant_extensions, ix, insn)
>>>> +    {
>>>> +      extension_p (insn, &dest, &inner, &preserved_size);
>>>> +      try_remove_or_replace_extension (insn, dest, inner);
>>>> +    }
>>>> +
>>>> +  if (dump_file)
>>>> +    fprintf (dump_file, "\n");
>>>> +}
>>>> +
>>>> +/* Free the variables at the end of the pass.  */
>>>> +
>>>> +static void
>>>> +finish_pass (void)
>>>> +{
>>>> +  int i;
>>>> +
>>>> +  XDELETEVEC (propagated_size);
>>>> +
>>>> +  VEC_free (rtx, heap, extensions);
>>>> +  VEC_free (rtx, heap, redundant_extensions);
>>>> +
>>>> +  VEC_free (int, heap, wl);
>>>> +
>>>> +  for (i = 0; i < n_regs; ++i)
>>>> +    {
>>>> +      if (uses[i] != NULL)
>>>> +	VEC_free (use_type, heap, uses[i]);
>>>> +
>>>> +      if (props[i] != NULL)
>>>> +	VEC_free (prop_type, heap, props[i]);
>>>> +    }
>>>> +
>>>> +  XDELETEVEC (uses);
>>>> +  XDELETEVEC (props);
>>>> +  XDELETEVEC (biggest_use);
>>>> +
>>>> +  for (i = 0; i < n_regs; ++i)
>>>> +    if (promoted_subreg[i] != NULL)
>>>> +      VEC_free (rtx, heap, promoted_subreg[i]);
>>>> +  XDELETEVEC (promoted_subreg);
>>>> +}
>>>> +
>>>> +/* Remove redundant extensions.  */
>>>> +
>>>> +static unsigned int
>>>> +rest_of_handle_ee (void)
>>>> +{
>>>> +  n_regs = max_reg_num ();
>>>> +
>>>> +  init_pass ();
>>>> +  remove_redundant_extensions ();
>>>> +  finish_pass ();
>>>> +  return 0;
>>>> +}
>>>> +
>>>> +/* Run ee pass when flag_ee is set at optimization level > 0.  */
>>>> +
>>>> +static bool
>>>> +gate_handle_ee (void)
>>>> +{
>>>> +  return (optimize > 0 && flag_ee);
>>>> +}
>>>> +
>>>> +struct rtl_opt_pass pass_ee =
>>>> +{
>>>> + {
>>>> +  RTL_PASS,
>>>> +  "ee",                                 /* name */
>>>> +  gate_handle_ee,                       /* gate */
>>>> +  rest_of_handle_ee,                    /* execute */
>>>> +  NULL,                                 /* sub */
>>>> +  NULL,                                 /* next */
>>>> +  0,                                    /* static_pass_number */
>>>> +  TV_EE,                                /* tv_id */
>>>> +  0,                                    /* properties_required */
>>>> +  0,                                    /* properties_provided */
>>>> +  0,                                    /* properties_destroyed */
>>>> +  0,                                    /* todo_flags_start */
>>>> +  TODO_ggc_collect |
>>>> +  TODO_verify_rtl_sharing,              /* todo_flags_finish */
>>>> + }
>>>> +};
>>>> Index: gcc/common.opt
>>>> ===================================================================
>>>> --- gcc/common.opt (revision 189409)
>>>> +++ gcc/common.opt (working copy)
>>>> @@ -1067,6 +1067,10 @@ feliminate-dwarf2-dups
>>>> Common Report Var(flag_eliminate_dwarf2_dups)
>>>> Perform DWARF2 duplicate elimination
>>>>
>>>> +fextension-elimination
>>>> +Common Report Var(flag_ee) Init(0) Optimization
>>>> +Perform extension elimination
>>>> +
>>>> fipa-sra
>>>> Common Report Var(flag_ipa_sra) Init(0) Optimization
>>>> Perform interprocedural reduction of aggregates
>>>> Index: gcc/Makefile.in
>>>> ===================================================================
>>>> --- gcc/Makefile.in (revision 189409)
>>>> +++ gcc/Makefile.in (working copy)
>>>> @@ -1218,6 +1218,7 @@ OBJS = \
>>>> 	dwarf2asm.o \
>>>> 	dwarf2cfi.o \
>>>> 	dwarf2out.o \
>>>> +	ee.o \
>>>> 	ebitmap.o \
>>>> 	emit-rtl.o \
>>>> 	et-forest.o \
>>>> @@ -2971,6 +2972,12 @@ cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H
>>>>      $(TM_P_H) $(PARAMS_H) cselib.h $(EXCEPT_H) $(TREE_H) $(TIMEVAR_H) \
>>>>      intl.h $(OBSTACK_H) $(TREE_PASS_H) $(DF_H) $(DBGCNT_H) $(TARGET_H) \
>>>>      $(DF_H) $(CFGLOOP_H)
>>>> +ee.o : ee.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>> +   hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h \
>>>> +   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
>>>> +   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(TOPLEV_H) \
>>>> +   $(DIAGNOSTIC_CORE_H) $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h \
>>>> +   $(PARAMS_H) $(CGRAPH_H)
>>>> gcse.o : gcse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>>      $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>>>>      $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) toplev.h $(DIAGNOSTIC_CORE_H) \
>>>> Index: gcc/passes.c
>>>> ===================================================================
>>>> --- gcc/passes.c (revision 189409)
>>>> +++ gcc/passes.c (working copy)
>>>> @@ -1552,6 +1552,7 @@ init_optimization_passes (void)
>>>>         NEXT_PASS (pass_initialize_regs);
>>>>         NEXT_PASS (pass_ud_rtl_dce);
>>>>         NEXT_PASS (pass_combine);
>>>> +      NEXT_PASS (pass_ee);
>>>>         NEXT_PASS (pass_if_after_combine);
>>>>         NEXT_PASS (pass_partition_blocks);
>>>>         NEXT_PASS (pass_regmove);
>>>
>>
>
Tom de Vries - July 13, 2012, 7:53 a.m.
On 12/07/12 14:04, Kenneth Zadeck wrote:
> you are on the right track with the example but combine will not get 
> this unless everything is in the same bb.
> the whole point of having a separate pass for doing extension 
> elimination is that it needs to be done over the entire function.
> 

There is a pass_ree, which does inter-bb combine targeted at extensions.
However, that pass is currently limited to combining extensions with the
definitions of the register it extends. The way your example sounds, you want
the reverse, where extensions are combined with all their uses.
I would say pass_ree is the natural place to add this and handle the example you
describe.

Thanks,
- Tom

> my example is also a little more complex because, since we are talking 
> about induction vars, you have an initial assignment outside of a loop, 
> and increment inside the loop and the test you describe at the bottom of 
> the loop.
> 
> I would point out that with respect to speed optimizations, the case i 
> am describing is in fact very important because getting code out of 
> loops is were the important gains are.   I believe that the ppc has a 
> some significant performance issues because of this kind of thing.
> 
> kenny
> 
> 
> On 07/12/2012 05:20 AM, Tom de Vries wrote:
>> On 12/07/12 11:05, Tom de Vries wrote:
>>> On 12/07/12 03:39, Kenneth Zadeck wrote:
>>>> Tom,
>>>>
>>>> I have a problem with the approach that you have taken here.   I believe
>>>> that this could be a very useful addition to gcc so I am in general very
>>>> supportive, but i think you are missing an important case.
>>>>
>>>> My problem is that it the pass does not actually look at the target and
>>>> make any decisions based on that target.
>>>>
>>>> for instance, we have a llp64 target.   As with many targets, the target
>>>> has a rich set of compare and branch instructions.  In particular, it
>>>> can do both 32 and 64 bit comparisons.    We see that many of the
>>>> upstream optimizations that take int (SI mode) index variables generate
>>>> extension operations before doing 64 bit compare and branch
>>>> instructions, even though there are 32 bit comparison and branches on
>>>> the machine.     There are a lot of machines that can do more than one
>>>> size of comparison.
>>>>
>>> 	 This optimization pass, as it is currently written will not remove those
>>>> extensions because it believes that the length of the destination is the
>>>> "final answer" unless it is wrapped in an explicit truncation.
>>>> Instead it needs to ask the port if there is a shorted compare and
>>>> branch instruction that does not cost more. in that case, those
>>>> instructions should be rewritten to use the shorted compare and branch.
>>>>
>>>> There are many operations other than compare and branch where the pass
>>>> should be asking "can i shorten the target for free and therefore get
>>>> rid of the extension?"
>>> Kenneth,
>>>
>>> I'm not sure I understand the optimization you're talking about, in particular
>>> I'm confused about whether the branch range of the 32-bit and 64-bit comparison
>>> is the same.
>>>
>>> Assuming it's the same, my understanding is that you're talking about an example
>>> like this:
>>> ...
>>>    (insn (set (reg:DI 5)
>>>               (zero_extend:DI (reg:SI 4))))
>>>
>>>    (jump_insn (set (pc)
>>>                    (if_then_else (eq (reg:DI 5)
>>>                                      (const_int 0))
>>>                                  (label_ref:DI 62)
>>>                                  (pc))))
>>>
>>>    ->
>>>
>>>    (jump_insn (set (pc)
>>>                    (if_then_else (eq (reg:SI 4)
>>>                                      (const_int 0))
>>>                                  (label_ref:DI 62)
>>>                                  (pc))))
>>>
>>> ...
>>> I would expect combine to optimize this.
>>>
>>> In case I got the example all backwards or it is a too simple one, please
>>> provide an rtl example that illustrates the optimization.
>>>
>>> Thanks,
>>> - Tom
>>>
>>>
>>>>   right shifts, rotates, and stores are not in
>>>> this class, but left shifts are as are all comparisons, compare and
>>>> branches, conditional moves.   There may even be machines that have this
>>>> for divide, but i do not know of any off the top of my head.
>>>>
>>>> What i am suggesting moves this pass into the target specific set of
>>>> optimizations rather than target independent set, but at where this pass
>>>> is to be put this is completely appropriate.    Any dest instruction
>>>> where all of the operands have been extended should be checked to see if
>>>> it was really necessary to use the longer form before doing the
>>>> propagation pass.
>>>>
>>>> kenny
>>>>
>>>>
>>>> On 07/11/2012 06:30 AM, Tom de Vries wrote:
>>>>> On 13/11/10 10:50, Eric Botcazou wrote:
>>>>>>> I profiled the pass on spec2000:
>>>>>>>
>>>>>>>                      -mabi=32     -mabi=64
>>>>>>> ee-pass (usr time):     0.70         1.16
>>>>>>> total   (usr time):   919.30       879.26
>>>>>>> ee-pass        (%):     0.08         0.13
>>>>>>>
>>>>>>> The pass takes 0.13% or less of the total usr runtime.
>>>>>> For how many hits?  What are the numbers with --param ee-max-propagate=0?
>>>>>>
>>>>>>> Is it necessary to improve the runtime of this pass?
>>>>>> I've already given my opinion about the implementation.  The other passes in
>>>>>> the compiler try hard not to rescan everything when a single bit changes; as
>>>>>> currently written, yours doesn't.
>>>>>>
>>>>> Eric,
>>>>>
>>>>> I've done the following:
>>>>> - refactored the pass such that it now scans at most twice over all
>>>>>    instructions.
>>>>> - updated the patch to be applicable to current trunk
>>>>> - updated the motivating example to a more applicable one (as discussed in
>>>>>    this thread), and added that one as test-case.
>>>>> - added a part in the header comment illustrating the working of the pass
>>>>>    on the motivating example.
>>>>>
>>>>> bootstrapped and reg-tested on x86_64 and i686.
>>>>>
>>>>> build and reg-tested on mips, mips64, and arm.
>>>>>
>>>>> OK for trunk?
>>>>>
>>>>> Thanks,
>>>>> - Tom
>>>>>
>>>>> 2012-07-10  Tom de Vries  <tom@codesourcery.com>
>>>>>
>>>>> 	* ee.c: New file.
>>>>> 	* tree-pass.h (pass_ee): Declare.
>>>>> 	* opts.c ( default_options_table): Set flag_ee at -O2.
>>>>> 	* timevar.def (TV_EE): New timevar.
>>>>> 	* common.opt (fextension-elimination): New option.
>>>>> 	* Makefile.in (ee.o): New rule.
>>>>> 	* passes.c (pass_ee): Add it.
>>>>>
>>>>> 	* gcc.dg/extend-1.c: New test.
>>>>> 	* gcc.dg/extend-2.c: Same.
>>>>> 	* gcc.dg/extend-2-64.c: Same.
>>>>> 	* gcc.dg/extend-3.c: Same.
>>>>> 	* gcc.dg/extend-4.c: Same.
>>>>> 	* gcc.dg/extend-5.c: Same.
>>>>> 	* gcc.target/mips/octeon-bbit-2.c: Make test more robust.
>>>>> Index: gcc/tree-pass.h
>>>>> ===================================================================
>>>>> --- gcc/tree-pass.h (revision 189409)
>>>>> +++ gcc/tree-pass.h (working copy)
>>>>> @@ -483,6 +483,7 @@ extern struct gimple_opt_pass pass_fixup
>>>>>
>>>>> extern struct rtl_opt_pass pass_expand;
>>>>> extern struct rtl_opt_pass pass_instantiate_virtual_regs;
>>>>> +extern struct rtl_opt_pass pass_ee;
>>>>> extern struct rtl_opt_pass pass_rtl_fwprop;
>>>>> extern struct rtl_opt_pass pass_rtl_fwprop_addr;
>>>>> extern struct rtl_opt_pass pass_jump;
>>>>> Index: gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
>>>>> ===================================================================
>>>>> --- gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (revision 189409)
>>>>> +++ gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (working copy)
>>>>> @@ -5,19 +5,19 @@
>>>>> /* { dg-final { scan-assembler "\tbnel\t" } } */
>>>>> /* { dg-final { scan-assembler-not "\tbne\t" } } */
>>>>>
>>>>> -NOMIPS16 int
>>>>> -f (int n, int i)
>>>>> +NOMIPS16 long int
>>>>> +f (long int n, long int i)
>>>>> {
>>>>> -  int s = 0;
>>>>> +  long int s = 0;
>>>>>     for (; i & 1; i++)
>>>>>       s += i;
>>>>>     return s;
>>>>> }
>>>>>
>>>>> -NOMIPS16 int
>>>>> -g (int n, int i)
>>>>> +NOMIPS16 long int
>>>>> +g (long int n, long int i)
>>>>> {
>>>>> -  int s = 0;
>>>>> +  long int s = 0;
>>>>>     for (i = 0; i < n; i++)
>>>>>       s += i;
>>>>>     return s;
>>>>> Index: gcc/testsuite/gcc.dg/extend-4.c
>>>>> ===================================================================
>>>>> --- /dev/null (new file)
>>>>> +++ gcc/testsuite/gcc.dg/extend-4.c (revision 0)
>>>>> @@ -0,0 +1,16 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>> +
>>>>> +unsigned char f(unsigned int a, int c)
>>>>> +{
>>>>> +  unsigned int b = a;
>>>>> +  if (c)
>>>>> +    b = a & 0x10ff;
>>>>> +  return b;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-rtl-dump-times "_extend:" 1 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { scan-rtl-dump-times "and:" 0 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ removed" "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>> +
>>>>> Index: gcc/testsuite/gcc.dg/extend-1.c
>>>>> ===================================================================
>>>>> --- /dev/null (new file)
>>>>> +++ gcc/testsuite/gcc.dg/extend-1.c (revision 0)
>>>>> @@ -0,0 +1,13 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>> +
>>>>> +void f(unsigned char * p, short s, int c, int *z)
>>>>> +{
>>>>> +  if (c)
>>>>> +    *z = 0;
>>>>> +  *p ^= (unsigned char)s;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 1 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>> Index: gcc/testsuite/gcc.dg/extend-5.c
>>>>> ===================================================================
>>>>> --- /dev/null (new file)
>>>>> +++ gcc/testsuite/gcc.dg/extend-5.c (revision 0)
>>>>> @@ -0,0 +1,13 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>> +
>>>>> +void f (short d[2][2])
>>>>> +{
>>>>> +  int d0 = d[0][0] + d[0][1];
>>>>> +  int d1 = d[1][0] + d[1][1];
>>>>> +  d[0][0] = d0 + d1;
>>>>> +      d[0][1] = d0 - d1;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>> Index: gcc/testsuite/gcc.dg/extend-2.c
>>>>> ===================================================================
>>>>> --- /dev/null (new file)
>>>>> +++ gcc/testsuite/gcc.dg/extend-2.c (revision 0)
>>>>> @@ -0,0 +1,20 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>> +/* { dg-require-effective-target ilp32 } */
>>>>> +
>>>>> +void f(unsigned char * p, short *s, int c)
>>>>> +{
>>>>> +  short or = 0;
>>>>> +  while (c)
>>>>> +    {
>>>>> +      or = or | s[c];
>>>>> +      c --;
>>>>> +    }
>>>>> +  *p = (unsigned char)or;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend" 0 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend" 0 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>> +
>>>>> Index: gcc/testsuite/gcc.dg/extend-2-64.c
>>>>> ===================================================================
>>>>> --- /dev/null (new file)
>>>>> +++ gcc/testsuite/gcc.dg/extend-2-64.c (revision 0)
>>>>> @@ -0,0 +1,20 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>>> +/* { dg-require-effective-target mips64 } */
>>>>> +
>>>>> +void f(unsigned char * p, short *s, int c)
>>>>> +{
>>>>> +  short or = 0;
>>>>> +  while (c)
>>>>> +    {
>>>>> +      or = or | s[c];
>>>>> +      c --;
>>>>> +    }
>>>>> +  *p = (unsigned char)or;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 1 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>> +
>>>>> Index: gcc/testsuite/gcc.dg/extend-3.c
>>>>> ===================================================================
>>>>> --- /dev/null (new file)
>>>>> +++ gcc/testsuite/gcc.dg/extend-3.c (revision 0)
>>>>> @@ -0,0 +1,13 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>>> +/* { dg-require-effective-target mips64 } */
>>>>> +
>>>>> +unsigned int f(unsigned char byte)
>>>>> +{
>>>>> +  return byte << 25;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ replaced" "ee" } } */
>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>> +
>>>>> Index: gcc/opts.c
>>>>> ===================================================================
>>>>> --- gcc/opts.c (revision 189409)
>>>>> +++ gcc/opts.c (working copy)
>>>>> @@ -490,6 +490,7 @@ static const struct default_options defa
>>>>>       { OPT_LEVELS_2_PLUS, OPT_ftree_tail_merge, NULL, 1 },
>>>>>       { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_foptimize_strlen, NULL, 1 },
>>>>>       { OPT_LEVELS_2_PLUS, OPT_fhoist_adjacent_loads, NULL, 1 },
>>>>> +    { OPT_LEVELS_2_PLUS, OPT_fextension_elimination, NULL, 1 },
>>>>>
>>>>>       /* -O3 optimizations.  */
>>>>>       { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
>>>>> Index: gcc/timevar.def
>>>>> ===================================================================
>>>>> --- gcc/timevar.def (revision 189409)
>>>>> +++ gcc/timevar.def (working copy)
>>>>> @@ -201,6 +201,7 @@ DEFTIMEVAR (TV_POST_EXPAND	     , "post
>>>>> DEFTIMEVAR (TV_VARCONST              , "varconst")
>>>>> DEFTIMEVAR (TV_LOWER_SUBREG	     , "lower subreg")
>>>>> DEFTIMEVAR (TV_JUMP                  , "jump")
>>>>> +DEFTIMEVAR (TV_EE                    , "extension elimination")
>>>>> DEFTIMEVAR (TV_FWPROP                , "forward prop")
>>>>> DEFTIMEVAR (TV_CSE                   , "CSE")
>>>>> DEFTIMEVAR (TV_DCE                   , "dead code elimination")
>>>>> Index: gcc/ee.c
>>>>> ===================================================================
>>>>> --- /dev/null (new file)
>>>>> +++ gcc/ee.c (revision 0)
>>>>> @@ -0,0 +1,1190 @@
>>>>> +/* Redundant extension elimination.
>>>>> +   Copyright (C) 2010, 2011, 2012 Free Software Foundation, Inc.
>>>>> +   Contributed by Tom de Vries (tom@codesourcery.com)
>>>>> +
>>>>> +This file is part of GCC.
>>>>> +
>>>>> +GCC is free software; you can redistribute it and/or modify it under
>>>>> +the terms of the GNU General Public License as published by the Free
>>>>> +Software Foundation; either version 3, or (at your option) any later
>>>>> +version.
>>>>> +
>>>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>>>> +for more details.
>>>>> +
>>>>> +You should have received a copy of the GNU General Public License
>>>>> +along with GCC; see the file COPYING3.  If not see
>>>>> +<http://www.gnu.org/licenses/>.  */
>>>>> +
>>>>> +/*
>>>>> +
>>>>> +  MOTIVATING EXAMPLE
>>>>> +
>>>>> +  The motivating example for this pass is the example from PR 40893:
>>>>> +
>>>>> +    void f (short d[2][2])
>>>>> +    {
>>>>> +      int d0 = d[0][0] + d[0][1];
>>>>> +      int d1 = d[1][0] + d[1][1];
>>>>> +      d[0][0] = d0 + d1;
>>>>> +      d[0][1] = d0 - d1;
>>>>> +    }
>>>>> +
>>>>> +  For MIPS, compilation results in the following insns.
>>>>> +
>>>>> +    (set (reg:SI 204)
>>>>> +         (zero_extend:SI (subreg:HI (reg:SI 213) 2)))
>>>>> +
>>>>> +    (set (reg:SI 205)
>>>>> +         (zero_extend:SI (subreg:HI (reg:SI 216 [ d1 ]) 2)))
>>>>> +
>>>>> +    (set (reg:SI 217)
>>>>> +         (plus:SI (reg:SI 205)
>>>>> +                  (reg:SI 204)))
>>>>> +
>>>>> +    (set (reg:SI 218)
>>>>> +         (minus:SI (reg:SI 204)
>>>>> +                   (reg:SI 205)))
>>>>> +
>>>>> +    (set (mem:HI (reg/v/f:SI 210))
>>>>> +         (subreg:HI (reg:SI 217) 2))
>>>>> +
>>>>> +    (set (mem:HI (plus:SI (reg/v/f:SI 210)
>>>>> +                 (const_int 2 [0x2])))
>>>>> +         (subreg:HI (reg:SI 218) 2))
>>>>> +
>>>>> +
>>>>> +  The pseudos 217 and 218 only use the lower half of pseudos 217 and 218, and
>>>>> +  are the only uses.  And the plus and minus operators belong to the class of
>>>>> +  operators where a bit in the result is only influenced by same-or-less
>>>>> +  significant bitss in the operands, so the plus and minus insns only use the
>>>>> +  lower halves of pseudos 204 and 205.  Those are also the only uses of pseudos
>>>>> +  204 and 205, so the zero_extends are redundant.
>>>>> +
>>>>> +
>>>>> +  INTENDED EFFECT
>>>>> +
>>>>> +  This pass works by removing sign/zero-extensions, or replacing them with
>>>>> +  regcopies.  The idea there is that the regcopy might be eliminated by a later
>>>>> +  pass.  In case the regcopy cannot be eliminated, it might at least be cheaper
>>>>> +  than the extension.
>>>>> +
>>>>> +
>>>>> +  IMPLEMENTATION
>>>>> +
>>>>> +  The pass scans at most two times over all instructions.
>>>>> +
>>>>> +  The first scan collects all extensions.  If there are no extensions, we're
>>>>> +  done.
>>>>> +
>>>>> +  The second scan registers all uses of a reg in the biggest_use array.
>>>>> +  Additionally, it registers how the use size of a pseudo is propagated to the
>>>>> +  operands of the insns defining the pseudo.
>>>>> +
>>>>> +  The biggest_use array now contains the size in bits of the biggest use
>>>>> +  of each reg, which allows us to find redundant extensions.
>>>>> +
>>>>> +  If there are still non-redundant extensions left, we use the propagation
>>>>> +  information in an iterative fashion to improve the biggest_use array, after
>>>>> +  which we may find more redundant extensions.
>>>>> +
>>>>> +  Finally, redundant extensions are deleted or replaced.
>>>>> +
>>>>> +  In case that the src and dest reg of the replacement are not of the same size,
>>>>> +  we do not replace with a normal regcopy, but with a truncate or with the copy
>>>>> +  of a paradoxical subreg instead.
>>>>> +
>>>>> +
>>>>> +  ILLUSTRATION OF PASS
>>>>> +
>>>>> +  The dump of the pass shows us how the pass works on the motivating example.
>>>>> +
>>>>> +  We find the 2 extensions:
>>>>> +    found extension with preserved size 16 defining reg 204
>>>>> +    found extension with preserved size 16 defining reg 205
>>>>> +
>>>>> +  We calculate the biggests uses of a register:
>>>>> +    biggest_use
>>>>> +    reg 204: size 32
>>>>> +    reg 205: size 32
>>>>> +    reg 217: size 16
>>>>> +    reg 218: size 16
>>>>> +
>>>>> +  We propagate the biggest uses where possible:
>>>>> +    propagations
>>>>> +    205: 32 -> 16
>>>>> +    204: 32 -> 16
>>>>> +    214: 32 -> 16
>>>>> +    215: 32 -> 16
>>>>> +
>>>>> +  We conclude that the extensions are redundant:
>>>>> +    found redundant extension with preserved size 16 defining reg 205
>>>>> +    found redundant extension with preserved size 16 defining reg 204
>>>>> +
>>>>> +  And we replace them with regcopies:
>>>>> +    (set (reg:SI 204)
>>>>> +        (reg:SI 213))
>>>>> +
>>>>> +    (set (reg:SI 205)
>>>>> +        (reg:SI 216))
>>>>> +
>>>>> +
>>>>> +  LIMITATIONS
>>>>> +
>>>>> +  The scope of the analysis is limited to an extension and its uses.  The other
>>>>> +  type of analysis (related to the defs of the operand of an extension) is not
>>>>> +  done.
>>>>> +
>>>>> +  Furthermore, we do the analysis of biggest use per reg.  So when determining
>>>>> +  whether an extension is redundant, we take all uses of a dest reg into
>>>>> +  account, also the ones that are not uses of the extension.
>>>>> +  The consideration is that using use-def chains will give a more precise
>>>>> +  analysis, but is much more expensive in terms of runtime.  */
>>>>> +
>>>>> +#include "config.h"
>>>>> +#include "system.h"
>>>>> +#include "coretypes.h"
>>>>> +#include "tm.h"
>>>>> +#include "rtl.h"
>>>>> +#include "tree.h"
>>>>> +#include "tm_p.h"
>>>>> +#include "flags.h"
>>>>> +#include "regs.h"
>>>>> +#include "hard-reg-set.h"
>>>>> +#include "basic-block.h"
>>>>> +#include "insn-config.h"
>>>>> +#include "function.h"
>>>>> +#include "expr.h"
>>>>> +#include "insn-attr.h"
>>>>> +#include "recog.h"
>>>>> +#include "toplev.h"
>>>>> +#include "target.h"
>>>>> +#include "timevar.h"
>>>>> +#include "optabs.h"
>>>>> +#include "insn-codes.h"
>>>>> +#include "rtlhooks-def.h"
>>>>> +#include "output.h"
>>>>> +#include "params.h"
>>>>> +#include "timevar.h"
>>>>> +#include "tree-pass.h"
>>>>> +#include "cgraph.h"
>>>>> +#include "vec.h"
>>>>> +
>>>>> +#define SKIP_REG (-1)
>>>>> +#define NONE (-1)
>>>>> +
>>>>> +/* Number of registers at start of pass.  */
>>>>> +
>>>>> +static int n_regs;
>>>>> +
>>>>> +/* Array to register the biggest use of a reg, in bits.  */
>>>>> +
>>>>> +static int *biggest_use;
>>>>> +
>>>>> +/* Array to register the promoted subregs.  */
>>>>> +
>>>>> +static VEC (rtx,heap) **promoted_subreg;
>>>>> +
>>>>> +/* Array to register for a reg what the last propagated size is.  */
>>>>> +
>>>>> +static int *propagated_size;
>>>>> +
>>>>> +typedef struct use
>>>>> +{
>>>>> +  int regno;
>>>>> +  int size;
>>>>> +  int offset;
>>>>> +  rtx *use;
>>>>> +} use_type;
>>>>> +
>>>>> +DEF_VEC_O(use_type);
>>>>> +DEF_VEC_ALLOC_O(use_type,heap);
>>>>> +
>>>>> +/* Vector to register the uses.  */
>>>>> +
>>>>> +static VEC (use_type,heap) **uses;
>>>>> +
>>>>> +typedef struct prop
>>>>> +{
>>>>> +  rtx set;
>>>>> +  int uses_regno;
>>>>> +  int uses_index;
>>>>> +} prop_type;
>>>>> +
>>>>> +DEF_VEC_O(prop_type);
>>>>> +DEF_VEC_ALLOC_O(prop_type,heap);
>>>>> +
>>>>> +/* Vector to register the propagations.  */
>>>>> +
>>>>> +static VEC (prop_type,heap) **props;
>>>>> +
>>>>> +/* Work list for propragation.  */
>>>>> +
>>>>> +static VEC (int,heap) *wl;
>>>>> +
>>>>> +/* Array to register what regs are in the work list.  */
>>>>> +
>>>>> +static bool *in_wl;
>>>>> +
>>>>> +/* Vector that contains the extensions in the function.  */
>>>>> +
>>>>> +static VEC (rtx,heap) *extensions;
>>>>> +
>>>>> +/* Vector that contains the extensions in the function that are going to be
>>>>> +   removed or replaced.  */
>>>>> +
>>>>> +static VEC (rtx,heap) *redundant_extensions;
>>>>> +
>>>>> +/* Forward declaration.  */
>>>>> +
>>>>> +static void note_use (rtx *x, void *data);
>>>>> +static bool skip_reg_p (int regno);
>>>>> +static void register_prop (rtx set, use_type *use);
>>>>> +
>>>>> +/* Check whether SUBREG is a promoted subreg.  */
>>>>> +
>>>>> +static bool
>>>>> +promoted_subreg_p (rtx subreg)
>>>>> +{
>>>>> +  return (GET_CODE (subreg) == SUBREG
>>>>> +	  && SUBREG_PROMOTED_VAR_P (subreg));
>>>>> +}
>>>>> +
>>>>> +/* Check whether SUBREG is a promoted subreg for which we cannot reset the
>>>>> +   promotion.  */
>>>>> +
>>>>> +static bool
>>>>> +fixed_promoted_subreg_p (rtx subreg)
>>>>> +{
>>>>> +  int mre;
>>>>> +
>>>>> +  if (!promoted_subreg_p (subreg))
>>>>> +    return false;
>>>>> +
>>>>> +  mre = targetm.mode_rep_extended (GET_MODE (subreg),
>>>>> +				   GET_MODE (SUBREG_REG (subreg)));
>>>>> +  return mre != UNKNOWN;
>>>>> +}
>>>>> +
>>>>> +/* Attempt to return the size, reg number and offset of USE in SIZE, REGNO and
>>>>> +   OFFSET.  Return true if successful.  */
>>>>> +
>>>>> +static bool
>>>>> +reg_use_p (rtx use, int *size, unsigned int *regno, int *offset)
>>>>> +{
>>>>> +  rtx reg;
>>>>> +
>>>>> +  if (REG_P (use))
>>>>> +    {
>>>>> +      *regno = REGNO (use);
>>>>> +      *offset = 0;
>>>>> +      *size = GET_MODE_BITSIZE (GET_MODE (use));
>>>>> +      return true;
>>>>> +    }
>>>>> +  else if (GET_CODE (use) == SUBREG)
>>>>> +    {
>>>>> +      reg = SUBREG_REG (use);
>>>>> +
>>>>> +      if (!REG_P (reg))
>>>>> +	return false;
>>>>> +
>>>>> +      *regno = REGNO (reg);
>>>>> +
>>>>> +      if (paradoxical_subreg_p (use) || fixed_promoted_subreg_p (use))
>>>>> +	{
>>>>> +	  *offset = 0;
>>>>> +	  *size = GET_MODE_BITSIZE (GET_MODE (reg));
>>>>> +	}
>>>>> +      else
>>>>> +	{
>>>>> +	  *offset = subreg_lsb (use);
>>>>> +	  *size = *offset + GET_MODE_BITSIZE (GET_MODE (use));
>>>>> +	}
>>>>> +
>>>>> +      return true;
>>>>> +    }
>>>>> +
>>>>> +  return false;
>>>>> +}
>>>>> +
>>>>> +/* Create a new empty entry in the uses[REGNO] vector.  */
>>>>> +
>>>>> +static use_type *
>>>>> +new_use (unsigned int regno)
>>>>> +{
>>>>> +  if (uses[regno] == NULL)
>>>>> +    uses[regno] = VEC_alloc (use_type, heap, 4);
>>>>> +
>>>>> +  VEC_safe_push (use_type, heap, uses[regno], NULL);
>>>>> +
>>>>> +  return VEC_last (use_type, uses[regno]);
>>>>> +}
>>>>> +
>>>>> +/* Register a USE of reg REGNO with SIZE and OFFSET.  */
>>>>> +
>>>>> +static use_type *
>>>>> +register_use (int size, unsigned int regno, int offset, rtx *use)
>>>>> +{
>>>>> +  int *current;
>>>>> +  use_type *p;
>>>>> +
>>>>> +  gcc_assert (size >= 0);
>>>>> +  gcc_assert (regno < (unsigned int)n_regs);
>>>>> +
>>>>> +  if (skip_reg_p (regno))
>>>>> +    return NULL;
>>>>> +
>>>>> +  p = new_use (regno);
>>>>> +  p->regno = regno;
>>>>> +  p->size = size;
>>>>> +  p->offset = offset;
>>>>> +  p->use = use;
>>>>> +
>>>>> +  /* Update the bigest use.  */
>>>>> +  current = &biggest_use[regno];
>>>>> +  *current = MAX (*current, size);
>>>>> +
>>>>> +  return p;
>>>>> +}
>>>>> +
>>>>> +/* Handle embedded uses in USE, which is a part of PATTERN.  */
>>>>> +
>>>>> +static void
>>>>> +note_embedded_uses (rtx use, rtx pattern)
>>>>> +{
>>>>> +  const char *format_ptr;
>>>>> +  int i, j;
>>>>> +
>>>>> +  format_ptr = GET_RTX_FORMAT (GET_CODE (use));
>>>>> +  for (i = 0; i < GET_RTX_LENGTH (GET_CODE (use)); i++)
>>>>> +    if (format_ptr[i] == 'e')
>>>>> +      note_use (&XEXP (use, i), pattern);
>>>>> +    else if (format_ptr[i] == 'E')
>>>>> +      for (j = 0; j < XVECLEN (use, i); j++)
>>>>> +	note_use (&XVECEXP (use, i, j), pattern);
>>>>> +}
>>>>> +
>>>>> +/* Get the set in PATTERN that has USE as its src operand.  */
>>>>> +
>>>>> +static rtx
>>>>> +get_set (rtx use, rtx pattern)
>>>>> +{
>>>>> +  rtx sub;
>>>>> +  int i;
>>>>> +
>>>>> +  if (GET_CODE (pattern) == SET && SET_SRC (pattern) == use)
>>>>> +    return pattern;
>>>>> +
>>>>> +  if (GET_CODE (pattern) == PARALLEL)
>>>>> +    for (i = 0; i < XVECLEN (pattern, 0); ++i)
>>>>> +      {
>>>>> +	sub = XVECEXP (pattern, 0, i);
>>>>> +	if (GET_CODE (sub) == SET && SET_SRC (sub) == use)
>>>>> +	  return sub;
>>>>> +      }
>>>>> +
>>>>> +  return NULL_RTX;
>>>>> +}
>>>>> +
>>>>> +/* Handle a restricted op USE with NR_OPERANDS.  USE is a part of SET, which is
>>>>> +   a part of PATTERN.  In this context restricted means that a bit in
>>>>> +   an operand influences only the same bit or more significant bits in the
>>>>> +   result.  The bitwise ops are a subclass, but PLUS is one as well.  */
>>>>> +
>>>>> +static void
>>>>> +note_restricted_op_use (rtx set, rtx use, unsigned int nr_operands, rtx pattern)
>>>>> +{
>>>>> +  unsigned int i, smallest;
>>>>> +  int operand_size[2];
>>>>> +  int operand_offset[2];
>>>>> +  int used_size;
>>>>> +  unsigned int operand_regno[2];
>>>>> +  bool operand_reg[2];
>>>>> +  bool operand_ignore[2];
>>>>> +  use_type *p;
>>>>> +
>>>>> +  /* Init operand_reg, operand_size, operand_regno and operand_ignore.  */
>>>>> +  for (i = 0; i < nr_operands; ++i)
>>>>> +    {
>>>>> +      operand_reg[i] = reg_use_p (XEXP (use, i), &operand_size[i],
>>>>> +				  &operand_regno[i], &operand_offset[i]);
>>>>> +      operand_ignore[i] = false;
>>>>> +    }
>>>>> +
>>>>> +  /* Handle case of reg and-masked with const.  */
>>>>> +  if (GET_CODE (use) == AND && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>>> +    {
>>>>> +      used_size =
>>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (UINTVAL (XEXP (use, 1)));
>>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>>> +    }
>>>>> +
>>>>> +  /* Handle case of reg or-masked with const.  */
>>>>> +  if (GET_CODE (use) == IOR && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>>> +    {
>>>>> +      used_size =
>>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (~UINTVAL (XEXP (use, 1)));
>>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>>> +    }
>>>>> +
>>>>> +  /* Ignore the use of a in 'a = a + b'.  */
>>>>> +  /* TODO: handle GET_CODE ((SET_DEST (set))) == SUBREG.  */
>>>>> +  if (set != NULL_RTX && REG_P (SET_DEST (set)))
>>>>> +    for (i = 0; i < nr_operands; ++i)
>>>>> +      operand_ignore[i] = (operand_reg[i]
>>>>> +			   && (REGNO (SET_DEST (set)) == operand_regno[i]));
>>>>> +
>>>>> +  /* Handle the case a reg is combined with don't care bits.  */
>>>>> +  if (nr_operands == 2 && operand_reg[0] && operand_reg[1]
>>>>> +      && operand_size[0] != operand_size[1])
>>>>> +    {
>>>>> +      smallest = operand_size[0] > operand_size[1];
>>>>> +
>>>>> +      if (paradoxical_subreg_p (XEXP (use, smallest)))
>>>>> +	operand_size[1 - smallest] = operand_size[smallest];
>>>>> +    }
>>>>> +
>>>>> +  /* Register the operand use, if necessary.  */
>>>>> +  for (i = 0; i < nr_operands; ++i)
>>>>> +    if (!operand_reg[i])
>>>>> +      note_use (&XEXP (use, i), pattern);
>>>>> +    else if (!operand_ignore[i])
>>>>> +      {
>>>>> +	p = register_use (operand_size[i], operand_regno[i], operand_offset[i],
>>>>> +			  &XEXP (use, i));
>>>>> +	register_prop (set, p);
>>>>> +      }
>>>>> +}
>>>>> +
>>>>> +/* Register promoted SUBREG in promoted_subreg.  */
>>>>> +
>>>>> +static void
>>>>> +register_promoted_subreg (rtx subreg)
>>>>> +{
>>>>> +  int index = REGNO (SUBREG_REG (subreg));
>>>>> +
>>>>> +  if (promoted_subreg[index] == NULL)
>>>>> +    promoted_subreg[index] = VEC_alloc (rtx, heap, 10);
>>>>> +
>>>>> +  VEC_safe_push (rtx, heap, promoted_subreg[index], subreg);
>>>>> +}
>>>>> +
>>>>> +/* Note promoted subregs in X.  */
>>>>> +
>>>>> +static int
>>>>> +note_promoted_subreg (rtx *x, void *y ATTRIBUTE_UNUSED)
>>>>> +{
>>>>> +  rtx subreg = *x;
>>>>> +
>>>>> +  if (promoted_subreg_p (subreg) && !fixed_promoted_subreg_p (subreg)
>>>>> +      && REG_P (SUBREG_REG (subreg)))
>>>>> +    register_promoted_subreg (subreg);
>>>>> +
>>>>> +  return 0;
>>>>> +}
>>>>> +
>>>>> +/* Handle use X in pattern DATA noted by note_uses.  */
>>>>> +
>>>>> +static void
>>>>> +note_use (rtx *x, void *data)
>>>>> +{
>>>>> +  rtx use = *x;
>>>>> +  rtx pattern = (rtx)data;
>>>>> +  int use_size, use_offset;
>>>>> +  unsigned int use_regno;
>>>>> +  rtx set;
>>>>> +  use_type *p;
>>>>> +
>>>>> +  for_each_rtx (x, note_promoted_subreg, NULL);
>>>>> +
>>>>> +  set = get_set (use, pattern);
>>>>> +
>>>>> +  switch (GET_CODE (use))
>>>>> +    {
>>>>> +    case REG:
>>>>> +    case SUBREG:
>>>>> +      if (!reg_use_p (use, &use_size, &use_regno, &use_offset))
>>>>> +	{
>>>>> +	  note_embedded_uses (use, pattern);
>>>>> +	  return;
>>>>> +	}
>>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>>> +      register_prop (set, p);
>>>>> +      return;
>>>>> +    case SIGN_EXTEND:
>>>>> +    case ZERO_EXTEND:
>>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset))
>>>>> +	{
>>>>> +	  note_embedded_uses (use, pattern);
>>>>> +	  return;
>>>>> +	}
>>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>>> +      register_prop (set, p);
>>>>> +      return;
>>>>> +    case IOR:
>>>>> +    case AND:
>>>>> +    case XOR:
>>>>> +    case PLUS:
>>>>> +    case MINUS:
>>>>> +      note_restricted_op_use (set, use, 2, pattern);
>>>>> +      return;
>>>>> +    case NOT:
>>>>> +    case NEG:
>>>>> +      note_restricted_op_use (set, use, 1, pattern);
>>>>> +      return;
>>>>> +    case ASHIFT:
>>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset)
>>>>> +	  || !CONST_INT_P (XEXP (use, 1))
>>>>> +	  || INTVAL (XEXP (use, 1)) <= 0
>>>>> +	  || paradoxical_subreg_p (XEXP (use, 0)))
>>>>> +	{
>>>>> +	  note_embedded_uses (use, pattern);
>>>>> +	  return;
>>>>> +	}
>>>>> +      (void)register_use (use_size - INTVAL (XEXP (use, 1)), use_regno,
>>>>> +			  use_offset, x);
>>>>> +      return;
>>>>> +    default:
>>>>> +      note_embedded_uses (use, pattern);
>>>>> +      return;
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +/* Check whether reg REGNO is implicitly used.  */
>>>>> +
>>>>> +static bool
>>>>> +implicit_use_p (int regno ATTRIBUTE_UNUSED)
>>>>> +{
>>>>> +#ifdef EPILOGUE_USES
>>>>> +  if (EPILOGUE_USES (regno))
>>>>> +    return true;
>>>>> +#endif
>>>>> +
>>>>> +#ifdef EH_USES
>>>>> +  if (EH_USES (regno))
>>>>> +    return true;
>>>>> +#endif
>>>>> +
>>>>> +  return false;
>>>>> +}
>>>>> +
>>>>> +/* Check whether reg REGNO should be skipped in analysis.  */
>>>>> +
>>>>> +static bool
>>>>> +skip_reg_p (int regno)
>>>>> +{
>>>>> +  /* TODO: handle hard registers.  The problem with hard registers is that
>>>>> +     a DI use of r0 can mean a 64bit use of r0 and a 32 bit use of r1.
>>>>> +     We don't handle that properly.  */
>>>>> +  return implicit_use_p (regno) || HARD_REGISTER_NUM_P (regno);
>>>>> +}
>>>>> +
>>>>> +/* Note the uses of argument registers in call INSN.  */
>>>>> +
>>>>> +static void
>>>>> +note_call_uses (rtx insn)
>>>>> +{
>>>>> +  rtx link, link_expr;
>>>>> +
>>>>> +  if (!CALL_P (insn))
>>>>> +    return;
>>>>> +
>>>>> +  for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1))
>>>>> +    {
>>>>> +      link_expr = XEXP (link, 0);
>>>>> +
>>>>> +      if (GET_CODE (link_expr) == USE)
>>>>> +	note_use (&XEXP (link_expr, 0), link);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +/* Dump the biggest uses found.  */
>>>>> +
>>>>> +static void
>>>>> +dump_biggest_use (void)
>>>>> +{
>>>>> +  int i;
>>>>> +
>>>>> +  if (!dump_file)
>>>>> +    return;
>>>>> +
>>>>> +  fprintf (dump_file, "biggest_use:\n");
>>>>> +
>>>>> +  for (i = 0; i < n_regs; i++)
>>>>> +    if (biggest_use[i] > 0)
>>>>> +      fprintf (dump_file, "reg %d: size %d\n", i, biggest_use[i]);
>>>>> +
>>>>> +  fprintf (dump_file, "\n");
>>>>> +}
>>>>> +
>>>>> +/* Calculate the biggest use mode for all regs.  */
>>>>> +
>>>>> +static void
>>>>> +calculate_biggest_use (void)
>>>>> +{
>>>>> +  basic_block bb;
>>>>> +  rtx insn;
>>>>> +
>>>>> +  /* For all insns, call note_use for each use in insn.  */
>>>>> +  FOR_EACH_BB (bb)
>>>>> +    FOR_BB_INSNS (bb, insn)
>>>>> +      {
>>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>>> +	  continue;
>>>>> +
>>>>> +	note_uses (&PATTERN (insn), note_use, PATTERN (insn));
>>>>> +
>>>>> +	if (CALL_P (insn))
>>>>> +	  note_call_uses (insn);
>>>>> +      }
>>>>> +
>>>>> +  dump_biggest_use ();
>>>>> +}
>>>>> +
>>>>> +/* Register a propagation USE in SET in the props vector.  */
>>>>> +
>>>>> +static void
>>>>> +register_prop (rtx set, use_type *use)
>>>>> +{
>>>>> +  prop_type *p;
>>>>> +  int regno;
>>>>> +
>>>>> +  if (set == NULL_RTX || use == NULL)
>>>>> +    return;
>>>>> +
>>>>> +  if (!REG_P (SET_DEST (set)))
>>>>> +    return;
>>>>> +
>>>>> +  regno = REGNO (SET_DEST (set));
>>>>> +
>>>>> +  if (skip_reg_p (regno))
>>>>> +    return;
>>>>> +
>>>>> +  if (props[regno] == NULL)
>>>>> +    props[regno] = VEC_alloc (prop_type, heap, 4);
>>>>> +
>>>>> +  VEC_safe_push (prop_type, heap, props[regno], NULL);
>>>>> +  p = VEC_last (prop_type, props[regno]);
>>>>> +  p->set = set;
>>>>> +  p->uses_regno = use->regno;
>>>>> +  p->uses_index = VEC_length (use_type, uses[use->regno]) - 1;
>>>>> +}
>>>>> +
>>>>> +/* Add REGNO to the worklist.  */
>>>>> +
>>>>> +static void
>>>>> +add_to_wl (int regno)
>>>>> +{
>>>>> +  if (in_wl[regno])
>>>>> +    return;
>>>>> +
>>>>> +  if (biggest_use[regno] > 0
>>>>> +      && biggest_use[regno] == GET_MODE_BITSIZE (PSEUDO_REGNO_MODE (regno)))
>>>>> +    return;
>>>>> +
>>>>> +  if (VEC_empty (prop_type, props[regno]))
>>>>> +    return;
>>>>> +
>>>>> +  if (propagated_size[regno] != NONE
>>>>> +      && propagated_size[regno] == biggest_use[regno])
>>>>> +    return;
>>>>> +
>>>>> +  VEC_safe_push (int, heap, wl, regno);
>>>>> +  in_wl[regno] = true;
>>>>> +}
>>>>> +
>>>>> +/* Pop a reg from the worklist and return it.  */
>>>>> +
>>>>> +static int
>>>>> +pop_wl (void)
>>>>> +{
>>>>> +  int regno = VEC_pop (int, wl);
>>>>> +  in_wl[regno] = false;
>>>>> +  return regno;
>>>>> +}
>>>>> +
>>>>> +/* Propagate the use size DEST_SIZE of a reg to use P.  */
>>>>> +
>>>>> +static int
>>>>> +propagate_size (int dest_size, use_type *p)
>>>>> +{
>>>>> +  if (dest_size == 0)
>>>>> +    return 0;
>>>>> +
>>>>> +  return p->offset + MIN (p->size - p->offset, dest_size);
>>>>> +}
>>>>> +
>>>>> +/* Get the biggest use of REGNO from the uses vector.  */
>>>>> +
>>>>> +static int
>>>>> +get_biggest_use (unsigned int regno)
>>>>> +{
>>>>> +  int ix;
>>>>> +  use_type *p;
>>>>> +  int max = 0;
>>>>> +
>>>>> +  gcc_assert (uses[regno] != NULL);
>>>>> +
>>>>> +  FOR_EACH_VEC_ELT (use_type, uses[regno], ix, p)
>>>>> +    max = MAX (max, p->size);
>>>>> +
>>>>> +  return max;
>>>>> +}
>>>>> +
>>>>> +/* Propagate the use size DEST_SIZE of a reg to the uses in USE.  */
>>>>> +
>>>>> +static void
>>>>> +propagate_to_use (int dest_size, use_type *use)
>>>>> +{
>>>>> +  int new_use_size;
>>>>> +  int prev_biggest_use;
>>>>> +  int *current;
>>>>> +
>>>>> +  new_use_size = propagate_size (dest_size, use);
>>>>> +
>>>>> +  if (new_use_size >= use->size)
>>>>> +    return;
>>>>> +
>>>>> +  use->size = new_use_size;
>>>>> +
>>>>> +  current = &biggest_use[use->regno];
>>>>> +
>>>>> +  prev_biggest_use = *current;
>>>>> +  *current = get_biggest_use (use->regno);
>>>>> +
>>>>> +  if (*current >= prev_biggest_use)
>>>>> +    return;
>>>>> +
>>>>> +  add_to_wl (use->regno);
>>>>> +
>>>>> +  if (dump_file)
>>>>> +    fprintf (dump_file, "%d: %d -> %d\n", use->regno, prev_biggest_use,
>>>>> +	     *current);
>>>>> +
>>>>> +}
>>>>> +
>>>>> +/* Propagate the biggest use of a reg REGNO to all its uses, and note
>>>>> +   propagations in NR_PROPAGATIONS.  */
>>>>> +
>>>>> +static void
>>>>> +propagate_to_uses (int regno, int *nr_propagations)
>>>>> +{
>>>>> +  int ix;
>>>>> +  prop_type *p;
>>>>> +
>>>>> +  gcc_assert (!(propagated_size[regno] == NONE
>>>>> +		&& propagated_size[regno] == biggest_use[regno]));
>>>>> +
>>>>> +  FOR_EACH_VEC_ELT (prop_type, props[regno], ix, p)
>>>>> +    {
>>>>> +      use_type *use = VEC_index (use_type, uses[p->uses_regno], p->uses_index);
>>>>> +      propagate_to_use (biggest_use[regno], use);
>>>>> +      ++(*nr_propagations);
>>>>> +    }
>>>>> +
>>>>> +  propagated_size[regno] = biggest_use[regno];
>>>>> +}
>>>>> +
>>>>> +/* Improve biggest_use array iteratively.  */
>>>>> +
>>>>> +static void
>>>>> +propagate (void)
>>>>> +{
>>>>> +  int i;
>>>>> +  int nr_propagations = 0;
>>>>> +
>>>>> +  /* Initialize work list.  */
>>>>> +
>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>> +    add_to_wl (i);
>>>>> +
>>>>> +  /* Work the work list.  */
>>>>> +
>>>>> +  if (dump_file)
>>>>> +    fprintf (dump_file, "propagations: \n");
>>>>> +  while (!VEC_empty (int, wl))
>>>>> +    propagate_to_uses (pop_wl (), &nr_propagations);
>>>>> +
>>>>> +  if (dump_file)
>>>>> +    fprintf (dump_file, "\nnr_propagations: %d\n\n", nr_propagations);
>>>>> +}
>>>>> +
>>>>> +/* Check whether this is a sign/zero extension.  */
>>>>> +
>>>>> +static bool
>>>>> +extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>>> +{
>>>>> +  rtx src, op0;
>>>>> +
>>>>> +  /* Detect set of reg.  */
>>>>> +  if (GET_CODE (PATTERN (insn)) != SET)
>>>>> +    return false;
>>>>> +
>>>>> +  src = SET_SRC (PATTERN (insn));
>>>>> +  *dest = SET_DEST (PATTERN (insn));
>>>>> +
>>>>> +  if (!REG_P (*dest))
>>>>> +    return false;
>>>>> +
>>>>> +  /* Detect sign or zero extension.  */
>>>>> +  if (GET_CODE (src) == ZERO_EXTEND || GET_CODE (src) == SIGN_EXTEND
>>>>> +      || (GET_CODE (src) == AND && CONST_INT_P (XEXP (src, 1))))
>>>>> +    {
>>>>> +      op0 = XEXP (src, 0);
>>>>> +
>>>>> +      /* Determine amount of least significant bits preserved by operation.  */
>>>>> +      if (GET_CODE (src) == AND)
>>>>> +	*preserved_size = ctz_hwi (~UINTVAL (XEXP (src, 1)));
>>>>> +      else
>>>>> +	*preserved_size = GET_MODE_BITSIZE (GET_MODE (op0));
>>>>> +
>>>>> +      if (GET_CODE (op0) == SUBREG)
>>>>> +	{
>>>>> +	  if (subreg_lsb (op0) != 0)
>>>>> +	    return false;
>>>>> +
>>>>> +	  *inner = SUBREG_REG (op0);
>>>>> +
>>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>>> +	    return false;
>>>>> +
>>>>> +	  return true;
>>>>> +	}
>>>>> +      else if (REG_P (op0))
>>>>> +	{
>>>>> +	  *inner = op0;
>>>>> +
>>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>>> +	    return false;
>>>>> +
>>>>> +	  return true;
>>>>> +	}
>>>>> +      else if (GET_CODE (op0) == TRUNCATE)
>>>>> +	{
>>>>> +	  *inner = XEXP (op0, 0);
>>>>> +	  return true;
>>>>> +	}
>>>>> +    }
>>>>> +
>>>>> +  return false;
>>>>> +}
>>>>> +
>>>>> +/* Find extensions and store them in the extensions vector.  */
>>>>> +
>>>>> +static bool
>>>>> +find_extensions (void)
>>>>> +{
>>>>> +  basic_block bb;
>>>>> +  rtx insn, dest, inner;
>>>>> +  int preserved_size;
>>>>> +
>>>>> +  /* For all insns, call note_use for each use in insn.  */
>>>>> +  FOR_EACH_BB (bb)
>>>>> +    FOR_BB_INSNS (bb, insn)
>>>>> +      {
>>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>>> +	  continue;
>>>>> +
>>>>> +	if (!extension_p (insn, &dest, &inner, &preserved_size))
>>>>> +	  continue;
>>>>> +
>>>>> +	VEC_safe_push (rtx, heap, extensions, insn);
>>>>> +
>>>>> +	if (dump_file)
>>>>> +	  fprintf (dump_file,
>>>>> +		   "found extension %u with preserved size %d defining"
>>>>> +		   " reg %d\n",
>>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>>> +      }
>>>>> +
>>>>> +  if (dump_file)
>>>>> +    {
>>>>> +      if (!VEC_empty (rtx, extensions))
>>>>> +	fprintf (dump_file, "\n");
>>>>> +      else
>>>>> +	fprintf (dump_file, "no extensions found.\n");
>>>>> +    }
>>>>> +
>>>>> +  return !VEC_empty (rtx, extensions);
>>>>> +}
>>>>> +
>>>>> +/* Check whether this is a redundant sign/zero extension.  */
>>>>> +
>>>>> +static bool
>>>>> +redundant_extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>>> +{
>>>>> +  int biggest_dest_use;
>>>>> +
>>>>> +  if (!extension_p (insn, dest, inner, preserved_size))
>>>>> +    gcc_unreachable ();
>>>>> +
>>>>> +  biggest_dest_use = biggest_use[REGNO (*dest)];
>>>>> +
>>>>> +  if (biggest_dest_use == SKIP_REG)
>>>>> +    return false;
>>>>> +
>>>>> +  if (*preserved_size < biggest_dest_use)
>>>>> +    return false;
>>>>> +
>>>>> +  return true;
>>>>> +}
>>>>> +
>>>>> +/* Find the redundant extensions in the extensions vector and move them to the
>>>>> +   redundant_extensions vector.  */
>>>>> +
>>>>> +static void
>>>>> +find_redundant_extensions (void)
>>>>> +{
>>>>> +  rtx insn, dest, inner;
>>>>> +  int ix;
>>>>> +  bool found = false;
>>>>> +  int preserved_size;
>>>>> +
>>>>> +  FOR_EACH_VEC_ELT_REVERSE (rtx, extensions, ix, insn)
>>>>> +    if (redundant_extension_p (insn, &dest, &inner, &preserved_size))
>>>>> +      {
>>>>> +	VEC_safe_push (rtx, heap, redundant_extensions, insn);
>>>>> +	VEC_unordered_remove (rtx, extensions, ix);
>>>>> +
>>>>> +	if (dump_file)
>>>>> +	  fprintf (dump_file,
>>>>> +		   "found redundant extension %u with preserved size %d"
>>>>> +		   " defining reg %d\n",
>>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>>> +	found = true;
>>>>> +      }
>>>>> +
>>>>> +  if (dump_file && found)
>>>>> +    fprintf (dump_file, "\n");
>>>>> +}
>>>>> +
>>>>> +/* Reset promotion of subregs or REG.  */
>>>>> +
>>>>> +static void
>>>>> +reset_promoted_subreg (rtx reg)
>>>>> +{
>>>>> +  int ix;
>>>>> +  rtx subreg;
>>>>> +
>>>>> +  if (promoted_subreg[REGNO (reg)] == NULL)
>>>>> +    return;
>>>>> +
>>>>> +  FOR_EACH_VEC_ELT (rtx, promoted_subreg[REGNO (reg)], ix, subreg)
>>>>> +    {
>>>>> +      SUBREG_PROMOTED_UNSIGNED_SET (subreg, 0);
>>>>> +      SUBREG_PROMOTED_VAR_P (subreg) = 0;
>>>>> +    }
>>>>> +
>>>>> +  VEC_free (rtx, heap, promoted_subreg[REGNO (reg)]);
>>>>> +}
>>>>> +
>>>>> +/* Try to remove or replace the redundant extension INSN which extends INNER and
>>>>> +   writes to DEST.  */
>>>>> +
>>>>> +static void
>>>>> +try_remove_or_replace_extension (rtx insn, rtx dest, rtx inner)
>>>>> +{
>>>>> +  rtx cp_src, cp_dest, seq = NULL_RTX, one;
>>>>> +
>>>>> +  /* Check whether replacement is needed.  */
>>>>> +  if (dest != inner)
>>>>> +    {
>>>>> +      start_sequence ();
>>>>> +
>>>>> +      /* Determine the proper replacement operation.  */
>>>>> +      if (GET_MODE (dest) == GET_MODE (inner))
>>>>> +	{
>>>>> +	  cp_src = inner;
>>>>> +	  cp_dest = dest;
>>>>> +	}
>>>>> +      else if (GET_MODE_SIZE (GET_MODE (dest))
>>>>> +	       > GET_MODE_SIZE (GET_MODE (inner)))
>>>>> +	{
>>>>> +	  emit_clobber (dest);
>>>>> +	  cp_src = inner;
>>>>> +	  cp_dest = gen_lowpart_SUBREG (GET_MODE (inner), dest);
>>>>> +	}
>>>>> +      else
>>>>> +	{
>>>>> +	  cp_src = gen_lowpart_SUBREG (GET_MODE (dest), inner);
>>>>> +	  cp_dest = dest;
>>>>> +	}
>>>>> +
>>>>> +      emit_move_insn (cp_dest, cp_src);
>>>>> +
>>>>> +      seq = get_insns ();
>>>>> +      end_sequence ();
>>>>> +
>>>>> +      /* If the replacement is not supported, bail out.  */
>>>>> +      for (one = seq; one != NULL_RTX; one = NEXT_INSN (one))
>>>>> +	if (recog_memoized (one) < 0 && GET_CODE (PATTERN (one)) != CLOBBER)
>>>>> +	  return;
>>>>> +
>>>>> +      /* Insert the replacement.  */
>>>>> +      emit_insn_before (seq, insn);
>>>>> +    }
>>>>> +
>>>>> +  /* Note replacement/removal in the dump.  */
>>>>> +  if (dump_file)
>>>>> +    {
>>>>> +      fprintf (dump_file, "redundant extension %u ", INSN_UID (insn));
>>>>> +      if (dest != inner)
>>>>> +	fprintf (dump_file, "replaced by %u\n", INSN_UID (seq));
>>>>> +      else
>>>>> +	fprintf (dump_file, "removed\n");
>>>>> +    }
>>>>> +
>>>>> +  /* Remove the extension.  */
>>>>> +  delete_insn (insn);
>>>>> +
>>>>> +  reset_promoted_subreg (dest);
>>>>> +}
>>>>> +
>>>>> +/* Setup the variables at the start of the pass.  */
>>>>> +
>>>>> +static void
>>>>> +init_pass (void)
>>>>> +{
>>>>> +  int i;
>>>>> +
>>>>> +  biggest_use = XNEWVEC (int, n_regs);
>>>>> +  promoted_subreg = XCNEWVEC (VEC (rtx,heap) *, n_regs);
>>>>> +  propagated_size = XNEWVEC (int, n_regs);
>>>>> +
>>>>> +  /* Initialize biggest_use for all regs to 0.  If a reg is used implicitly, we
>>>>> +     handle that reg conservatively and set it to SKIP_REG instead.  */
>>>>> +  for (i = 0; i < n_regs; i++)
>>>>> +    {
>>>>> +      biggest_use[i] = skip_reg_p (i) ? SKIP_REG : 0;
>>>>> +      propagated_size[i] = NONE;
>>>>> +    }
>>>>> +
>>>>> +  extensions = VEC_alloc (rtx, heap, 10);
>>>>> +  redundant_extensions = VEC_alloc (rtx, heap, 10);
>>>>> +
>>>>> +  wl = VEC_alloc (int, heap, 50);
>>>>> +  in_wl = XNEWVEC (bool, n_regs);
>>>>> +
>>>>> +  uses = XNEWVEC (typeof (*uses), n_regs);
>>>>> +  props = XNEWVEC (typeof (*props), n_regs);
>>>>> +
>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>> +    {
>>>>> +      uses[i] = NULL;
>>>>> +      props[i] = NULL;
>>>>> +      in_wl[i] = false;
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +/* Find redundant extensions and remove or replace them if possible.  */
>>>>> +
>>>>> +static void
>>>>> +remove_redundant_extensions (void)
>>>>> +{
>>>>> +  rtx insn, dest, inner;
>>>>> +  int preserved_size;
>>>>> +  int ix;
>>>>> +
>>>>> +  if (!find_extensions ())
>>>>> +    return;
>>>>> +
>>>>> +  calculate_biggest_use ();
>>>>> +
>>>>> +  find_redundant_extensions ();
>>>>> +
>>>>> +  if (!VEC_empty (rtx, extensions))
>>>>> +    {
>>>>> +      propagate ();
>>>>> +
>>>>> +      find_redundant_extensions ();
>>>>> +    }
>>>>> +
>>>>> +  gcc_checking_assert (n_regs == max_reg_num ());
>>>>> +
>>>>> +  FOR_EACH_VEC_ELT (rtx, redundant_extensions, ix, insn)
>>>>> +    {
>>>>> +      extension_p (insn, &dest, &inner, &preserved_size);
>>>>> +      try_remove_or_replace_extension (insn, dest, inner);
>>>>> +    }
>>>>> +
>>>>> +  if (dump_file)
>>>>> +    fprintf (dump_file, "\n");
>>>>> +}
>>>>> +
>>>>> +/* Free the variables at the end of the pass.  */
>>>>> +
>>>>> +static void
>>>>> +finish_pass (void)
>>>>> +{
>>>>> +  int i;
>>>>> +
>>>>> +  XDELETEVEC (propagated_size);
>>>>> +
>>>>> +  VEC_free (rtx, heap, extensions);
>>>>> +  VEC_free (rtx, heap, redundant_extensions);
>>>>> +
>>>>> +  VEC_free (int, heap, wl);
>>>>> +
>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>> +    {
>>>>> +      if (uses[i] != NULL)
>>>>> +	VEC_free (use_type, heap, uses[i]);
>>>>> +
>>>>> +      if (props[i] != NULL)
>>>>> +	VEC_free (prop_type, heap, props[i]);
>>>>> +    }
>>>>> +
>>>>> +  XDELETEVEC (uses);
>>>>> +  XDELETEVEC (props);
>>>>> +  XDELETEVEC (biggest_use);
>>>>> +
>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>> +    if (promoted_subreg[i] != NULL)
>>>>> +      VEC_free (rtx, heap, promoted_subreg[i]);
>>>>> +  XDELETEVEC (promoted_subreg);
>>>>> +}
>>>>> +
>>>>> +/* Remove redundant extensions.  */
>>>>> +
>>>>> +static unsigned int
>>>>> +rest_of_handle_ee (void)
>>>>> +{
>>>>> +  n_regs = max_reg_num ();
>>>>> +
>>>>> +  init_pass ();
>>>>> +  remove_redundant_extensions ();
>>>>> +  finish_pass ();
>>>>> +  return 0;
>>>>> +}
>>>>> +
>>>>> +/* Run ee pass when flag_ee is set at optimization level > 0.  */
>>>>> +
>>>>> +static bool
>>>>> +gate_handle_ee (void)
>>>>> +{
>>>>> +  return (optimize > 0 && flag_ee);
>>>>> +}
>>>>> +
>>>>> +struct rtl_opt_pass pass_ee =
>>>>> +{
>>>>> + {
>>>>> +  RTL_PASS,
>>>>> +  "ee",                                 /* name */
>>>>> +  gate_handle_ee,                       /* gate */
>>>>> +  rest_of_handle_ee,                    /* execute */
>>>>> +  NULL,                                 /* sub */
>>>>> +  NULL,                                 /* next */
>>>>> +  0,                                    /* static_pass_number */
>>>>> +  TV_EE,                                /* tv_id */
>>>>> +  0,                                    /* properties_required */
>>>>> +  0,                                    /* properties_provided */
>>>>> +  0,                                    /* properties_destroyed */
>>>>> +  0,                                    /* todo_flags_start */
>>>>> +  TODO_ggc_collect |
>>>>> +  TODO_verify_rtl_sharing,              /* todo_flags_finish */
>>>>> + }
>>>>> +};
>>>>> Index: gcc/common.opt
>>>>> ===================================================================
>>>>> --- gcc/common.opt (revision 189409)
>>>>> +++ gcc/common.opt (working copy)
>>>>> @@ -1067,6 +1067,10 @@ feliminate-dwarf2-dups
>>>>> Common Report Var(flag_eliminate_dwarf2_dups)
>>>>> Perform DWARF2 duplicate elimination
>>>>>
>>>>> +fextension-elimination
>>>>> +Common Report Var(flag_ee) Init(0) Optimization
>>>>> +Perform extension elimination
>>>>> +
>>>>> fipa-sra
>>>>> Common Report Var(flag_ipa_sra) Init(0) Optimization
>>>>> Perform interprocedural reduction of aggregates
>>>>> Index: gcc/Makefile.in
>>>>> ===================================================================
>>>>> --- gcc/Makefile.in (revision 189409)
>>>>> +++ gcc/Makefile.in (working copy)
>>>>> @@ -1218,6 +1218,7 @@ OBJS = \
>>>>> 	dwarf2asm.o \
>>>>> 	dwarf2cfi.o \
>>>>> 	dwarf2out.o \
>>>>> +	ee.o \
>>>>> 	ebitmap.o \
>>>>> 	emit-rtl.o \
>>>>> 	et-forest.o \
>>>>> @@ -2971,6 +2972,12 @@ cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H
>>>>>      $(TM_P_H) $(PARAMS_H) cselib.h $(EXCEPT_H) $(TREE_H) $(TIMEVAR_H) \
>>>>>      intl.h $(OBSTACK_H) $(TREE_PASS_H) $(DF_H) $(DBGCNT_H) $(TARGET_H) \
>>>>>      $(DF_H) $(CFGLOOP_H)
>>>>> +ee.o : ee.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>>> +   hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h \
>>>>> +   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
>>>>> +   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(TOPLEV_H) \
>>>>> +   $(DIAGNOSTIC_CORE_H) $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h \
>>>>> +   $(PARAMS_H) $(CGRAPH_H)
>>>>> gcse.o : gcse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>>>      $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>>>>>      $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) toplev.h $(DIAGNOSTIC_CORE_H) \
>>>>> Index: gcc/passes.c
>>>>> ===================================================================
>>>>> --- gcc/passes.c (revision 189409)
>>>>> +++ gcc/passes.c (working copy)
>>>>> @@ -1552,6 +1552,7 @@ init_optimization_passes (void)
>>>>>         NEXT_PASS (pass_initialize_regs);
>>>>>         NEXT_PASS (pass_ud_rtl_dce);
>>>>>         NEXT_PASS (pass_combine);
>>>>> +      NEXT_PASS (pass_ee);
>>>>>         NEXT_PASS (pass_if_after_combine);
>>>>>         NEXT_PASS (pass_partition_blocks);
>>>>>         NEXT_PASS (pass_regmove);
>>>>
>>>
>>
> 
>
Kenneth Zadeck - July 13, 2012, 11:38 a.m.
it really is not.

the problem is that sign extension removal is just a more difficult 
problem than what you are considering.  You have attacked a small part 
of the problem and have a good start but you really should consider the 
whole problem.

kenny


On 07/13/2012 03:53 AM, Tom de Vries wrote:
> On 12/07/12 14:04, Kenneth Zadeck wrote:
>> you are on the right track with the example but combine will not get
>> this unless everything is in the same bb.
>> the whole point of having a separate pass for doing extension
>> elimination is that it needs to be done over the entire function.
>>
> There is a pass_ree, which does inter-bb combine targeted at extensions.
> However, that pass is currently limited to combining extensions with the
> definitions of the register it extends. The way your example sounds, you want
> the reverse, where extensions are combined with all their uses.
> I would say pass_ree is the natural place to add this and handle the example you
> describe.
>
> Thanks,
> - Tom
>
>> my example is also a little more complex because, since we are talking
>> about induction vars, you have an initial assignment outside of a loop,
>> and increment inside the loop and the test you describe at the bottom of
>> the loop.
>>
>> I would point out that with respect to speed optimizations, the case i
>> am describing is in fact very important because getting code out of
>> loops is were the important gains are.   I believe that the ppc has a
>> some significant performance issues because of this kind of thing.
>>
>> kenny
>>
>>
>> On 07/12/2012 05:20 AM, Tom de Vries wrote:
>>> On 12/07/12 11:05, Tom de Vries wrote:
>>>> On 12/07/12 03:39, Kenneth Zadeck wrote:
>>>>> Tom,
>>>>>
>>>>> I have a problem with the approach that you have taken here.   I believe
>>>>> that this could be a very useful addition to gcc so I am in general very
>>>>> supportive, but i think you are missing an important case.
>>>>>
>>>>> My problem is that it the pass does not actually look at the target and
>>>>> make any decisions based on that target.
>>>>>
>>>>> for instance, we have a llp64 target.   As with many targets, the target
>>>>> has a rich set of compare and branch instructions.  In particular, it
>>>>> can do both 32 and 64 bit comparisons.    We see that many of the
>>>>> upstream optimizations that take int (SI mode) index variables generate
>>>>> extension operations before doing 64 bit compare and branch
>>>>> instructions, even though there are 32 bit comparison and branches on
>>>>> the machine.     There are a lot of machines that can do more than one
>>>>> size of comparison.
>>>>>
>>>> 	 This optimization pass, as it is currently written will not remove those
>>>>> extensions because it believes that the length of the destination is the
>>>>> "final answer" unless it is wrapped in an explicit truncation.
>>>>> Instead it needs to ask the port if there is a shorted compare and
>>>>> branch instruction that does not cost more. in that case, those
>>>>> instructions should be rewritten to use the shorted compare and branch.
>>>>>
>>>>> There are many operations other than compare and branch where the pass
>>>>> should be asking "can i shorten the target for free and therefore get
>>>>> rid of the extension?"
>>>> Kenneth,
>>>>
>>>> I'm not sure I understand the optimization you're talking about, in particular
>>>> I'm confused about whether the branch range of the 32-bit and 64-bit comparison
>>>> is the same.
>>>>
>>>> Assuming it's the same, my understanding is that you're talking about an example
>>>> like this:
>>>> ...
>>>>     (insn (set (reg:DI 5)
>>>>                (zero_extend:DI (reg:SI 4))))
>>>>
>>>>     (jump_insn (set (pc)
>>>>                     (if_then_else (eq (reg:DI 5)
>>>>                                       (const_int 0))
>>>>                                   (label_ref:DI 62)
>>>>                                   (pc))))
>>>>
>>>>     ->
>>>>
>>>>     (jump_insn (set (pc)
>>>>                     (if_then_else (eq (reg:SI 4)
>>>>                                       (const_int 0))
>>>>                                   (label_ref:DI 62)
>>>>                                   (pc))))
>>>>
>>>> ...
>>>> I would expect combine to optimize this.
>>>>
>>>> In case I got the example all backwards or it is a too simple one, please
>>>> provide an rtl example that illustrates the optimization.
>>>>
>>>> Thanks,
>>>> - Tom
>>>>
>>>>
>>>>>    right shifts, rotates, and stores are not in
>>>>> this class, but left shifts are as are all comparisons, compare and
>>>>> branches, conditional moves.   There may even be machines that have this
>>>>> for divide, but i do not know of any off the top of my head.
>>>>>
>>>>> What i am suggesting moves this pass into the target specific set of
>>>>> optimizations rather than target independent set, but at where this pass
>>>>> is to be put this is completely appropriate.    Any dest instruction
>>>>> where all of the operands have been extended should be checked to see if
>>>>> it was really necessary to use the longer form before doing the
>>>>> propagation pass.
>>>>>
>>>>> kenny
>>>>>
>>>>>
>>>>> On 07/11/2012 06:30 AM, Tom de Vries wrote:
>>>>>> On 13/11/10 10:50, Eric Botcazou wrote:
>>>>>>>> I profiled the pass on spec2000:
>>>>>>>>
>>>>>>>>                       -mabi=32     -mabi=64
>>>>>>>> ee-pass (usr time):     0.70         1.16
>>>>>>>> total   (usr time):   919.30       879.26
>>>>>>>> ee-pass        (%):     0.08         0.13
>>>>>>>>
>>>>>>>> The pass takes 0.13% or less of the total usr runtime.
>>>>>>> For how many hits?  What are the numbers with --param ee-max-propagate=0?
>>>>>>>
>>>>>>>> Is it necessary to improve the runtime of this pass?
>>>>>>> I've already given my opinion about the implementation.  The other passes in
>>>>>>> the compiler try hard not to rescan everything when a single bit changes; as
>>>>>>> currently written, yours doesn't.
>>>>>>>
>>>>>> Eric,
>>>>>>
>>>>>> I've done the following:
>>>>>> - refactored the pass such that it now scans at most twice over all
>>>>>>     instructions.
>>>>>> - updated the patch to be applicable to current trunk
>>>>>> - updated the motivating example to a more applicable one (as discussed in
>>>>>>     this thread), and added that one as test-case.
>>>>>> - added a part in the header comment illustrating the working of the pass
>>>>>>     on the motivating example.
>>>>>>
>>>>>> bootstrapped and reg-tested on x86_64 and i686.
>>>>>>
>>>>>> build and reg-tested on mips, mips64, and arm.
>>>>>>
>>>>>> OK for trunk?
>>>>>>
>>>>>> Thanks,
>>>>>> - Tom
>>>>>>
>>>>>> 2012-07-10  Tom de Vries  <tom@codesourcery.com>
>>>>>>
>>>>>> 	* ee.c: New file.
>>>>>> 	* tree-pass.h (pass_ee): Declare.
>>>>>> 	* opts.c ( default_options_table): Set flag_ee at -O2.
>>>>>> 	* timevar.def (TV_EE): New timevar.
>>>>>> 	* common.opt (fextension-elimination): New option.
>>>>>> 	* Makefile.in (ee.o): New rule.
>>>>>> 	* passes.c (pass_ee): Add it.
>>>>>>
>>>>>> 	* gcc.dg/extend-1.c: New test.
>>>>>> 	* gcc.dg/extend-2.c: Same.
>>>>>> 	* gcc.dg/extend-2-64.c: Same.
>>>>>> 	* gcc.dg/extend-3.c: Same.
>>>>>> 	* gcc.dg/extend-4.c: Same.
>>>>>> 	* gcc.dg/extend-5.c: Same.
>>>>>> 	* gcc.target/mips/octeon-bbit-2.c: Make test more robust.
>>>>>> Index: gcc/tree-pass.h
>>>>>> ===================================================================
>>>>>> --- gcc/tree-pass.h (revision 189409)
>>>>>> +++ gcc/tree-pass.h (working copy)
>>>>>> @@ -483,6 +483,7 @@ extern struct gimple_opt_pass pass_fixup
>>>>>>
>>>>>> extern struct rtl_opt_pass pass_expand;
>>>>>> extern struct rtl_opt_pass pass_instantiate_virtual_regs;
>>>>>> +extern struct rtl_opt_pass pass_ee;
>>>>>> extern struct rtl_opt_pass pass_rtl_fwprop;
>>>>>> extern struct rtl_opt_pass pass_rtl_fwprop_addr;
>>>>>> extern struct rtl_opt_pass pass_jump;
>>>>>> Index: gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
>>>>>> ===================================================================
>>>>>> --- gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (revision 189409)
>>>>>> +++ gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (working copy)
>>>>>> @@ -5,19 +5,19 @@
>>>>>> /* { dg-final { scan-assembler "\tbnel\t" } } */
>>>>>> /* { dg-final { scan-assembler-not "\tbne\t" } } */
>>>>>>
>>>>>> -NOMIPS16 int
>>>>>> -f (int n, int i)
>>>>>> +NOMIPS16 long int
>>>>>> +f (long int n, long int i)
>>>>>> {
>>>>>> -  int s = 0;
>>>>>> +  long int s = 0;
>>>>>>      for (; i & 1; i++)
>>>>>>        s += i;
>>>>>>      return s;
>>>>>> }
>>>>>>
>>>>>> -NOMIPS16 int
>>>>>> -g (int n, int i)
>>>>>> +NOMIPS16 long int
>>>>>> +g (long int n, long int i)
>>>>>> {
>>>>>> -  int s = 0;
>>>>>> +  long int s = 0;
>>>>>>      for (i = 0; i < n; i++)
>>>>>>        s += i;
>>>>>>      return s;
>>>>>> Index: gcc/testsuite/gcc.dg/extend-4.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-4.c (revision 0)
>>>>>> @@ -0,0 +1,16 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>> +
>>>>>> +unsigned char f(unsigned int a, int c)
>>>>>> +{
>>>>>> +  unsigned int b = a;
>>>>>> +  if (c)
>>>>>> +    b = a & 0x10ff;
>>>>>> +  return b;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "_extend:" 1 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "and:" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ removed" "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> +
>>>>>> Index: gcc/testsuite/gcc.dg/extend-1.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-1.c (revision 0)
>>>>>> @@ -0,0 +1,13 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>> +
>>>>>> +void f(unsigned char * p, short s, int c, int *z)
>>>>>> +{
>>>>>> +  if (c)
>>>>>> +    *z = 0;
>>>>>> +  *p ^= (unsigned char)s;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 1 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> Index: gcc/testsuite/gcc.dg/extend-5.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-5.c (revision 0)
>>>>>> @@ -0,0 +1,13 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>> +
>>>>>> +void f (short d[2][2])
>>>>>> +{
>>>>>> +  int d0 = d[0][0] + d[0][1];
>>>>>> +  int d1 = d[1][0] + d[1][1];
>>>>>> +  d[0][0] = d0 + d1;
>>>>>> +      d[0][1] = d0 - d1;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> Index: gcc/testsuite/gcc.dg/extend-2.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-2.c (revision 0)
>>>>>> @@ -0,0 +1,20 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>> +/* { dg-require-effective-target ilp32 } */
>>>>>> +
>>>>>> +void f(unsigned char * p, short *s, int c)
>>>>>> +{
>>>>>> +  short or = 0;
>>>>>> +  while (c)
>>>>>> +    {
>>>>>> +      or = or | s[c];
>>>>>> +      c --;
>>>>>> +    }
>>>>>> +  *p = (unsigned char)or;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> +
>>>>>> Index: gcc/testsuite/gcc.dg/extend-2-64.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-2-64.c (revision 0)
>>>>>> @@ -0,0 +1,20 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>>>> +/* { dg-require-effective-target mips64 } */
>>>>>> +
>>>>>> +void f(unsigned char * p, short *s, int c)
>>>>>> +{
>>>>>> +  short or = 0;
>>>>>> +  while (c)
>>>>>> +    {
>>>>>> +      or = or | s[c];
>>>>>> +      c --;
>>>>>> +    }
>>>>>> +  *p = (unsigned char)or;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 1 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> +
>>>>>> Index: gcc/testsuite/gcc.dg/extend-3.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-3.c (revision 0)
>>>>>> @@ -0,0 +1,13 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>>>> +/* { dg-require-effective-target mips64 } */
>>>>>> +
>>>>>> +unsigned int f(unsigned char byte)
>>>>>> +{
>>>>>> +  return byte << 25;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ replaced" "ee" } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> +
>>>>>> Index: gcc/opts.c
>>>>>> ===================================================================
>>>>>> --- gcc/opts.c (revision 189409)
>>>>>> +++ gcc/opts.c (working copy)
>>>>>> @@ -490,6 +490,7 @@ static const struct default_options defa
>>>>>>        { OPT_LEVELS_2_PLUS, OPT_ftree_tail_merge, NULL, 1 },
>>>>>>        { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_foptimize_strlen, NULL, 1 },
>>>>>>        { OPT_LEVELS_2_PLUS, OPT_fhoist_adjacent_loads, NULL, 1 },
>>>>>> +    { OPT_LEVELS_2_PLUS, OPT_fextension_elimination, NULL, 1 },
>>>>>>
>>>>>>        /* -O3 optimizations.  */
>>>>>>        { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
>>>>>> Index: gcc/timevar.def
>>>>>> ===================================================================
>>>>>> --- gcc/timevar.def (revision 189409)
>>>>>> +++ gcc/timevar.def (working copy)
>>>>>> @@ -201,6 +201,7 @@ DEFTIMEVAR (TV_POST_EXPAND	     , "post
>>>>>> DEFTIMEVAR (TV_VARCONST              , "varconst")
>>>>>> DEFTIMEVAR (TV_LOWER_SUBREG	     , "lower subreg")
>>>>>> DEFTIMEVAR (TV_JUMP                  , "jump")
>>>>>> +DEFTIMEVAR (TV_EE                    , "extension elimination")
>>>>>> DEFTIMEVAR (TV_FWPROP                , "forward prop")
>>>>>> DEFTIMEVAR (TV_CSE                   , "CSE")
>>>>>> DEFTIMEVAR (TV_DCE                   , "dead code elimination")
>>>>>> Index: gcc/ee.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/ee.c (revision 0)
>>>>>> @@ -0,0 +1,1190 @@
>>>>>> +/* Redundant extension elimination.
>>>>>> +   Copyright (C) 2010, 2011, 2012 Free Software Foundation, Inc.
>>>>>> +   Contributed by Tom de Vries (tom@codesourcery.com)
>>>>>> +
>>>>>> +This file is part of GCC.
>>>>>> +
>>>>>> +GCC is free software; you can redistribute it and/or modify it under
>>>>>> +the terms of the GNU General Public License as published by the Free
>>>>>> +Software Foundation; either version 3, or (at your option) any later
>>>>>> +version.
>>>>>> +
>>>>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>>>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>>>>> +for more details.
>>>>>> +
>>>>>> +You should have received a copy of the GNU General Public License
>>>>>> +along with GCC; see the file COPYING3.  If not see
>>>>>> +<http://www.gnu.org/licenses/>.  */
>>>>>> +
>>>>>> +/*
>>>>>> +
>>>>>> +  MOTIVATING EXAMPLE
>>>>>> +
>>>>>> +  The motivating example for this pass is the example from PR 40893:
>>>>>> +
>>>>>> +    void f (short d[2][2])
>>>>>> +    {
>>>>>> +      int d0 = d[0][0] + d[0][1];
>>>>>> +      int d1 = d[1][0] + d[1][1];
>>>>>> +      d[0][0] = d0 + d1;
>>>>>> +      d[0][1] = d0 - d1;
>>>>>> +    }
>>>>>> +
>>>>>> +  For MIPS, compilation results in the following insns.
>>>>>> +
>>>>>> +    (set (reg:SI 204)
>>>>>> +         (zero_extend:SI (subreg:HI (reg:SI 213) 2)))
>>>>>> +
>>>>>> +    (set (reg:SI 205)
>>>>>> +         (zero_extend:SI (subreg:HI (reg:SI 216 [ d1 ]) 2)))
>>>>>> +
>>>>>> +    (set (reg:SI 217)
>>>>>> +         (plus:SI (reg:SI 205)
>>>>>> +                  (reg:SI 204)))
>>>>>> +
>>>>>> +    (set (reg:SI 218)
>>>>>> +         (minus:SI (reg:SI 204)
>>>>>> +                   (reg:SI 205)))
>>>>>> +
>>>>>> +    (set (mem:HI (reg/v/f:SI 210))
>>>>>> +         (subreg:HI (reg:SI 217) 2))
>>>>>> +
>>>>>> +    (set (mem:HI (plus:SI (reg/v/f:SI 210)
>>>>>> +                 (const_int 2 [0x2])))
>>>>>> +         (subreg:HI (reg:SI 218) 2))
>>>>>> +
>>>>>> +
>>>>>> +  The pseudos 217 and 218 only use the lower half of pseudos 217 and 218, and
>>>>>> +  are the only uses.  And the plus and minus operators belong to the class of
>>>>>> +  operators where a bit in the result is only influenced by same-or-less
>>>>>> +  significant bitss in the operands, so the plus and minus insns only use the
>>>>>> +  lower halves of pseudos 204 and 205.  Those are also the only uses of pseudos
>>>>>> +  204 and 205, so the zero_extends are redundant.
>>>>>> +
>>>>>> +
>>>>>> +  INTENDED EFFECT
>>>>>> +
>>>>>> +  This pass works by removing sign/zero-extensions, or replacing them with
>>>>>> +  regcopies.  The idea there is that the regcopy might be eliminated by a later
>>>>>> +  pass.  In case the regcopy cannot be eliminated, it might at least be cheaper
>>>>>> +  than the extension.
>>>>>> +
>>>>>> +
>>>>>> +  IMPLEMENTATION
>>>>>> +
>>>>>> +  The pass scans at most two times over all instructions.
>>>>>> +
>>>>>> +  The first scan collects all extensions.  If there are no extensions, we're
>>>>>> +  done.
>>>>>> +
>>>>>> +  The second scan registers all uses of a reg in the biggest_use array.
>>>>>> +  Additionally, it registers how the use size of a pseudo is propagated to the
>>>>>> +  operands of the insns defining the pseudo.
>>>>>> +
>>>>>> +  The biggest_use array now contains the size in bits of the biggest use
>>>>>> +  of each reg, which allows us to find redundant extensions.
>>>>>> +
>>>>>> +  If there are still non-redundant extensions left, we use the propagation
>>>>>> +  information in an iterative fashion to improve the biggest_use array, after
>>>>>> +  which we may find more redundant extensions.
>>>>>> +
>>>>>> +  Finally, redundant extensions are deleted or replaced.
>>>>>> +
>>>>>> +  In case that the src and dest reg of the replacement are not of the same size,
>>>>>> +  we do not replace with a normal regcopy, but with a truncate or with the copy
>>>>>> +  of a paradoxical subreg instead.
>>>>>> +
>>>>>> +
>>>>>> +  ILLUSTRATION OF PASS
>>>>>> +
>>>>>> +  The dump of the pass shows us how the pass works on the motivating example.
>>>>>> +
>>>>>> +  We find the 2 extensions:
>>>>>> +    found extension with preserved size 16 defining reg 204
>>>>>> +    found extension with preserved size 16 defining reg 205
>>>>>> +
>>>>>> +  We calculate the biggests uses of a register:
>>>>>> +    biggest_use
>>>>>> +    reg 204: size 32
>>>>>> +    reg 205: size 32
>>>>>> +    reg 217: size 16
>>>>>> +    reg 218: size 16
>>>>>> +
>>>>>> +  We propagate the biggest uses where possible:
>>>>>> +    propagations
>>>>>> +    205: 32 -> 16
>>>>>> +    204: 32 -> 16
>>>>>> +    214: 32 -> 16
>>>>>> +    215: 32 -> 16
>>>>>> +
>>>>>> +  We conclude that the extensions are redundant:
>>>>>> +    found redundant extension with preserved size 16 defining reg 205
>>>>>> +    found redundant extension with preserved size 16 defining reg 204
>>>>>> +
>>>>>> +  And we replace them with regcopies:
>>>>>> +    (set (reg:SI 204)
>>>>>> +        (reg:SI 213))
>>>>>> +
>>>>>> +    (set (reg:SI 205)
>>>>>> +        (reg:SI 216))
>>>>>> +
>>>>>> +
>>>>>> +  LIMITATIONS
>>>>>> +
>>>>>> +  The scope of the analysis is limited to an extension and its uses.  The other
>>>>>> +  type of analysis (related to the defs of the operand of an extension) is not
>>>>>> +  done.
>>>>>> +
>>>>>> +  Furthermore, we do the analysis of biggest use per reg.  So when determining
>>>>>> +  whether an extension is redundant, we take all uses of a dest reg into
>>>>>> +  account, also the ones that are not uses of the extension.
>>>>>> +  The consideration is that using use-def chains will give a more precise
>>>>>> +  analysis, but is much more expensive in terms of runtime.  */
>>>>>> +
>>>>>> +#include "config.h"
>>>>>> +#include "system.h"
>>>>>> +#include "coretypes.h"
>>>>>> +#include "tm.h"
>>>>>> +#include "rtl.h"
>>>>>> +#include "tree.h"
>>>>>> +#include "tm_p.h"
>>>>>> +#include "flags.h"
>>>>>> +#include "regs.h"
>>>>>> +#include "hard-reg-set.h"
>>>>>> +#include "basic-block.h"
>>>>>> +#include "insn-config.h"
>>>>>> +#include "function.h"
>>>>>> +#include "expr.h"
>>>>>> +#include "insn-attr.h"
>>>>>> +#include "recog.h"
>>>>>> +#include "toplev.h"
>>>>>> +#include "target.h"
>>>>>> +#include "timevar.h"
>>>>>> +#include "optabs.h"
>>>>>> +#include "insn-codes.h"
>>>>>> +#include "rtlhooks-def.h"
>>>>>> +#include "output.h"
>>>>>> +#include "params.h"
>>>>>> +#include "timevar.h"
>>>>>> +#include "tree-pass.h"
>>>>>> +#include "cgraph.h"
>>>>>> +#include "vec.h"
>>>>>> +
>>>>>> +#define SKIP_REG (-1)
>>>>>> +#define NONE (-1)
>>>>>> +
>>>>>> +/* Number of registers at start of pass.  */
>>>>>> +
>>>>>> +static int n_regs;
>>>>>> +
>>>>>> +/* Array to register the biggest use of a reg, in bits.  */
>>>>>> +
>>>>>> +static int *biggest_use;
>>>>>> +
>>>>>> +/* Array to register the promoted subregs.  */
>>>>>> +
>>>>>> +static VEC (rtx,heap) **promoted_subreg;
>>>>>> +
>>>>>> +/* Array to register for a reg what the last propagated size is.  */
>>>>>> +
>>>>>> +static int *propagated_size;
>>>>>> +
>>>>>> +typedef struct use
>>>>>> +{
>>>>>> +  int regno;
>>>>>> +  int size;
>>>>>> +  int offset;
>>>>>> +  rtx *use;
>>>>>> +} use_type;
>>>>>> +
>>>>>> +DEF_VEC_O(use_type);
>>>>>> +DEF_VEC_ALLOC_O(use_type,heap);
>>>>>> +
>>>>>> +/* Vector to register the uses.  */
>>>>>> +
>>>>>> +static VEC (use_type,heap) **uses;
>>>>>> +
>>>>>> +typedef struct prop
>>>>>> +{
>>>>>> +  rtx set;
>>>>>> +  int uses_regno;
>>>>>> +  int uses_index;
>>>>>> +} prop_type;
>>>>>> +
>>>>>> +DEF_VEC_O(prop_type);
>>>>>> +DEF_VEC_ALLOC_O(prop_type,heap);
>>>>>> +
>>>>>> +/* Vector to register the propagations.  */
>>>>>> +
>>>>>> +static VEC (prop_type,heap) **props;
>>>>>> +
>>>>>> +/* Work list for propragation.  */
>>>>>> +
>>>>>> +static VEC (int,heap) *wl;
>>>>>> +
>>>>>> +/* Array to register what regs are in the work list.  */
>>>>>> +
>>>>>> +static bool *in_wl;
>>>>>> +
>>>>>> +/* Vector that contains the extensions in the function.  */
>>>>>> +
>>>>>> +static VEC (rtx,heap) *extensions;
>>>>>> +
>>>>>> +/* Vector that contains the extensions in the function that are going to be
>>>>>> +   removed or replaced.  */
>>>>>> +
>>>>>> +static VEC (rtx,heap) *redundant_extensions;
>>>>>> +
>>>>>> +/* Forward declaration.  */
>>>>>> +
>>>>>> +static void note_use (rtx *x, void *data);
>>>>>> +static bool skip_reg_p (int regno);
>>>>>> +static void register_prop (rtx set, use_type *use);
>>>>>> +
>>>>>> +/* Check whether SUBREG is a promoted subreg.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +promoted_subreg_p (rtx subreg)
>>>>>> +{
>>>>>> +  return (GET_CODE (subreg) == SUBREG
>>>>>> +	  && SUBREG_PROMOTED_VAR_P (subreg));
>>>>>> +}
>>>>>> +
>>>>>> +/* Check whether SUBREG is a promoted subreg for which we cannot reset the
>>>>>> +   promotion.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +fixed_promoted_subreg_p (rtx subreg)
>>>>>> +{
>>>>>> +  int mre;
>>>>>> +
>>>>>> +  if (!promoted_subreg_p (subreg))
>>>>>> +    return false;
>>>>>> +
>>>>>> +  mre = targetm.mode_rep_extended (GET_MODE (subreg),
>>>>>> +				   GET_MODE (SUBREG_REG (subreg)));
>>>>>> +  return mre != UNKNOWN;
>>>>>> +}
>>>>>> +
>>>>>> +/* Attempt to return the size, reg number and offset of USE in SIZE, REGNO and
>>>>>> +   OFFSET.  Return true if successful.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +reg_use_p (rtx use, int *size, unsigned int *regno, int *offset)
>>>>>> +{
>>>>>> +  rtx reg;
>>>>>> +
>>>>>> +  if (REG_P (use))
>>>>>> +    {
>>>>>> +      *regno = REGNO (use);
>>>>>> +      *offset = 0;
>>>>>> +      *size = GET_MODE_BITSIZE (GET_MODE (use));
>>>>>> +      return true;
>>>>>> +    }
>>>>>> +  else if (GET_CODE (use) == SUBREG)
>>>>>> +    {
>>>>>> +      reg = SUBREG_REG (use);
>>>>>> +
>>>>>> +      if (!REG_P (reg))
>>>>>> +	return false;
>>>>>> +
>>>>>> +      *regno = REGNO (reg);
>>>>>> +
>>>>>> +      if (paradoxical_subreg_p (use) || fixed_promoted_subreg_p (use))
>>>>>> +	{
>>>>>> +	  *offset = 0;
>>>>>> +	  *size = GET_MODE_BITSIZE (GET_MODE (reg));
>>>>>> +	}
>>>>>> +      else
>>>>>> +	{
>>>>>> +	  *offset = subreg_lsb (use);
>>>>>> +	  *size = *offset + GET_MODE_BITSIZE (GET_MODE (use));
>>>>>> +	}
>>>>>> +
>>>>>> +      return true;
>>>>>> +    }
>>>>>> +
>>>>>> +  return false;
>>>>>> +}
>>>>>> +
>>>>>> +/* Create a new empty entry in the uses[REGNO] vector.  */
>>>>>> +
>>>>>> +static use_type *
>>>>>> +new_use (unsigned int regno)
>>>>>> +{
>>>>>> +  if (uses[regno] == NULL)
>>>>>> +    uses[regno] = VEC_alloc (use_type, heap, 4);
>>>>>> +
>>>>>> +  VEC_safe_push (use_type, heap, uses[regno], NULL);
>>>>>> +
>>>>>> +  return VEC_last (use_type, uses[regno]);
>>>>>> +}
>>>>>> +
>>>>>> +/* Register a USE of reg REGNO with SIZE and OFFSET.  */
>>>>>> +
>>>>>> +static use_type *
>>>>>> +register_use (int size, unsigned int regno, int offset, rtx *use)
>>>>>> +{
>>>>>> +  int *current;
>>>>>> +  use_type *p;
>>>>>> +
>>>>>> +  gcc_assert (size >= 0);
>>>>>> +  gcc_assert (regno < (unsigned int)n_regs);
>>>>>> +
>>>>>> +  if (skip_reg_p (regno))
>>>>>> +    return NULL;
>>>>>> +
>>>>>> +  p = new_use (regno);
>>>>>> +  p->regno = regno;
>>>>>> +  p->size = size;
>>>>>> +  p->offset = offset;
>>>>>> +  p->use = use;
>>>>>> +
>>>>>> +  /* Update the bigest use.  */
>>>>>> +  current = &biggest_use[regno];
>>>>>> +  *current = MAX (*current, size);
>>>>>> +
>>>>>> +  return p;
>>>>>> +}
>>>>>> +
>>>>>> +/* Handle embedded uses in USE, which is a part of PATTERN.  */
>>>>>> +
>>>>>> +static void
>>>>>> +note_embedded_uses (rtx use, rtx pattern)
>>>>>> +{
>>>>>> +  const char *format_ptr;
>>>>>> +  int i, j;
>>>>>> +
>>>>>> +  format_ptr = GET_RTX_FORMAT (GET_CODE (use));
>>>>>> +  for (i = 0; i < GET_RTX_LENGTH (GET_CODE (use)); i++)
>>>>>> +    if (format_ptr[i] == 'e')
>>>>>> +      note_use (&XEXP (use, i), pattern);
>>>>>> +    else if (format_ptr[i] == 'E')
>>>>>> +      for (j = 0; j < XVECLEN (use, i); j++)
>>>>>> +	note_use (&XVECEXP (use, i, j), pattern);
>>>>>> +}
>>>>>> +
>>>>>> +/* Get the set in PATTERN that has USE as its src operand.  */
>>>>>> +
>>>>>> +static rtx
>>>>>> +get_set (rtx use, rtx pattern)
>>>>>> +{
>>>>>> +  rtx sub;
>>>>>> +  int i;
>>>>>> +
>>>>>> +  if (GET_CODE (pattern) == SET && SET_SRC (pattern) == use)
>>>>>> +    return pattern;
>>>>>> +
>>>>>> +  if (GET_CODE (pattern) == PARALLEL)
>>>>>> +    for (i = 0; i < XVECLEN (pattern, 0); ++i)
>>>>>> +      {
>>>>>> +	sub = XVECEXP (pattern, 0, i);
>>>>>> +	if (GET_CODE (sub) == SET && SET_SRC (sub) == use)
>>>>>> +	  return sub;
>>>>>> +      }
>>>>>> +
>>>>>> +  return NULL_RTX;
>>>>>> +}
>>>>>> +
>>>>>> +/* Handle a restricted op USE with NR_OPERANDS.  USE is a part of SET, which is
>>>>>> +   a part of PATTERN.  In this context restricted means that a bit in
>>>>>> +   an operand influences only the same bit or more significant bits in the
>>>>>> +   result.  The bitwise ops are a subclass, but PLUS is one as well.  */
>>>>>> +
>>>>>> +static void
>>>>>> +note_restricted_op_use (rtx set, rtx use, unsigned int nr_operands, rtx pattern)
>>>>>> +{
>>>>>> +  unsigned int i, smallest;
>>>>>> +  int operand_size[2];
>>>>>> +  int operand_offset[2];
>>>>>> +  int used_size;
>>>>>> +  unsigned int operand_regno[2];
>>>>>> +  bool operand_reg[2];
>>>>>> +  bool operand_ignore[2];
>>>>>> +  use_type *p;
>>>>>> +
>>>>>> +  /* Init operand_reg, operand_size, operand_regno and operand_ignore.  */
>>>>>> +  for (i = 0; i < nr_operands; ++i)
>>>>>> +    {
>>>>>> +      operand_reg[i] = reg_use_p (XEXP (use, i), &operand_size[i],
>>>>>> +				  &operand_regno[i], &operand_offset[i]);
>>>>>> +      operand_ignore[i] = false;
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Handle case of reg and-masked with const.  */
>>>>>> +  if (GET_CODE (use) == AND && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>>>> +    {
>>>>>> +      used_size =
>>>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (UINTVAL (XEXP (use, 1)));
>>>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Handle case of reg or-masked with const.  */
>>>>>> +  if (GET_CODE (use) == IOR && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>>>> +    {
>>>>>> +      used_size =
>>>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (~UINTVAL (XEXP (use, 1)));
>>>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Ignore the use of a in 'a = a + b'.  */
>>>>>> +  /* TODO: handle GET_CODE ((SET_DEST (set))) == SUBREG.  */
>>>>>> +  if (set != NULL_RTX && REG_P (SET_DEST (set)))
>>>>>> +    for (i = 0; i < nr_operands; ++i)
>>>>>> +      operand_ignore[i] = (operand_reg[i]
>>>>>> +			   && (REGNO (SET_DEST (set)) == operand_regno[i]));
>>>>>> +
>>>>>> +  /* Handle the case a reg is combined with don't care bits.  */
>>>>>> +  if (nr_operands == 2 && operand_reg[0] && operand_reg[1]
>>>>>> +      && operand_size[0] != operand_size[1])
>>>>>> +    {
>>>>>> +      smallest = operand_size[0] > operand_size[1];
>>>>>> +
>>>>>> +      if (paradoxical_subreg_p (XEXP (use, smallest)))
>>>>>> +	operand_size[1 - smallest] = operand_size[smallest];
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Register the operand use, if necessary.  */
>>>>>> +  for (i = 0; i < nr_operands; ++i)
>>>>>> +    if (!operand_reg[i])
>>>>>> +      note_use (&XEXP (use, i), pattern);
>>>>>> +    else if (!operand_ignore[i])
>>>>>> +      {
>>>>>> +	p = register_use (operand_size[i], operand_regno[i], operand_offset[i],
>>>>>> +			  &XEXP (use, i));
>>>>>> +	register_prop (set, p);
>>>>>> +      }
>>>>>> +}
>>>>>> +
>>>>>> +/* Register promoted SUBREG in promoted_subreg.  */
>>>>>> +
>>>>>> +static void
>>>>>> +register_promoted_subreg (rtx subreg)
>>>>>> +{
>>>>>> +  int index = REGNO (SUBREG_REG (subreg));
>>>>>> +
>>>>>> +  if (promoted_subreg[index] == NULL)
>>>>>> +    promoted_subreg[index] = VEC_alloc (rtx, heap, 10);
>>>>>> +
>>>>>> +  VEC_safe_push (rtx, heap, promoted_subreg[index], subreg);
>>>>>> +}
>>>>>> +
>>>>>> +/* Note promoted subregs in X.  */
>>>>>> +
>>>>>> +static int
>>>>>> +note_promoted_subreg (rtx *x, void *y ATTRIBUTE_UNUSED)
>>>>>> +{
>>>>>> +  rtx subreg = *x;
>>>>>> +
>>>>>> +  if (promoted_subreg_p (subreg) && !fixed_promoted_subreg_p (subreg)
>>>>>> +      && REG_P (SUBREG_REG (subreg)))
>>>>>> +    register_promoted_subreg (subreg);
>>>>>> +
>>>>>> +  return 0;
>>>>>> +}
>>>>>> +
>>>>>> +/* Handle use X in pattern DATA noted by note_uses.  */
>>>>>> +
>>>>>> +static void
>>>>>> +note_use (rtx *x, void *data)
>>>>>> +{
>>>>>> +  rtx use = *x;
>>>>>> +  rtx pattern = (rtx)data;
>>>>>> +  int use_size, use_offset;
>>>>>> +  unsigned int use_regno;
>>>>>> +  rtx set;
>>>>>> +  use_type *p;
>>>>>> +
>>>>>> +  for_each_rtx (x, note_promoted_subreg, NULL);
>>>>>> +
>>>>>> +  set = get_set (use, pattern);
>>>>>> +
>>>>>> +  switch (GET_CODE (use))
>>>>>> +    {
>>>>>> +    case REG:
>>>>>> +    case SUBREG:
>>>>>> +      if (!reg_use_p (use, &use_size, &use_regno, &use_offset))
>>>>>> +	{
>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>> +	  return;
>>>>>> +	}
>>>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>>>> +      register_prop (set, p);
>>>>>> +      return;
>>>>>> +    case SIGN_EXTEND:
>>>>>> +    case ZERO_EXTEND:
>>>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset))
>>>>>> +	{
>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>> +	  return;
>>>>>> +	}
>>>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>>>> +      register_prop (set, p);
>>>>>> +      return;
>>>>>> +    case IOR:
>>>>>> +    case AND:
>>>>>> +    case XOR:
>>>>>> +    case PLUS:
>>>>>> +    case MINUS:
>>>>>> +      note_restricted_op_use (set, use, 2, pattern);
>>>>>> +      return;
>>>>>> +    case NOT:
>>>>>> +    case NEG:
>>>>>> +      note_restricted_op_use (set, use, 1, pattern);
>>>>>> +      return;
>>>>>> +    case ASHIFT:
>>>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset)
>>>>>> +	  || !CONST_INT_P (XEXP (use, 1))
>>>>>> +	  || INTVAL (XEXP (use, 1)) <= 0
>>>>>> +	  || paradoxical_subreg_p (XEXP (use, 0)))
>>>>>> +	{
>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>> +	  return;
>>>>>> +	}
>>>>>> +      (void)register_use (use_size - INTVAL (XEXP (use, 1)), use_regno,
>>>>>> +			  use_offset, x);
>>>>>> +      return;
>>>>>> +    default:
>>>>>> +      note_embedded_uses (use, pattern);
>>>>>> +      return;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +/* Check whether reg REGNO is implicitly used.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +implicit_use_p (int regno ATTRIBUTE_UNUSED)
>>>>>> +{
>>>>>> +#ifdef EPILOGUE_USES
>>>>>> +  if (EPILOGUE_USES (regno))
>>>>>> +    return true;
>>>>>> +#endif
>>>>>> +
>>>>>> +#ifdef EH_USES
>>>>>> +  if (EH_USES (regno))
>>>>>> +    return true;
>>>>>> +#endif
>>>>>> +
>>>>>> +  return false;
>>>>>> +}
>>>>>> +
>>>>>> +/* Check whether reg REGNO should be skipped in analysis.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +skip_reg_p (int regno)
>>>>>> +{
>>>>>> +  /* TODO: handle hard registers.  The problem with hard registers is that
>>>>>> +     a DI use of r0 can mean a 64bit use of r0 and a 32 bit use of r1.
>>>>>> +     We don't handle that properly.  */
>>>>>> +  return implicit_use_p (regno) || HARD_REGISTER_NUM_P (regno);
>>>>>> +}
>>>>>> +
>>>>>> +/* Note the uses of argument registers in call INSN.  */
>>>>>> +
>>>>>> +static void
>>>>>> +note_call_uses (rtx insn)
>>>>>> +{
>>>>>> +  rtx link, link_expr;
>>>>>> +
>>>>>> +  if (!CALL_P (insn))
>>>>>> +    return;
>>>>>> +
>>>>>> +  for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1))
>>>>>> +    {
>>>>>> +      link_expr = XEXP (link, 0);
>>>>>> +
>>>>>> +      if (GET_CODE (link_expr) == USE)
>>>>>> +	note_use (&XEXP (link_expr, 0), link);
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +/* Dump the biggest uses found.  */
>>>>>> +
>>>>>> +static void
>>>>>> +dump_biggest_use (void)
>>>>>> +{
>>>>>> +  int i;
>>>>>> +
>>>>>> +  if (!dump_file)
>>>>>> +    return;
>>>>>> +
>>>>>> +  fprintf (dump_file, "biggest_use:\n");
>>>>>> +
>>>>>> +  for (i = 0; i < n_regs; i++)
>>>>>> +    if (biggest_use[i] > 0)
>>>>>> +      fprintf (dump_file, "reg %d: size %d\n", i, biggest_use[i]);
>>>>>> +
>>>>>> +  fprintf (dump_file, "\n");
>>>>>> +}
>>>>>> +
>>>>>> +/* Calculate the biggest use mode for all regs.  */
>>>>>> +
>>>>>> +static void
>>>>>> +calculate_biggest_use (void)
>>>>>> +{
>>>>>> +  basic_block bb;
>>>>>> +  rtx insn;
>>>>>> +
>>>>>> +  /* For all insns, call note_use for each use in insn.  */
>>>>>> +  FOR_EACH_BB (bb)
>>>>>> +    FOR_BB_INSNS (bb, insn)
>>>>>> +      {
>>>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>>>> +	  continue;
>>>>>> +
>>>>>> +	note_uses (&PATTERN (insn), note_use, PATTERN (insn));
>>>>>> +
>>>>>> +	if (CALL_P (insn))
>>>>>> +	  note_call_uses (insn);
>>>>>> +      }
>>>>>> +
>>>>>> +  dump_biggest_use ();
>>>>>> +}
>>>>>> +
>>>>>> +/* Register a propagation USE in SET in the props vector.  */
>>>>>> +
>>>>>> +static void
>>>>>> +register_prop (rtx set, use_type *use)
>>>>>> +{
>>>>>> +  prop_type *p;
>>>>>> +  int regno;
>>>>>> +
>>>>>> +  if (set == NULL_RTX || use == NULL)
>>>>>> +    return;
>>>>>> +
>>>>>> +  if (!REG_P (SET_DEST (set)))
>>>>>> +    return;
>>>>>> +
>>>>>> +  regno = REGNO (SET_DEST (set));
>>>>>> +
>>>>>> +  if (skip_reg_p (regno))
>>>>>> +    return;
>>>>>> +
>>>>>> +  if (props[regno] == NULL)
>>>>>> +    props[regno] = VEC_alloc (prop_type, heap, 4);
>>>>>> +
>>>>>> +  VEC_safe_push (prop_type, heap, props[regno], NULL);
>>>>>> +  p = VEC_last (prop_type, props[regno]);
>>>>>> +  p->set = set;
>>>>>> +  p->uses_regno = use->regno;
>>>>>> +  p->uses_index = VEC_length (use_type, uses[use->regno]) - 1;
>>>>>> +}
>>>>>> +
>>>>>> +/* Add REGNO to the worklist.  */
>>>>>> +
>>>>>> +static void
>>>>>> +add_to_wl (int regno)
>>>>>> +{
>>>>>> +  if (in_wl[regno])
>>>>>> +    return;
>>>>>> +
>>>>>> +  if (biggest_use[regno] > 0
>>>>>> +      && biggest_use[regno] == GET_MODE_BITSIZE (PSEUDO_REGNO_MODE (regno)))
>>>>>> +    return;
>>>>>> +
>>>>>> +  if (VEC_empty (prop_type, props[regno]))
>>>>>> +    return;
>>>>>> +
>>>>>> +  if (propagated_size[regno] != NONE
>>>>>> +      && propagated_size[regno] == biggest_use[regno])
>>>>>> +    return;
>>>>>> +
>>>>>> +  VEC_safe_push (int, heap, wl, regno);
>>>>>> +  in_wl[regno] = true;
>>>>>> +}
>>>>>> +
>>>>>> +/* Pop a reg from the worklist and return it.  */
>>>>>> +
>>>>>> +static int
>>>>>> +pop_wl (void)
>>>>>> +{
>>>>>> +  int regno = VEC_pop (int, wl);
>>>>>> +  in_wl[regno] = false;
>>>>>> +  return regno;
>>>>>> +}
>>>>>> +
>>>>>> +/* Propagate the use size DEST_SIZE of a reg to use P.  */
>>>>>> +
>>>>>> +static int
>>>>>> +propagate_size (int dest_size, use_type *p)
>>>>>> +{
>>>>>> +  if (dest_size == 0)
>>>>>> +    return 0;
>>>>>> +
>>>>>> +  return p->offset + MIN (p->size - p->offset, dest_size);
>>>>>> +}
>>>>>> +
>>>>>> +/* Get the biggest use of REGNO from the uses vector.  */
>>>>>> +
>>>>>> +static int
>>>>>> +get_biggest_use (unsigned int regno)
>>>>>> +{
>>>>>> +  int ix;
>>>>>> +  use_type *p;
>>>>>> +  int max = 0;
>>>>>> +
>>>>>> +  gcc_assert (uses[regno] != NULL);
>>>>>> +
>>>>>> +  FOR_EACH_VEC_ELT (use_type, uses[regno], ix, p)
>>>>>> +    max = MAX (max, p->size);
>>>>>> +
>>>>>> +  return max;
>>>>>> +}
>>>>>> +
>>>>>> +/* Propagate the use size DEST_SIZE of a reg to the uses in USE.  */
>>>>>> +
>>>>>> +static void
>>>>>> +propagate_to_use (int dest_size, use_type *use)
>>>>>> +{
>>>>>> +  int new_use_size;
>>>>>> +  int prev_biggest_use;
>>>>>> +  int *current;
>>>>>> +
>>>>>> +  new_use_size = propagate_size (dest_size, use);
>>>>>> +
>>>>>> +  if (new_use_size >= use->size)
>>>>>> +    return;
>>>>>> +
>>>>>> +  use->size = new_use_size;
>>>>>> +
>>>>>> +  current = &biggest_use[use->regno];
>>>>>> +
>>>>>> +  prev_biggest_use = *current;
>>>>>> +  *current = get_biggest_use (use->regno);
>>>>>> +
>>>>>> +  if (*current >= prev_biggest_use)
>>>>>> +    return;
>>>>>> +
>>>>>> +  add_to_wl (use->regno);
>>>>>> +
>>>>>> +  if (dump_file)
>>>>>> +    fprintf (dump_file, "%d: %d -> %d\n", use->regno, prev_biggest_use,
>>>>>> +	     *current);
>>>>>> +
>>>>>> +}
>>>>>> +
>>>>>> +/* Propagate the biggest use of a reg REGNO to all its uses, and note
>>>>>> +   propagations in NR_PROPAGATIONS.  */
>>>>>> +
>>>>>> +static void
>>>>>> +propagate_to_uses (int regno, int *nr_propagations)
>>>>>> +{
>>>>>> +  int ix;
>>>>>> +  prop_type *p;
>>>>>> +
>>>>>> +  gcc_assert (!(propagated_size[regno] == NONE
>>>>>> +		&& propagated_size[regno] == biggest_use[regno]));
>>>>>> +
>>>>>> +  FOR_EACH_VEC_ELT (prop_type, props[regno], ix, p)
>>>>>> +    {
>>>>>> +      use_type *use = VEC_index (use_type, uses[p->uses_regno], p->uses_index);
>>>>>> +      propagate_to_use (biggest_use[regno], use);
>>>>>> +      ++(*nr_propagations);
>>>>>> +    }
>>>>>> +
>>>>>> +  propagated_size[regno] = biggest_use[regno];
>>>>>> +}
>>>>>> +
>>>>>> +/* Improve biggest_use array iteratively.  */
>>>>>> +
>>>>>> +static void
>>>>>> +propagate (void)
>>>>>> +{
>>>>>> +  int i;
>>>>>> +  int nr_propagations = 0;
>>>>>> +
>>>>>> +  /* Initialize work list.  */
>>>>>> +
>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>> +    add_to_wl (i);
>>>>>> +
>>>>>> +  /* Work the work list.  */
>>>>>> +
>>>>>> +  if (dump_file)
>>>>>> +    fprintf (dump_file, "propagations: \n");
>>>>>> +  while (!VEC_empty (int, wl))
>>>>>> +    propagate_to_uses (pop_wl (), &nr_propagations);
>>>>>> +
>>>>>> +  if (dump_file)
>>>>>> +    fprintf (dump_file, "\nnr_propagations: %d\n\n", nr_propagations);
>>>>>> +}
>>>>>> +
>>>>>> +/* Check whether this is a sign/zero extension.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>>>> +{
>>>>>> +  rtx src, op0;
>>>>>> +
>>>>>> +  /* Detect set of reg.  */
>>>>>> +  if (GET_CODE (PATTERN (insn)) != SET)
>>>>>> +    return false;
>>>>>> +
>>>>>> +  src = SET_SRC (PATTERN (insn));
>>>>>> +  *dest = SET_DEST (PATTERN (insn));
>>>>>> +
>>>>>> +  if (!REG_P (*dest))
>>>>>> +    return false;
>>>>>> +
>>>>>> +  /* Detect sign or zero extension.  */
>>>>>> +  if (GET_CODE (src) == ZERO_EXTEND || GET_CODE (src) == SIGN_EXTEND
>>>>>> +      || (GET_CODE (src) == AND && CONST_INT_P (XEXP (src, 1))))
>>>>>> +    {
>>>>>> +      op0 = XEXP (src, 0);
>>>>>> +
>>>>>> +      /* Determine amount of least significant bits preserved by operation.  */
>>>>>> +      if (GET_CODE (src) == AND)
>>>>>> +	*preserved_size = ctz_hwi (~UINTVAL (XEXP (src, 1)));
>>>>>> +      else
>>>>>> +	*preserved_size = GET_MODE_BITSIZE (GET_MODE (op0));
>>>>>> +
>>>>>> +      if (GET_CODE (op0) == SUBREG)
>>>>>> +	{
>>>>>> +	  if (subreg_lsb (op0) != 0)
>>>>>> +	    return false;
>>>>>> +
>>>>>> +	  *inner = SUBREG_REG (op0);
>>>>>> +
>>>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>>>> +	    return false;
>>>>>> +
>>>>>> +	  return true;
>>>>>> +	}
>>>>>> +      else if (REG_P (op0))
>>>>>> +	{
>>>>>> +	  *inner = op0;
>>>>>> +
>>>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>>>> +	    return false;
>>>>>> +
>>>>>> +	  return true;
>>>>>> +	}
>>>>>> +      else if (GET_CODE (op0) == TRUNCATE)
>>>>>> +	{
>>>>>> +	  *inner = XEXP (op0, 0);
>>>>>> +	  return true;
>>>>>> +	}
>>>>>> +    }
>>>>>> +
>>>>>> +  return false;
>>>>>> +}
>>>>>> +
>>>>>> +/* Find extensions and store them in the extensions vector.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +find_extensions (void)
>>>>>> +{
>>>>>> +  basic_block bb;
>>>>>> +  rtx insn, dest, inner;
>>>>>> +  int preserved_size;
>>>>>> +
>>>>>> +  /* For all insns, call note_use for each use in insn.  */
>>>>>> +  FOR_EACH_BB (bb)
>>>>>> +    FOR_BB_INSNS (bb, insn)
>>>>>> +      {
>>>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>>>> +	  continue;
>>>>>> +
>>>>>> +	if (!extension_p (insn, &dest, &inner, &preserved_size))
>>>>>> +	  continue;
>>>>>> +
>>>>>> +	VEC_safe_push (rtx, heap, extensions, insn);
>>>>>> +
>>>>>> +	if (dump_file)
>>>>>> +	  fprintf (dump_file,
>>>>>> +		   "found extension %u with preserved size %d defining"
>>>>>> +		   " reg %d\n",
>>>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>>>> +      }
>>>>>> +
>>>>>> +  if (dump_file)
>>>>>> +    {
>>>>>> +      if (!VEC_empty (rtx, extensions))
>>>>>> +	fprintf (dump_file, "\n");
>>>>>> +      else
>>>>>> +	fprintf (dump_file, "no extensions found.\n");
>>>>>> +    }
>>>>>> +
>>>>>> +  return !VEC_empty (rtx, extensions);
>>>>>> +}
>>>>>> +
>>>>>> +/* Check whether this is a redundant sign/zero extension.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +redundant_extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>>>> +{
>>>>>> +  int biggest_dest_use;
>>>>>> +
>>>>>> +  if (!extension_p (insn, dest, inner, preserved_size))
>>>>>> +    gcc_unreachable ();
>>>>>> +
>>>>>> +  biggest_dest_use = biggest_use[REGNO (*dest)];
>>>>>> +
>>>>>> +  if (biggest_dest_use == SKIP_REG)
>>>>>> +    return false;
>>>>>> +
>>>>>> +  if (*preserved_size < biggest_dest_use)
>>>>>> +    return false;
>>>>>> +
>>>>>> +  return true;
>>>>>> +}
>>>>>> +
>>>>>> +/* Find the redundant extensions in the extensions vector and move them to the
>>>>>> +   redundant_extensions vector.  */
>>>>>> +
>>>>>> +static void
>>>>>> +find_redundant_extensions (void)
>>>>>> +{
>>>>>> +  rtx insn, dest, inner;
>>>>>> +  int ix;
>>>>>> +  bool found = false;
>>>>>> +  int preserved_size;
>>>>>> +
>>>>>> +  FOR_EACH_VEC_ELT_REVERSE (rtx, extensions, ix, insn)
>>>>>> +    if (redundant_extension_p (insn, &dest, &inner, &preserved_size))
>>>>>> +      {
>>>>>> +	VEC_safe_push (rtx, heap, redundant_extensions, insn);
>>>>>> +	VEC_unordered_remove (rtx, extensions, ix);
>>>>>> +
>>>>>> +	if (dump_file)
>>>>>> +	  fprintf (dump_file,
>>>>>> +		   "found redundant extension %u with preserved size %d"
>>>>>> +		   " defining reg %d\n",
>>>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>>>> +	found = true;
>>>>>> +      }
>>>>>> +
>>>>>> +  if (dump_file && found)
>>>>>> +    fprintf (dump_file, "\n");
>>>>>> +}
>>>>>> +
>>>>>> +/* Reset promotion of subregs or REG.  */
>>>>>> +
>>>>>> +static void
>>>>>> +reset_promoted_subreg (rtx reg)
>>>>>> +{
>>>>>> +  int ix;
>>>>>> +  rtx subreg;
>>>>>> +
>>>>>> +  if (promoted_subreg[REGNO (reg)] == NULL)
>>>>>> +    return;
>>>>>> +
>>>>>> +  FOR_EACH_VEC_ELT (rtx, promoted_subreg[REGNO (reg)], ix, subreg)
>>>>>> +    {
>>>>>> +      SUBREG_PROMOTED_UNSIGNED_SET (subreg, 0);
>>>>>> +      SUBREG_PROMOTED_VAR_P (subreg) = 0;
>>>>>> +    }
>>>>>> +
>>>>>> +  VEC_free (rtx, heap, promoted_subreg[REGNO (reg)]);
>>>>>> +}
>>>>>> +
>>>>>> +/* Try to remove or replace the redundant extension INSN which extends INNER and
>>>>>> +   writes to DEST.  */
>>>>>> +
>>>>>> +static void
>>>>>> +try_remove_or_replace_extension (rtx insn, rtx dest, rtx inner)
>>>>>> +{
>>>>>> +  rtx cp_src, cp_dest, seq = NULL_RTX, one;
>>>>>> +
>>>>>> +  /* Check whether replacement is needed.  */
>>>>>> +  if (dest != inner)
>>>>>> +    {
>>>>>> +      start_sequence ();
>>>>>> +
>>>>>> +      /* Determine the proper replacement operation.  */
>>>>>> +      if (GET_MODE (dest) == GET_MODE (inner))
>>>>>> +	{
>>>>>> +	  cp_src = inner;
>>>>>> +	  cp_dest = dest;
>>>>>> +	}
>>>>>> +      else if (GET_MODE_SIZE (GET_MODE (dest))
>>>>>> +	       > GET_MODE_SIZE (GET_MODE (inner)))
>>>>>> +	{
>>>>>> +	  emit_clobber (dest);
>>>>>> +	  cp_src = inner;
>>>>>> +	  cp_dest = gen_lowpart_SUBREG (GET_MODE (inner), dest);
>>>>>> +	}
>>>>>> +      else
>>>>>> +	{
>>>>>> +	  cp_src = gen_lowpart_SUBREG (GET_MODE (dest), inner);
>>>>>> +	  cp_dest = dest;
>>>>>> +	}
>>>>>> +
>>>>>> +      emit_move_insn (cp_dest, cp_src);
>>>>>> +
>>>>>> +      seq = get_insns ();
>>>>>> +      end_sequence ();
>>>>>> +
>>>>>> +      /* If the replacement is not supported, bail out.  */
>>>>>> +      for (one = seq; one != NULL_RTX; one = NEXT_INSN (one))
>>>>>> +	if (recog_memoized (one) < 0 && GET_CODE (PATTERN (one)) != CLOBBER)
>>>>>> +	  return;
>>>>>> +
>>>>>> +      /* Insert the replacement.  */
>>>>>> +      emit_insn_before (seq, insn);
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Note replacement/removal in the dump.  */
>>>>>> +  if (dump_file)
>>>>>> +    {
>>>>>> +      fprintf (dump_file, "redundant extension %u ", INSN_UID (insn));
>>>>>> +      if (dest != inner)
>>>>>> +	fprintf (dump_file, "replaced by %u\n", INSN_UID (seq));
>>>>>> +      else
>>>>>> +	fprintf (dump_file, "removed\n");
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Remove the extension.  */
>>>>>> +  delete_insn (insn);
>>>>>> +
>>>>>> +  reset_promoted_subreg (dest);
>>>>>> +}
>>>>>> +
>>>>>> +/* Setup the variables at the start of the pass.  */
>>>>>> +
>>>>>> +static void
>>>>>> +init_pass (void)
>>>>>> +{
>>>>>> +  int i;
>>>>>> +
>>>>>> +  biggest_use = XNEWVEC (int, n_regs);
>>>>>> +  promoted_subreg = XCNEWVEC (VEC (rtx,heap) *, n_regs);
>>>>>> +  propagated_size = XNEWVEC (int, n_regs);
>>>>>> +
>>>>>> +  /* Initialize biggest_use for all regs to 0.  If a reg is used implicitly, we
>>>>>> +     handle that reg conservatively and set it to SKIP_REG instead.  */
>>>>>> +  for (i = 0; i < n_regs; i++)
>>>>>> +    {
>>>>>> +      biggest_use[i] = skip_reg_p (i) ? SKIP_REG : 0;
>>>>>> +      propagated_size[i] = NONE;
>>>>>> +    }
>>>>>> +
>>>>>> +  extensions = VEC_alloc (rtx, heap, 10);
>>>>>> +  redundant_extensions = VEC_alloc (rtx, heap, 10);
>>>>>> +
>>>>>> +  wl = VEC_alloc (int, heap, 50);
>>>>>> +  in_wl = XNEWVEC (bool, n_regs);
>>>>>> +
>>>>>> +  uses = XNEWVEC (typeof (*uses), n_regs);
>>>>>> +  props = XNEWVEC (typeof (*props), n_regs);
>>>>>> +
>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>> +    {
>>>>>> +      uses[i] = NULL;
>>>>>> +      props[i] = NULL;
>>>>>> +      in_wl[i] = false;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +/* Find redundant extensions and remove or replace them if possible.  */
>>>>>> +
>>>>>> +static void
>>>>>> +remove_redundant_extensions (void)
>>>>>> +{
>>>>>> +  rtx insn, dest, inner;
>>>>>> +  int preserved_size;
>>>>>> +  int ix;
>>>>>> +
>>>>>> +  if (!find_extensions ())
>>>>>> +    return;
>>>>>> +
>>>>>> +  calculate_biggest_use ();
>>>>>> +
>>>>>> +  find_redundant_extensions ();
>>>>>> +
>>>>>> +  if (!VEC_empty (rtx, extensions))
>>>>>> +    {
>>>>>> +      propagate ();
>>>>>> +
>>>>>> +      find_redundant_extensions ();
>>>>>> +    }
>>>>>> +
>>>>>> +  gcc_checking_assert (n_regs == max_reg_num ());
>>>>>> +
>>>>>> +  FOR_EACH_VEC_ELT (rtx, redundant_extensions, ix, insn)
>>>>>> +    {
>>>>>> +      extension_p (insn, &dest, &inner, &preserved_size);
>>>>>> +      try_remove_or_replace_extension (insn, dest, inner);
>>>>>> +    }
>>>>>> +
>>>>>> +  if (dump_file)
>>>>>> +    fprintf (dump_file, "\n");
>>>>>> +}
>>>>>> +
>>>>>> +/* Free the variables at the end of the pass.  */
>>>>>> +
>>>>>> +static void
>>>>>> +finish_pass (void)
>>>>>> +{
>>>>>> +  int i;
>>>>>> +
>>>>>> +  XDELETEVEC (propagated_size);
>>>>>> +
>>>>>> +  VEC_free (rtx, heap, extensions);
>>>>>> +  VEC_free (rtx, heap, redundant_extensions);
>>>>>> +
>>>>>> +  VEC_free (int, heap, wl);
>>>>>> +
>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>> +    {
>>>>>> +      if (uses[i] != NULL)
>>>>>> +	VEC_free (use_type, heap, uses[i]);
>>>>>> +
>>>>>> +      if (props[i] != NULL)
>>>>>> +	VEC_free (prop_type, heap, props[i]);
>>>>>> +    }
>>>>>> +
>>>>>> +  XDELETEVEC (uses);
>>>>>> +  XDELETEVEC (props);
>>>>>> +  XDELETEVEC (biggest_use);
>>>>>> +
>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>> +    if (promoted_subreg[i] != NULL)
>>>>>> +      VEC_free (rtx, heap, promoted_subreg[i]);
>>>>>> +  XDELETEVEC (promoted_subreg);
>>>>>> +}
>>>>>> +
>>>>>> +/* Remove redundant extensions.  */
>>>>>> +
>>>>>> +static unsigned int
>>>>>> +rest_of_handle_ee (void)
>>>>>> +{
>>>>>> +  n_regs = max_reg_num ();
>>>>>> +
>>>>>> +  init_pass ();
>>>>>> +  remove_redundant_extensions ();
>>>>>> +  finish_pass ();
>>>>>> +  return 0;
>>>>>> +}
>>>>>> +
>>>>>> +/* Run ee pass when flag_ee is set at optimization level > 0.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +gate_handle_ee (void)
>>>>>> +{
>>>>>> +  return (optimize > 0 && flag_ee);
>>>>>> +}
>>>>>> +
>>>>>> +struct rtl_opt_pass pass_ee =
>>>>>> +{
>>>>>> + {
>>>>>> +  RTL_PASS,
>>>>>> +  "ee",                                 /* name */
>>>>>> +  gate_handle_ee,                       /* gate */
>>>>>> +  rest_of_handle_ee,                    /* execute */
>>>>>> +  NULL,                                 /* sub */
>>>>>> +  NULL,                                 /* next */
>>>>>> +  0,                                    /* static_pass_number */
>>>>>> +  TV_EE,                                /* tv_id */
>>>>>> +  0,                                    /* properties_required */
>>>>>> +  0,                                    /* properties_provided */
>>>>>> +  0,                                    /* properties_destroyed */
>>>>>> +  0,                                    /* todo_flags_start */
>>>>>> +  TODO_ggc_collect |
>>>>>> +  TODO_verify_rtl_sharing,              /* todo_flags_finish */
>>>>>> + }
>>>>>> +};
>>>>>> Index: gcc/common.opt
>>>>>> ===================================================================
>>>>>> --- gcc/common.opt (revision 189409)
>>>>>> +++ gcc/common.opt (working copy)
>>>>>> @@ -1067,6 +1067,10 @@ feliminate-dwarf2-dups
>>>>>> Common Report Var(flag_eliminate_dwarf2_dups)
>>>>>> Perform DWARF2 duplicate elimination
>>>>>>
>>>>>> +fextension-elimination
>>>>>> +Common Report Var(flag_ee) Init(0) Optimization
>>>>>> +Perform extension elimination
>>>>>> +
>>>>>> fipa-sra
>>>>>> Common Report Var(flag_ipa_sra) Init(0) Optimization
>>>>>> Perform interprocedural reduction of aggregates
>>>>>> Index: gcc/Makefile.in
>>>>>> ===================================================================
>>>>>> --- gcc/Makefile.in (revision 189409)
>>>>>> +++ gcc/Makefile.in (working copy)
>>>>>> @@ -1218,6 +1218,7 @@ OBJS = \
>>>>>> 	dwarf2asm.o \
>>>>>> 	dwarf2cfi.o \
>>>>>> 	dwarf2out.o \
>>>>>> +	ee.o \
>>>>>> 	ebitmap.o \
>>>>>> 	emit-rtl.o \
>>>>>> 	et-forest.o \
>>>>>> @@ -2971,6 +2972,12 @@ cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H
>>>>>>       $(TM_P_H) $(PARAMS_H) cselib.h $(EXCEPT_H) $(TREE_H) $(TIMEVAR_H) \
>>>>>>       intl.h $(OBSTACK_H) $(TREE_PASS_H) $(DF_H) $(DBGCNT_H) $(TARGET_H) \
>>>>>>       $(DF_H) $(CFGLOOP_H)
>>>>>> +ee.o : ee.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>>>> +   hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h \
>>>>>> +   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
>>>>>> +   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(TOPLEV_H) \
>>>>>> +   $(DIAGNOSTIC_CORE_H) $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h \
>>>>>> +   $(PARAMS_H) $(CGRAPH_H)
>>>>>> gcse.o : gcse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>>>>       $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>>>>>>       $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) toplev.h $(DIAGNOSTIC_CORE_H) \
>>>>>> Index: gcc/passes.c
>>>>>> ===================================================================
>>>>>> --- gcc/passes.c (revision 189409)
>>>>>> +++ gcc/passes.c (working copy)
>>>>>> @@ -1552,6 +1552,7 @@ init_optimization_passes (void)
>>>>>>          NEXT_PASS (pass_initialize_regs);
>>>>>>          NEXT_PASS (pass_ud_rtl_dce);
>>>>>>          NEXT_PASS (pass_combine);
>>>>>> +      NEXT_PASS (pass_ee);
>>>>>>          NEXT_PASS (pass_if_after_combine);
>>>>>>          NEXT_PASS (pass_partition_blocks);
>>>>>>          NEXT_PASS (pass_regmove);
>>
>
Tom de Vries - July 13, 2012, 12:58 p.m.
On 13/07/12 13:38, Kenneth Zadeck wrote:
> it really is not.
> 

Kenneth,

I'm not quite sure I know what you mean by 'it', but I suppose you mean to say
that pass_ree is not the natural place to add the inter-bb combining of
extensions with all their uses.

Why do you think so?

> the problem is that sign extension removal is just a more difficult 
> problem than what you are considering.  You have attacked a small part 
> of the problem and have a good start but you really should consider the 
> whole problem.

Compilers are split up into passes that handle a well-defined part of the whole
problem of generating optimal and correct code. In my view, pass_ee does exactly
that.

The goal of the pass is not to remove extensions in all ways possible, but
focuses on the goal of detecting extensions that can be replaced by a regcopy,
and replace them.

If you have an example that shows a missing optimization where pass_ee could be
adapted or extended in a certain way to handle that, I would love to see that one.

Thanks,
- Tom

> 
> kenny
> 
> 
> On 07/13/2012 03:53 AM, Tom de Vries wrote:
>> On 12/07/12 14:04, Kenneth Zadeck wrote:
>>> you are on the right track with the example but combine will not get
>>> this unless everything is in the same bb.
>>> the whole point of having a separate pass for doing extension
>>> elimination is that it needs to be done over the entire function.
>>>
>> There is a pass_ree, which does inter-bb combine targeted at extensions.
>> However, that pass is currently limited to combining extensions with the
>> definitions of the register it extends. The way your example sounds, you want
>> the reverse, where extensions are combined with all their uses.
>> I would say pass_ree is the natural place to add this and handle the example you
>> describe.
>>
>> Thanks,
>> - Tom
>>
>>> my example is also a little more complex because, since we are talking
>>> about induction vars, you have an initial assignment outside of a loop,
>>> and increment inside the loop and the test you describe at the bottom of
>>> the loop.
>>>
>>> I would point out that with respect to speed optimizations, the case i
>>> am describing is in fact very important because getting code out of
>>> loops is were the important gains are.   I believe that the ppc has a
>>> some significant performance issues because of this kind of thing.
>>>
>>> kenny
>>>
>>>
>>> On 07/12/2012 05:20 AM, Tom de Vries wrote:
>>>> On 12/07/12 11:05, Tom de Vries wrote:
>>>>> On 12/07/12 03:39, Kenneth Zadeck wrote:
>>>>>> Tom,
>>>>>>
>>>>>> I have a problem with the approach that you have taken here.   I believe
>>>>>> that this could be a very useful addition to gcc so I am in general very
>>>>>> supportive, but i think you are missing an important case.
>>>>>>
>>>>>> My problem is that it the pass does not actually look at the target and
>>>>>> make any decisions based on that target.
>>>>>>
>>>>>> for instance, we have a llp64 target.   As with many targets, the target
>>>>>> has a rich set of compare and branch instructions.  In particular, it
>>>>>> can do both 32 and 64 bit comparisons.    We see that many of the
>>>>>> upstream optimizations that take int (SI mode) index variables generate
>>>>>> extension operations before doing 64 bit compare and branch
>>>>>> instructions, even though there are 32 bit comparison and branches on
>>>>>> the machine.     There are a lot of machines that can do more than one
>>>>>> size of comparison.
>>>>>>
>>>>> 	 This optimization pass, as it is currently written will not remove those
>>>>>> extensions because it believes that the length of the destination is the
>>>>>> "final answer" unless it is wrapped in an explicit truncation.
>>>>>> Instead it needs to ask the port if there is a shorted compare and
>>>>>> branch instruction that does not cost more. in that case, those
>>>>>> instructions should be rewritten to use the shorted compare and branch.
>>>>>>
>>>>>> There are many operations other than compare and branch where the pass
>>>>>> should be asking "can i shorten the target for free and therefore get
>>>>>> rid of the extension?"
>>>>> Kenneth,
>>>>>
>>>>> I'm not sure I understand the optimization you're talking about, in particular
>>>>> I'm confused about whether the branch range of the 32-bit and 64-bit comparison
>>>>> is the same.
>>>>>
>>>>> Assuming it's the same, my understanding is that you're talking about an example
>>>>> like this:
>>>>> ...
>>>>>     (insn (set (reg:DI 5)
>>>>>                (zero_extend:DI (reg:SI 4))))
>>>>>
>>>>>     (jump_insn (set (pc)
>>>>>                     (if_then_else (eq (reg:DI 5)
>>>>>                                       (const_int 0))
>>>>>                                   (label_ref:DI 62)
>>>>>                                   (pc))))
>>>>>
>>>>>     ->
>>>>>
>>>>>     (jump_insn (set (pc)
>>>>>                     (if_then_else (eq (reg:SI 4)
>>>>>                                       (const_int 0))
>>>>>                                   (label_ref:DI 62)
>>>>>                                   (pc))))
>>>>>
>>>>> ...
>>>>> I would expect combine to optimize this.
>>>>>
>>>>> In case I got the example all backwards or it is a too simple one, please
>>>>> provide an rtl example that illustrates the optimization.
>>>>>
>>>>> Thanks,
>>>>> - Tom
>>>>>
>>>>>
>>>>>>    right shifts, rotates, and stores are not in
>>>>>> this class, but left shifts are as are all comparisons, compare and
>>>>>> branches, conditional moves.   There may even be machines that have this
>>>>>> for divide, but i do not know of any off the top of my head.
>>>>>>
>>>>>> What i am suggesting moves this pass into the target specific set of
>>>>>> optimizations rather than target independent set, but at where this pass
>>>>>> is to be put this is completely appropriate.    Any dest instruction
>>>>>> where all of the operands have been extended should be checked to see if
>>>>>> it was really necessary to use the longer form before doing the
>>>>>> propagation pass.
>>>>>>
>>>>>> kenny
>>>>>>
>>>>>>
>>>>>> On 07/11/2012 06:30 AM, Tom de Vries wrote:
>>>>>>> On 13/11/10 10:50, Eric Botcazou wrote:
>>>>>>>>> I profiled the pass on spec2000:
>>>>>>>>>
>>>>>>>>>                       -mabi=32     -mabi=64
>>>>>>>>> ee-pass (usr time):     0.70         1.16
>>>>>>>>> total   (usr time):   919.30       879.26
>>>>>>>>> ee-pass        (%):     0.08         0.13
>>>>>>>>>
>>>>>>>>> The pass takes 0.13% or less of the total usr runtime.
>>>>>>>> For how many hits?  What are the numbers with --param ee-max-propagate=0?
>>>>>>>>
>>>>>>>>> Is it necessary to improve the runtime of this pass?
>>>>>>>> I've already given my opinion about the implementation.  The other passes in
>>>>>>>> the compiler try hard not to rescan everything when a single bit changes; as
>>>>>>>> currently written, yours doesn't.
>>>>>>>>
>>>>>>> Eric,
>>>>>>>
>>>>>>> I've done the following:
>>>>>>> - refactored the pass such that it now scans at most twice over all
>>>>>>>     instructions.
>>>>>>> - updated the patch to be applicable to current trunk
>>>>>>> - updated the motivating example to a more applicable one (as discussed in
>>>>>>>     this thread), and added that one as test-case.
>>>>>>> - added a part in the header comment illustrating the working of the pass
>>>>>>>     on the motivating example.
>>>>>>>
>>>>>>> bootstrapped and reg-tested on x86_64 and i686.
>>>>>>>
>>>>>>> build and reg-tested on mips, mips64, and arm.
>>>>>>>
>>>>>>> OK for trunk?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> - Tom
>>>>>>>
>>>>>>> 2012-07-10  Tom de Vries  <tom@codesourcery.com>
>>>>>>>
>>>>>>> 	* ee.c: New file.
>>>>>>> 	* tree-pass.h (pass_ee): Declare.
>>>>>>> 	* opts.c ( default_options_table): Set flag_ee at -O2.
>>>>>>> 	* timevar.def (TV_EE): New timevar.
>>>>>>> 	* common.opt (fextension-elimination): New option.
>>>>>>> 	* Makefile.in (ee.o): New rule.
>>>>>>> 	* passes.c (pass_ee): Add it.
>>>>>>>
>>>>>>> 	* gcc.dg/extend-1.c: New test.
>>>>>>> 	* gcc.dg/extend-2.c: Same.
>>>>>>> 	* gcc.dg/extend-2-64.c: Same.
>>>>>>> 	* gcc.dg/extend-3.c: Same.
>>>>>>> 	* gcc.dg/extend-4.c: Same.
>>>>>>> 	* gcc.dg/extend-5.c: Same.
>>>>>>> 	* gcc.target/mips/octeon-bbit-2.c: Make test more robust.
>>>>>>> Index: gcc/tree-pass.h
>>>>>>> ===================================================================
>>>>>>> --- gcc/tree-pass.h (revision 189409)
>>>>>>> +++ gcc/tree-pass.h (working copy)
>>>>>>> @@ -483,6 +483,7 @@ extern struct gimple_opt_pass pass_fixup
>>>>>>>
>>>>>>> extern struct rtl_opt_pass pass_expand;
>>>>>>> extern struct rtl_opt_pass pass_instantiate_virtual_regs;
>>>>>>> +extern struct rtl_opt_pass pass_ee;
>>>>>>> extern struct rtl_opt_pass pass_rtl_fwprop;
>>>>>>> extern struct rtl_opt_pass pass_rtl_fwprop_addr;
>>>>>>> extern struct rtl_opt_pass pass_jump;
>>>>>>> Index: gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (revision 189409)
>>>>>>> +++ gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (working copy)
>>>>>>> @@ -5,19 +5,19 @@
>>>>>>> /* { dg-final { scan-assembler "\tbnel\t" } } */
>>>>>>> /* { dg-final { scan-assembler-not "\tbne\t" } } */
>>>>>>>
>>>>>>> -NOMIPS16 int
>>>>>>> -f (int n, int i)
>>>>>>> +NOMIPS16 long int
>>>>>>> +f (long int n, long int i)
>>>>>>> {
>>>>>>> -  int s = 0;
>>>>>>> +  long int s = 0;
>>>>>>>      for (; i & 1; i++)
>>>>>>>        s += i;
>>>>>>>      return s;
>>>>>>> }
>>>>>>>
>>>>>>> -NOMIPS16 int
>>>>>>> -g (int n, int i)
>>>>>>> +NOMIPS16 long int
>>>>>>> +g (long int n, long int i)
>>>>>>> {
>>>>>>> -  int s = 0;
>>>>>>> +  long int s = 0;
>>>>>>>      for (i = 0; i < n; i++)
>>>>>>>        s += i;
>>>>>>>      return s;
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-4.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-4.c (revision 0)
>>>>>>> @@ -0,0 +1,16 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>>> +
>>>>>>> +unsigned char f(unsigned int a, int c)
>>>>>>> +{
>>>>>>> +  unsigned int b = a;
>>>>>>> +  if (c)
>>>>>>> +    b = a & 0x10ff;
>>>>>>> +  return b;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "_extend:" 1 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "and:" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ removed" "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> +
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-1.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-1.c (revision 0)
>>>>>>> @@ -0,0 +1,13 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>>> +
>>>>>>> +void f(unsigned char * p, short s, int c, int *z)
>>>>>>> +{
>>>>>>> +  if (c)
>>>>>>> +    *z = 0;
>>>>>>> +  *p ^= (unsigned char)s;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 1 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-5.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-5.c (revision 0)
>>>>>>> @@ -0,0 +1,13 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>>> +
>>>>>>> +void f (short d[2][2])
>>>>>>> +{
>>>>>>> +  int d0 = d[0][0] + d[0][1];
>>>>>>> +  int d1 = d[1][0] + d[1][1];
>>>>>>> +  d[0][0] = d0 + d1;
>>>>>>> +      d[0][1] = d0 - d1;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-2.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-2.c (revision 0)
>>>>>>> @@ -0,0 +1,20 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>>> +/* { dg-require-effective-target ilp32 } */
>>>>>>> +
>>>>>>> +void f(unsigned char * p, short *s, int c)
>>>>>>> +{
>>>>>>> +  short or = 0;
>>>>>>> +  while (c)
>>>>>>> +    {
>>>>>>> +      or = or | s[c];
>>>>>>> +      c --;
>>>>>>> +    }
>>>>>>> +  *p = (unsigned char)or;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> +
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-2-64.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-2-64.c (revision 0)
>>>>>>> @@ -0,0 +1,20 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>>>>> +/* { dg-require-effective-target mips64 } */
>>>>>>> +
>>>>>>> +void f(unsigned char * p, short *s, int c)
>>>>>>> +{
>>>>>>> +  short or = 0;
>>>>>>> +  while (c)
>>>>>>> +    {
>>>>>>> +      or = or | s[c];
>>>>>>> +      c --;
>>>>>>> +    }
>>>>>>> +  *p = (unsigned char)or;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 1 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> +
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-3.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-3.c (revision 0)
>>>>>>> @@ -0,0 +1,13 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>>>>> +/* { dg-require-effective-target mips64 } */
>>>>>>> +
>>>>>>> +unsigned int f(unsigned char byte)
>>>>>>> +{
>>>>>>> +  return byte << 25;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ replaced" "ee" } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> +
>>>>>>> Index: gcc/opts.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/opts.c (revision 189409)
>>>>>>> +++ gcc/opts.c (working copy)
>>>>>>> @@ -490,6 +490,7 @@ static const struct default_options defa
>>>>>>>        { OPT_LEVELS_2_PLUS, OPT_ftree_tail_merge, NULL, 1 },
>>>>>>>        { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_foptimize_strlen, NULL, 1 },
>>>>>>>        { OPT_LEVELS_2_PLUS, OPT_fhoist_adjacent_loads, NULL, 1 },
>>>>>>> +    { OPT_LEVELS_2_PLUS, OPT_fextension_elimination, NULL, 1 },
>>>>>>>
>>>>>>>        /* -O3 optimizations.  */
>>>>>>>        { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
>>>>>>> Index: gcc/timevar.def
>>>>>>> ===================================================================
>>>>>>> --- gcc/timevar.def (revision 189409)
>>>>>>> +++ gcc/timevar.def (working copy)
>>>>>>> @@ -201,6 +201,7 @@ DEFTIMEVAR (TV_POST_EXPAND	     , "post
>>>>>>> DEFTIMEVAR (TV_VARCONST              , "varconst")
>>>>>>> DEFTIMEVAR (TV_LOWER_SUBREG	     , "lower subreg")
>>>>>>> DEFTIMEVAR (TV_JUMP                  , "jump")
>>>>>>> +DEFTIMEVAR (TV_EE                    , "extension elimination")
>>>>>>> DEFTIMEVAR (TV_FWPROP                , "forward prop")
>>>>>>> DEFTIMEVAR (TV_CSE                   , "CSE")
>>>>>>> DEFTIMEVAR (TV_DCE                   , "dead code elimination")
>>>>>>> Index: gcc/ee.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/ee.c (revision 0)
>>>>>>> @@ -0,0 +1,1190 @@
>>>>>>> +/* Redundant extension elimination.
>>>>>>> +   Copyright (C) 2010, 2011, 2012 Free Software Foundation, Inc.
>>>>>>> +   Contributed by Tom de Vries (tom@codesourcery.com)
>>>>>>> +
>>>>>>> +This file is part of GCC.
>>>>>>> +
>>>>>>> +GCC is free software; you can redistribute it and/or modify it under
>>>>>>> +the terms of the GNU General Public License as published by the Free
>>>>>>> +Software Foundation; either version 3, or (at your option) any later
>>>>>>> +version.
>>>>>>> +
>>>>>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>>>>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>>>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>>>>>> +for more details.
>>>>>>> +
>>>>>>> +You should have received a copy of the GNU General Public License
>>>>>>> +along with GCC; see the file COPYING3.  If not see
>>>>>>> +<http://www.gnu.org/licenses/>.  */
>>>>>>> +
>>>>>>> +/*
>>>>>>> +
>>>>>>> +  MOTIVATING EXAMPLE
>>>>>>> +
>>>>>>> +  The motivating example for this pass is the example from PR 40893:
>>>>>>> +
>>>>>>> +    void f (short d[2][2])
>>>>>>> +    {
>>>>>>> +      int d0 = d[0][0] + d[0][1];
>>>>>>> +      int d1 = d[1][0] + d[1][1];
>>>>>>> +      d[0][0] = d0 + d1;
>>>>>>> +      d[0][1] = d0 - d1;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  For MIPS, compilation results in the following insns.
>>>>>>> +
>>>>>>> +    (set (reg:SI 204)
>>>>>>> +         (zero_extend:SI (subreg:HI (reg:SI 213) 2)))
>>>>>>> +
>>>>>>> +    (set (reg:SI 205)
>>>>>>> +         (zero_extend:SI (subreg:HI (reg:SI 216 [ d1 ]) 2)))
>>>>>>> +
>>>>>>> +    (set (reg:SI 217)
>>>>>>> +         (plus:SI (reg:SI 205)
>>>>>>> +                  (reg:SI 204)))
>>>>>>> +
>>>>>>> +    (set (reg:SI 218)
>>>>>>> +         (minus:SI (reg:SI 204)
>>>>>>> +                   (reg:SI 205)))
>>>>>>> +
>>>>>>> +    (set (mem:HI (reg/v/f:SI 210))
>>>>>>> +         (subreg:HI (reg:SI 217) 2))
>>>>>>> +
>>>>>>> +    (set (mem:HI (plus:SI (reg/v/f:SI 210)
>>>>>>> +                 (const_int 2 [0x2])))
>>>>>>> +         (subreg:HI (reg:SI 218) 2))
>>>>>>> +
>>>>>>> +
>>>>>>> +  The pseudos 217 and 218 only use the lower half of pseudos 217 and 218, and
>>>>>>> +  are the only uses.  And the plus and minus operators belong to the class of
>>>>>>> +  operators where a bit in the result is only influenced by same-or-less
>>>>>>> +  significant bitss in the operands, so the plus and minus insns only use the
>>>>>>> +  lower halves of pseudos 204 and 205.  Those are also the only uses of pseudos
>>>>>>> +  204 and 205, so the zero_extends are redundant.
>>>>>>> +
>>>>>>> +
>>>>>>> +  INTENDED EFFECT
>>>>>>> +
>>>>>>> +  This pass works by removing sign/zero-extensions, or replacing them with
>>>>>>> +  regcopies.  The idea there is that the regcopy might be eliminated by a later
>>>>>>> +  pass.  In case the regcopy cannot be eliminated, it might at least be cheaper
>>>>>>> +  than the extension.
>>>>>>> +
>>>>>>> +
>>>>>>> +  IMPLEMENTATION
>>>>>>> +
>>>>>>> +  The pass scans at most two times over all instructions.
>>>>>>> +
>>>>>>> +  The first scan collects all extensions.  If there are no extensions, we're
>>>>>>> +  done.
>>>>>>> +
>>>>>>> +  The second scan registers all uses of a reg in the biggest_use array.
>>>>>>> +  Additionally, it registers how the use size of a pseudo is propagated to the
>>>>>>> +  operands of the insns defining the pseudo.
>>>>>>> +
>>>>>>> +  The biggest_use array now contains the size in bits of the biggest use
>>>>>>> +  of each reg, which allows us to find redundant extensions.
>>>>>>> +
>>>>>>> +  If there are still non-redundant extensions left, we use the propagation
>>>>>>> +  information in an iterative fashion to improve the biggest_use array, after
>>>>>>> +  which we may find more redundant extensions.
>>>>>>> +
>>>>>>> +  Finally, redundant extensions are deleted or replaced.
>>>>>>> +
>>>>>>> +  In case that the src and dest reg of the replacement are not of the same size,
>>>>>>> +  we do not replace with a normal regcopy, but with a truncate or with the copy
>>>>>>> +  of a paradoxical subreg instead.
>>>>>>> +
>>>>>>> +
>>>>>>> +  ILLUSTRATION OF PASS
>>>>>>> +
>>>>>>> +  The dump of the pass shows us how the pass works on the motivating example.
>>>>>>> +
>>>>>>> +  We find the 2 extensions:
>>>>>>> +    found extension with preserved size 16 defining reg 204
>>>>>>> +    found extension with preserved size 16 defining reg 205
>>>>>>> +
>>>>>>> +  We calculate the biggests uses of a register:
>>>>>>> +    biggest_use
>>>>>>> +    reg 204: size 32
>>>>>>> +    reg 205: size 32
>>>>>>> +    reg 217: size 16
>>>>>>> +    reg 218: size 16
>>>>>>> +
>>>>>>> +  We propagate the biggest uses where possible:
>>>>>>> +    propagations
>>>>>>> +    205: 32 -> 16
>>>>>>> +    204: 32 -> 16
>>>>>>> +    214: 32 -> 16
>>>>>>> +    215: 32 -> 16
>>>>>>> +
>>>>>>> +  We conclude that the extensions are redundant:
>>>>>>> +    found redundant extension with preserved size 16 defining reg 205
>>>>>>> +    found redundant extension with preserved size 16 defining reg 204
>>>>>>> +
>>>>>>> +  And we replace them with regcopies:
>>>>>>> +    (set (reg:SI 204)
>>>>>>> +        (reg:SI 213))
>>>>>>> +
>>>>>>> +    (set (reg:SI 205)
>>>>>>> +        (reg:SI 216))
>>>>>>> +
>>>>>>> +
>>>>>>> +  LIMITATIONS
>>>>>>> +
>>>>>>> +  The scope of the analysis is limited to an extension and its uses.  The other
>>>>>>> +  type of analysis (related to the defs of the operand of an extension) is not
>>>>>>> +  done.
>>>>>>> +
>>>>>>> +  Furthermore, we do the analysis of biggest use per reg.  So when determining
>>>>>>> +  whether an extension is redundant, we take all uses of a dest reg into
>>>>>>> +  account, also the ones that are not uses of the extension.
>>>>>>> +  The consideration is that using use-def chains will give a more precise
>>>>>>> +  analysis, but is much more expensive in terms of runtime.  */
>>>>>>> +
>>>>>>> +#include "config.h"
>>>>>>> +#include "system.h"
>>>>>>> +#include "coretypes.h"
>>>>>>> +#include "tm.h"
>>>>>>> +#include "rtl.h"
>>>>>>> +#include "tree.h"
>>>>>>> +#include "tm_p.h"
>>>>>>> +#include "flags.h"
>>>>>>> +#include "regs.h"
>>>>>>> +#include "hard-reg-set.h"
>>>>>>> +#include "basic-block.h"
>>>>>>> +#include "insn-config.h"
>>>>>>> +#include "function.h"
>>>>>>> +#include "expr.h"
>>>>>>> +#include "insn-attr.h"
>>>>>>> +#include "recog.h"
>>>>>>> +#include "toplev.h"
>>>>>>> +#include "target.h"
>>>>>>> +#include "timevar.h"
>>>>>>> +#include "optabs.h"
>>>>>>> +#include "insn-codes.h"
>>>>>>> +#include "rtlhooks-def.h"
>>>>>>> +#include "output.h"
>>>>>>> +#include "params.h"
>>>>>>> +#include "timevar.h"
>>>>>>> +#include "tree-pass.h"
>>>>>>> +#include "cgraph.h"
>>>>>>> +#include "vec.h"
>>>>>>> +
>>>>>>> +#define SKIP_REG (-1)
>>>>>>> +#define NONE (-1)
>>>>>>> +
>>>>>>> +/* Number of registers at start of pass.  */
>>>>>>> +
>>>>>>> +static int n_regs;
>>>>>>> +
>>>>>>> +/* Array to register the biggest use of a reg, in bits.  */
>>>>>>> +
>>>>>>> +static int *biggest_use;
>>>>>>> +
>>>>>>> +/* Array to register the promoted subregs.  */
>>>>>>> +
>>>>>>> +static VEC (rtx,heap) **promoted_subreg;
>>>>>>> +
>>>>>>> +/* Array to register for a reg what the last propagated size is.  */
>>>>>>> +
>>>>>>> +static int *propagated_size;
>>>>>>> +
>>>>>>> +typedef struct use
>>>>>>> +{
>>>>>>> +  int regno;
>>>>>>> +  int size;
>>>>>>> +  int offset;
>>>>>>> +  rtx *use;
>>>>>>> +} use_type;
>>>>>>> +
>>>>>>> +DEF_VEC_O(use_type);
>>>>>>> +DEF_VEC_ALLOC_O(use_type,heap);
>>>>>>> +
>>>>>>> +/* Vector to register the uses.  */
>>>>>>> +
>>>>>>> +static VEC (use_type,heap) **uses;
>>>>>>> +
>>>>>>> +typedef struct prop
>>>>>>> +{
>>>>>>> +  rtx set;
>>>>>>> +  int uses_regno;
>>>>>>> +  int uses_index;
>>>>>>> +} prop_type;
>>>>>>> +
>>>>>>> +DEF_VEC_O(prop_type);
>>>>>>> +DEF_VEC_ALLOC_O(prop_type,heap);
>>>>>>> +
>>>>>>> +/* Vector to register the propagations.  */
>>>>>>> +
>>>>>>> +static VEC (prop_type,heap) **props;
>>>>>>> +
>>>>>>> +/* Work list for propragation.  */
>>>>>>> +
>>>>>>> +static VEC (int,heap) *wl;
>>>>>>> +
>>>>>>> +/* Array to register what regs are in the work list.  */
>>>>>>> +
>>>>>>> +static bool *in_wl;
>>>>>>> +
>>>>>>> +/* Vector that contains the extensions in the function.  */
>>>>>>> +
>>>>>>> +static VEC (rtx,heap) *extensions;
>>>>>>> +
>>>>>>> +/* Vector that contains the extensions in the function that are going to be
>>>>>>> +   removed or replaced.  */
>>>>>>> +
>>>>>>> +static VEC (rtx,heap) *redundant_extensions;
>>>>>>> +
>>>>>>> +/* Forward declaration.  */
>>>>>>> +
>>>>>>> +static void note_use (rtx *x, void *data);
>>>>>>> +static bool skip_reg_p (int regno);
>>>>>>> +static void register_prop (rtx set, use_type *use);
>>>>>>> +
>>>>>>> +/* Check whether SUBREG is a promoted subreg.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +promoted_subreg_p (rtx subreg)
>>>>>>> +{
>>>>>>> +  return (GET_CODE (subreg) == SUBREG
>>>>>>> +	  && SUBREG_PROMOTED_VAR_P (subreg));
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Check whether SUBREG is a promoted subreg for which we cannot reset the
>>>>>>> +   promotion.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +fixed_promoted_subreg_p (rtx subreg)
>>>>>>> +{
>>>>>>> +  int mre;
>>>>>>> +
>>>>>>> +  if (!promoted_subreg_p (subreg))
>>>>>>> +    return false;
>>>>>>> +
>>>>>>> +  mre = targetm.mode_rep_extended (GET_MODE (subreg),
>>>>>>> +				   GET_MODE (SUBREG_REG (subreg)));
>>>>>>> +  return mre != UNKNOWN;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Attempt to return the size, reg number and offset of USE in SIZE, REGNO and
>>>>>>> +   OFFSET.  Return true if successful.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +reg_use_p (rtx use, int *size, unsigned int *regno, int *offset)
>>>>>>> +{
>>>>>>> +  rtx reg;
>>>>>>> +
>>>>>>> +  if (REG_P (use))
>>>>>>> +    {
>>>>>>> +      *regno = REGNO (use);
>>>>>>> +      *offset = 0;
>>>>>>> +      *size = GET_MODE_BITSIZE (GET_MODE (use));
>>>>>>> +      return true;
>>>>>>> +    }
>>>>>>> +  else if (GET_CODE (use) == SUBREG)
>>>>>>> +    {
>>>>>>> +      reg = SUBREG_REG (use);
>>>>>>> +
>>>>>>> +      if (!REG_P (reg))
>>>>>>> +	return false;
>>>>>>> +
>>>>>>> +      *regno = REGNO (reg);
>>>>>>> +
>>>>>>> +      if (paradoxical_subreg_p (use) || fixed_promoted_subreg_p (use))
>>>>>>> +	{
>>>>>>> +	  *offset = 0;
>>>>>>> +	  *size = GET_MODE_BITSIZE (GET_MODE (reg));
>>>>>>> +	}
>>>>>>> +      else
>>>>>>> +	{
>>>>>>> +	  *offset = subreg_lsb (use);
>>>>>>> +	  *size = *offset + GET_MODE_BITSIZE (GET_MODE (use));
>>>>>>> +	}
>>>>>>> +
>>>>>>> +      return true;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  return false;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Create a new empty entry in the uses[REGNO] vector.  */
>>>>>>> +
>>>>>>> +static use_type *
>>>>>>> +new_use (unsigned int regno)
>>>>>>> +{
>>>>>>> +  if (uses[regno] == NULL)
>>>>>>> +    uses[regno] = VEC_alloc (use_type, heap, 4);
>>>>>>> +
>>>>>>> +  VEC_safe_push (use_type, heap, uses[regno], NULL);
>>>>>>> +
>>>>>>> +  return VEC_last (use_type, uses[regno]);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Register a USE of reg REGNO with SIZE and OFFSET.  */
>>>>>>> +
>>>>>>> +static use_type *
>>>>>>> +register_use (int size, unsigned int regno, int offset, rtx *use)
>>>>>>> +{
>>>>>>> +  int *current;
>>>>>>> +  use_type *p;
>>>>>>> +
>>>>>>> +  gcc_assert (size >= 0);
>>>>>>> +  gcc_assert (regno < (unsigned int)n_regs);
>>>>>>> +
>>>>>>> +  if (skip_reg_p (regno))
>>>>>>> +    return NULL;
>>>>>>> +
>>>>>>> +  p = new_use (regno);
>>>>>>> +  p->regno = regno;
>>>>>>> +  p->size = size;
>>>>>>> +  p->offset = offset;
>>>>>>> +  p->use = use;
>>>>>>> +
>>>>>>> +  /* Update the bigest use.  */
>>>>>>> +  current = &biggest_use[regno];
>>>>>>> +  *current = MAX (*current, size);
>>>>>>> +
>>>>>>> +  return p;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Handle embedded uses in USE, which is a part of PATTERN.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +note_embedded_uses (rtx use, rtx pattern)
>>>>>>> +{
>>>>>>> +  const char *format_ptr;
>>>>>>> +  int i, j;
>>>>>>> +
>>>>>>> +  format_ptr = GET_RTX_FORMAT (GET_CODE (use));
>>>>>>> +  for (i = 0; i < GET_RTX_LENGTH (GET_CODE (use)); i++)
>>>>>>> +    if (format_ptr[i] == 'e')
>>>>>>> +      note_use (&XEXP (use, i), pattern);
>>>>>>> +    else if (format_ptr[i] == 'E')
>>>>>>> +      for (j = 0; j < XVECLEN (use, i); j++)
>>>>>>> +	note_use (&XVECEXP (use, i, j), pattern);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Get the set in PATTERN that has USE as its src operand.  */
>>>>>>> +
>>>>>>> +static rtx
>>>>>>> +get_set (rtx use, rtx pattern)
>>>>>>> +{
>>>>>>> +  rtx sub;
>>>>>>> +  int i;
>>>>>>> +
>>>>>>> +  if (GET_CODE (pattern) == SET && SET_SRC (pattern) == use)
>>>>>>> +    return pattern;
>>>>>>> +
>>>>>>> +  if (GET_CODE (pattern) == PARALLEL)
>>>>>>> +    for (i = 0; i < XVECLEN (pattern, 0); ++i)
>>>>>>> +      {
>>>>>>> +	sub = XVECEXP (pattern, 0, i);
>>>>>>> +	if (GET_CODE (sub) == SET && SET_SRC (sub) == use)
>>>>>>> +	  return sub;
>>>>>>> +      }
>>>>>>> +
>>>>>>> +  return NULL_RTX;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Handle a restricted op USE with NR_OPERANDS.  USE is a part of SET, which is
>>>>>>> +   a part of PATTERN.  In this context restricted means that a bit in
>>>>>>> +   an operand influences only the same bit or more significant bits in the
>>>>>>> +   result.  The bitwise ops are a subclass, but PLUS is one as well.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +note_restricted_op_use (rtx set, rtx use, unsigned int nr_operands, rtx pattern)
>>>>>>> +{
>>>>>>> +  unsigned int i, smallest;
>>>>>>> +  int operand_size[2];
>>>>>>> +  int operand_offset[2];
>>>>>>> +  int used_size;
>>>>>>> +  unsigned int operand_regno[2];
>>>>>>> +  bool operand_reg[2];
>>>>>>> +  bool operand_ignore[2];
>>>>>>> +  use_type *p;
>>>>>>> +
>>>>>>> +  /* Init operand_reg, operand_size, operand_regno and operand_ignore.  */
>>>>>>> +  for (i = 0; i < nr_operands; ++i)
>>>>>>> +    {
>>>>>>> +      operand_reg[i] = reg_use_p (XEXP (use, i), &operand_size[i],
>>>>>>> +				  &operand_regno[i], &operand_offset[i]);
>>>>>>> +      operand_ignore[i] = false;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Handle case of reg and-masked with const.  */
>>>>>>> +  if (GET_CODE (use) == AND && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>>>>> +    {
>>>>>>> +      used_size =
>>>>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (UINTVAL (XEXP (use, 1)));
>>>>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Handle case of reg or-masked with const.  */
>>>>>>> +  if (GET_CODE (use) == IOR && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>>>>> +    {
>>>>>>> +      used_size =
>>>>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (~UINTVAL (XEXP (use, 1)));
>>>>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Ignore the use of a in 'a = a + b'.  */
>>>>>>> +  /* TODO: handle GET_CODE ((SET_DEST (set))) == SUBREG.  */
>>>>>>> +  if (set != NULL_RTX && REG_P (SET_DEST (set)))
>>>>>>> +    for (i = 0; i < nr_operands; ++i)
>>>>>>> +      operand_ignore[i] = (operand_reg[i]
>>>>>>> +			   && (REGNO (SET_DEST (set)) == operand_regno[i]));
>>>>>>> +
>>>>>>> +  /* Handle the case a reg is combined with don't care bits.  */
>>>>>>> +  if (nr_operands == 2 && operand_reg[0] && operand_reg[1]
>>>>>>> +      && operand_size[0] != operand_size[1])
>>>>>>> +    {
>>>>>>> +      smallest = operand_size[0] > operand_size[1];
>>>>>>> +
>>>>>>> +      if (paradoxical_subreg_p (XEXP (use, smallest)))
>>>>>>> +	operand_size[1 - smallest] = operand_size[smallest];
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Register the operand use, if necessary.  */
>>>>>>> +  for (i = 0; i < nr_operands; ++i)
>>>>>>> +    if (!operand_reg[i])
>>>>>>> +      note_use (&XEXP (use, i), pattern);
>>>>>>> +    else if (!operand_ignore[i])
>>>>>>> +      {
>>>>>>> +	p = register_use (operand_size[i], operand_regno[i], operand_offset[i],
>>>>>>> +			  &XEXP (use, i));
>>>>>>> +	register_prop (set, p);
>>>>>>> +      }
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Register promoted SUBREG in promoted_subreg.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +register_promoted_subreg (rtx subreg)
>>>>>>> +{
>>>>>>> +  int index = REGNO (SUBREG_REG (subreg));
>>>>>>> +
>>>>>>> +  if (promoted_subreg[index] == NULL)
>>>>>>> +    promoted_subreg[index] = VEC_alloc (rtx, heap, 10);
>>>>>>> +
>>>>>>> +  VEC_safe_push (rtx, heap, promoted_subreg[index], subreg);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Note promoted subregs in X.  */
>>>>>>> +
>>>>>>> +static int
>>>>>>> +note_promoted_subreg (rtx *x, void *y ATTRIBUTE_UNUSED)
>>>>>>> +{
>>>>>>> +  rtx subreg = *x;
>>>>>>> +
>>>>>>> +  if (promoted_subreg_p (subreg) && !fixed_promoted_subreg_p (subreg)
>>>>>>> +      && REG_P (SUBREG_REG (subreg)))
>>>>>>> +    register_promoted_subreg (subreg);
>>>>>>> +
>>>>>>> +  return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Handle use X in pattern DATA noted by note_uses.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +note_use (rtx *x, void *data)
>>>>>>> +{
>>>>>>> +  rtx use = *x;
>>>>>>> +  rtx pattern = (rtx)data;
>>>>>>> +  int use_size, use_offset;
>>>>>>> +  unsigned int use_regno;
>>>>>>> +  rtx set;
>>>>>>> +  use_type *p;
>>>>>>> +
>>>>>>> +  for_each_rtx (x, note_promoted_subreg, NULL);
>>>>>>> +
>>>>>>> +  set = get_set (use, pattern);
>>>>>>> +
>>>>>>> +  switch (GET_CODE (use))
>>>>>>> +    {
>>>>>>> +    case REG:
>>>>>>> +    case SUBREG:
>>>>>>> +      if (!reg_use_p (use, &use_size, &use_regno, &use_offset))
>>>>>>> +	{
>>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>>> +	  return;
>>>>>>> +	}
>>>>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>>>>> +      register_prop (set, p);
>>>>>>> +      return;
>>>>>>> +    case SIGN_EXTEND:
>>>>>>> +    case ZERO_EXTEND:
>>>>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset))
>>>>>>> +	{
>>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>>> +	  return;
>>>>>>> +	}
>>>>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>>>>> +      register_prop (set, p);
>>>>>>> +      return;
>>>>>>> +    case IOR:
>>>>>>> +    case AND:
>>>>>>> +    case XOR:
>>>>>>> +    case PLUS:
>>>>>>> +    case MINUS:
>>>>>>> +      note_restricted_op_use (set, use, 2, pattern);
>>>>>>> +      return;
>>>>>>> +    case NOT:
>>>>>>> +    case NEG:
>>>>>>> +      note_restricted_op_use (set, use, 1, pattern);
>>>>>>> +      return;
>>>>>>> +    case ASHIFT:
>>>>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset)
>>>>>>> +	  || !CONST_INT_P (XEXP (use, 1))
>>>>>>> +	  || INTVAL (XEXP (use, 1)) <= 0
>>>>>>> +	  || paradoxical_subreg_p (XEXP (use, 0)))
>>>>>>> +	{
>>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>>> +	  return;
>>>>>>> +	}
>>>>>>> +      (void)register_use (use_size - INTVAL (XEXP (use, 1)), use_regno,
>>>>>>> +			  use_offset, x);
>>>>>>> +      return;
>>>>>>> +    default:
>>>>>>> +      note_embedded_uses (use, pattern);
>>>>>>> +      return;
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Check whether reg REGNO is implicitly used.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +implicit_use_p (int regno ATTRIBUTE_UNUSED)
>>>>>>> +{
>>>>>>> +#ifdef EPILOGUE_USES
>>>>>>> +  if (EPILOGUE_USES (regno))
>>>>>>> +    return true;
>>>>>>> +#endif
>>>>>>> +
>>>>>>> +#ifdef EH_USES
>>>>>>> +  if (EH_USES (regno))
>>>>>>> +    return true;
>>>>>>> +#endif
>>>>>>> +
>>>>>>> +  return false;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Check whether reg REGNO should be skipped in analysis.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +skip_reg_p (int regno)
>>>>>>> +{
>>>>>>> +  /* TODO: handle hard registers.  The problem with hard registers is that
>>>>>>> +     a DI use of r0 can mean a 64bit use of r0 and a 32 bit use of r1.
>>>>>>> +     We don't handle that properly.  */
>>>>>>> +  return implicit_use_p (regno) || HARD_REGISTER_NUM_P (regno);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Note the uses of argument registers in call INSN.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +note_call_uses (rtx insn)
>>>>>>> +{
>>>>>>> +  rtx link, link_expr;
>>>>>>> +
>>>>>>> +  if (!CALL_P (insn))
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1))
>>>>>>> +    {
>>>>>>> +      link_expr = XEXP (link, 0);
>>>>>>> +
>>>>>>> +      if (GET_CODE (link_expr) == USE)
>>>>>>> +	note_use (&XEXP (link_expr, 0), link);
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Dump the biggest uses found.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +dump_biggest_use (void)
>>>>>>> +{
>>>>>>> +  int i;
>>>>>>> +
>>>>>>> +  if (!dump_file)
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  fprintf (dump_file, "biggest_use:\n");
>>>>>>> +
>>>>>>> +  for (i = 0; i < n_regs; i++)
>>>>>>> +    if (biggest_use[i] > 0)
>>>>>>> +      fprintf (dump_file, "reg %d: size %d\n", i, biggest_use[i]);
>>>>>>> +
>>>>>>> +  fprintf (dump_file, "\n");
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Calculate the biggest use mode for all regs.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +calculate_biggest_use (void)
>>>>>>> +{
>>>>>>> +  basic_block bb;
>>>>>>> +  rtx insn;
>>>>>>> +
>>>>>>> +  /* For all insns, call note_use for each use in insn.  */
>>>>>>> +  FOR_EACH_BB (bb)
>>>>>>> +    FOR_BB_INSNS (bb, insn)
>>>>>>> +      {
>>>>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>>>>> +	  continue;
>>>>>>> +
>>>>>>> +	note_uses (&PATTERN (insn), note_use, PATTERN (insn));
>>>>>>> +
>>>>>>> +	if (CALL_P (insn))
>>>>>>> +	  note_call_uses (insn);
>>>>>>> +      }
>>>>>>> +
>>>>>>> +  dump_biggest_use ();
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Register a propagation USE in SET in the props vector.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +register_prop (rtx set, use_type *use)
>>>>>>> +{
>>>>>>> +  prop_type *p;
>>>>>>> +  int regno;
>>>>>>> +
>>>>>>> +  if (set == NULL_RTX || use == NULL)
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  if (!REG_P (SET_DEST (set)))
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  regno = REGNO (SET_DEST (set));
>>>>>>> +
>>>>>>> +  if (skip_reg_p (regno))
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  if (props[regno] == NULL)
>>>>>>> +    props[regno] = VEC_alloc (prop_type, heap, 4);
>>>>>>> +
>>>>>>> +  VEC_safe_push (prop_type, heap, props[regno], NULL);
>>>>>>> +  p = VEC_last (prop_type, props[regno]);
>>>>>>> +  p->set = set;
>>>>>>> +  p->uses_regno = use->regno;
>>>>>>> +  p->uses_index = VEC_length (use_type, uses[use->regno]) - 1;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Add REGNO to the worklist.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +add_to_wl (int regno)
>>>>>>> +{
>>>>>>> +  if (in_wl[regno])
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  if (biggest_use[regno] > 0
>>>>>>> +      && biggest_use[regno] == GET_MODE_BITSIZE (PSEUDO_REGNO_MODE (regno)))
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  if (VEC_empty (prop_type, props[regno]))
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  if (propagated_size[regno] != NONE
>>>>>>> +      && propagated_size[regno] == biggest_use[regno])
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  VEC_safe_push (int, heap, wl, regno);
>>>>>>> +  in_wl[regno] = true;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Pop a reg from the worklist and return it.  */
>>>>>>> +
>>>>>>> +static int
>>>>>>> +pop_wl (void)
>>>>>>> +{
>>>>>>> +  int regno = VEC_pop (int, wl);
>>>>>>> +  in_wl[regno] = false;
>>>>>>> +  return regno;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Propagate the use size DEST_SIZE of a reg to use P.  */
>>>>>>> +
>>>>>>> +static int
>>>>>>> +propagate_size (int dest_size, use_type *p)
>>>>>>> +{
>>>>>>> +  if (dest_size == 0)
>>>>>>> +    return 0;
>>>>>>> +
>>>>>>> +  return p->offset + MIN (p->size - p->offset, dest_size);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Get the biggest use of REGNO from the uses vector.  */
>>>>>>> +
>>>>>>> +static int
>>>>>>> +get_biggest_use (unsigned int regno)
>>>>>>> +{
>>>>>>> +  int ix;
>>>>>>> +  use_type *p;
>>>>>>> +  int max = 0;
>>>>>>> +
>>>>>>> +  gcc_assert (uses[regno] != NULL);
>>>>>>> +
>>>>>>> +  FOR_EACH_VEC_ELT (use_type, uses[regno], ix, p)
>>>>>>> +    max = MAX (max, p->size);
>>>>>>> +
>>>>>>> +  return max;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Propagate the use size DEST_SIZE of a reg to the uses in USE.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +propagate_to_use (int dest_size, use_type *use)
>>>>>>> +{
>>>>>>> +  int new_use_size;
>>>>>>> +  int prev_biggest_use;
>>>>>>> +  int *current;
>>>>>>> +
>>>>>>> +  new_use_size = propagate_size (dest_size, use);
>>>>>>> +
>>>>>>> +  if (new_use_size >= use->size)
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  use->size = new_use_size;
>>>>>>> +
>>>>>>> +  current = &biggest_use[use->regno];
>>>>>>> +
>>>>>>> +  prev_biggest_use = *current;
>>>>>>> +  *current = get_biggest_use (use->regno);
>>>>>>> +
>>>>>>> +  if (*current >= prev_biggest_use)
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  add_to_wl (use->regno);
>>>>>>> +
>>>>>>> +  if (dump_file)
>>>>>>> +    fprintf (dump_file, "%d: %d -> %d\n", use->regno, prev_biggest_use,
>>>>>>> +	     *current);
>>>>>>> +
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Propagate the biggest use of a reg REGNO to all its uses, and note
>>>>>>> +   propagations in NR_PROPAGATIONS.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +propagate_to_uses (int regno, int *nr_propagations)
>>>>>>> +{
>>>>>>> +  int ix;
>>>>>>> +  prop_type *p;
>>>>>>> +
>>>>>>> +  gcc_assert (!(propagated_size[regno] == NONE
>>>>>>> +		&& propagated_size[regno] == biggest_use[regno]));
>>>>>>> +
>>>>>>> +  FOR_EACH_VEC_ELT (prop_type, props[regno], ix, p)
>>>>>>> +    {
>>>>>>> +      use_type *use = VEC_index (use_type, uses[p->uses_regno], p->uses_index);
>>>>>>> +      propagate_to_use (biggest_use[regno], use);
>>>>>>> +      ++(*nr_propagations);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  propagated_size[regno] = biggest_use[regno];
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Improve biggest_use array iteratively.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +propagate (void)
>>>>>>> +{
>>>>>>> +  int i;
>>>>>>> +  int nr_propagations = 0;
>>>>>>> +
>>>>>>> +  /* Initialize work list.  */
>>>>>>> +
>>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>>> +    add_to_wl (i);
>>>>>>> +
>>>>>>> +  /* Work the work list.  */
>>>>>>> +
>>>>>>> +  if (dump_file)
>>>>>>> +    fprintf (dump_file, "propagations: \n");
>>>>>>> +  while (!VEC_empty (int, wl))
>>>>>>> +    propagate_to_uses (pop_wl (), &nr_propagations);
>>>>>>> +
>>>>>>> +  if (dump_file)
>>>>>>> +    fprintf (dump_file, "\nnr_propagations: %d\n\n", nr_propagations);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Check whether this is a sign/zero extension.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>>>>> +{
>>>>>>> +  rtx src, op0;
>>>>>>> +
>>>>>>> +  /* Detect set of reg.  */
>>>>>>> +  if (GET_CODE (PATTERN (insn)) != SET)
>>>>>>> +    return false;
>>>>>>> +
>>>>>>> +  src = SET_SRC (PATTERN (insn));
>>>>>>> +  *dest = SET_DEST (PATTERN (insn));
>>>>>>> +
>>>>>>> +  if (!REG_P (*dest))
>>>>>>> +    return false;
>>>>>>> +
>>>>>>> +  /* Detect sign or zero extension.  */
>>>>>>> +  if (GET_CODE (src) == ZERO_EXTEND || GET_CODE (src) == SIGN_EXTEND
>>>>>>> +      || (GET_CODE (src) == AND && CONST_INT_P (XEXP (src, 1))))
>>>>>>> +    {
>>>>>>> +      op0 = XEXP (src, 0);
>>>>>>> +
>>>>>>> +      /* Determine amount of least significant bits preserved by operation.  */
>>>>>>> +      if (GET_CODE (src) == AND)
>>>>>>> +	*preserved_size = ctz_hwi (~UINTVAL (XEXP (src, 1)));
>>>>>>> +      else
>>>>>>> +	*preserved_size = GET_MODE_BITSIZE (GET_MODE (op0));
>>>>>>> +
>>>>>>> +      if (GET_CODE (op0) == SUBREG)
>>>>>>> +	{
>>>>>>> +	  if (subreg_lsb (op0) != 0)
>>>>>>> +	    return false;
>>>>>>> +
>>>>>>> +	  *inner = SUBREG_REG (op0);
>>>>>>> +
>>>>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>>>>> +	    return false;
>>>>>>> +
>>>>>>> +	  return true;
>>>>>>> +	}
>>>>>>> +      else if (REG_P (op0))
>>>>>>> +	{
>>>>>>> +	  *inner = op0;
>>>>>>> +
>>>>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>>>>> +	    return false;
>>>>>>> +
>>>>>>> +	  return true;
>>>>>>> +	}
>>>>>>> +      else if (GET_CODE (op0) == TRUNCATE)
>>>>>>> +	{
>>>>>>> +	  *inner = XEXP (op0, 0);
>>>>>>> +	  return true;
>>>>>>> +	}
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  return false;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Find extensions and store them in the extensions vector.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +find_extensions (void)
>>>>>>> +{
>>>>>>> +  basic_block bb;
>>>>>>> +  rtx insn, dest, inner;
>>>>>>> +  int preserved_size;
>>>>>>> +
>>>>>>> +  /* For all insns, call note_use for each use in insn.  */
>>>>>>> +  FOR_EACH_BB (bb)
>>>>>>> +    FOR_BB_INSNS (bb, insn)
>>>>>>> +      {
>>>>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>>>>> +	  continue;
>>>>>>> +
>>>>>>> +	if (!extension_p (insn, &dest, &inner, &preserved_size))
>>>>>>> +	  continue;
>>>>>>> +
>>>>>>> +	VEC_safe_push (rtx, heap, extensions, insn);
>>>>>>> +
>>>>>>> +	if (dump_file)
>>>>>>> +	  fprintf (dump_file,
>>>>>>> +		   "found extension %u with preserved size %d defining"
>>>>>>> +		   " reg %d\n",
>>>>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>>>>> +      }
>>>>>>> +
>>>>>>> +  if (dump_file)
>>>>>>> +    {
>>>>>>> +      if (!VEC_empty (rtx, extensions))
>>>>>>> +	fprintf (dump_file, "\n");
>>>>>>> +      else
>>>>>>> +	fprintf (dump_file, "no extensions found.\n");
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  return !VEC_empty (rtx, extensions);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Check whether this is a redundant sign/zero extension.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +redundant_extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>>>>> +{
>>>>>>> +  int biggest_dest_use;
>>>>>>> +
>>>>>>> +  if (!extension_p (insn, dest, inner, preserved_size))
>>>>>>> +    gcc_unreachable ();
>>>>>>> +
>>>>>>> +  biggest_dest_use = biggest_use[REGNO (*dest)];
>>>>>>> +
>>>>>>> +  if (biggest_dest_use == SKIP_REG)
>>>>>>> +    return false;
>>>>>>> +
>>>>>>> +  if (*preserved_size < biggest_dest_use)
>>>>>>> +    return false;
>>>>>>> +
>>>>>>> +  return true;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Find the redundant extensions in the extensions vector and move them to the
>>>>>>> +   redundant_extensions vector.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +find_redundant_extensions (void)
>>>>>>> +{
>>>>>>> +  rtx insn, dest, inner;
>>>>>>> +  int ix;
>>>>>>> +  bool found = false;
>>>>>>> +  int preserved_size;
>>>>>>> +
>>>>>>> +  FOR_EACH_VEC_ELT_REVERSE (rtx, extensions, ix, insn)
>>>>>>> +    if (redundant_extension_p (insn, &dest, &inner, &preserved_size))
>>>>>>> +      {
>>>>>>> +	VEC_safe_push (rtx, heap, redundant_extensions, insn);
>>>>>>> +	VEC_unordered_remove (rtx, extensions, ix);
>>>>>>> +
>>>>>>> +	if (dump_file)
>>>>>>> +	  fprintf (dump_file,
>>>>>>> +		   "found redundant extension %u with preserved size %d"
>>>>>>> +		   " defining reg %d\n",
>>>>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>>>>> +	found = true;
>>>>>>> +      }
>>>>>>> +
>>>>>>> +  if (dump_file && found)
>>>>>>> +    fprintf (dump_file, "\n");
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Reset promotion of subregs or REG.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +reset_promoted_subreg (rtx reg)
>>>>>>> +{
>>>>>>> +  int ix;
>>>>>>> +  rtx subreg;
>>>>>>> +
>>>>>>> +  if (promoted_subreg[REGNO (reg)] == NULL)
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  FOR_EACH_VEC_ELT (rtx, promoted_subreg[REGNO (reg)], ix, subreg)
>>>>>>> +    {
>>>>>>> +      SUBREG_PROMOTED_UNSIGNED_SET (subreg, 0);
>>>>>>> +      SUBREG_PROMOTED_VAR_P (subreg) = 0;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  VEC_free (rtx, heap, promoted_subreg[REGNO (reg)]);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Try to remove or replace the redundant extension INSN which extends INNER and
>>>>>>> +   writes to DEST.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +try_remove_or_replace_extension (rtx insn, rtx dest, rtx inner)
>>>>>>> +{
>>>>>>> +  rtx cp_src, cp_dest, seq = NULL_RTX, one;
>>>>>>> +
>>>>>>> +  /* Check whether replacement is needed.  */
>>>>>>> +  if (dest != inner)
>>>>>>> +    {
>>>>>>> +      start_sequence ();
>>>>>>> +
>>>>>>> +      /* Determine the proper replacement operation.  */
>>>>>>> +      if (GET_MODE (dest) == GET_MODE (inner))
>>>>>>> +	{
>>>>>>> +	  cp_src = inner;
>>>>>>> +	  cp_dest = dest;
>>>>>>> +	}
>>>>>>> +      else if (GET_MODE_SIZE (GET_MODE (dest))
>>>>>>> +	       > GET_MODE_SIZE (GET_MODE (inner)))
>>>>>>> +	{
>>>>>>> +	  emit_clobber (dest);
>>>>>>> +	  cp_src = inner;
>>>>>>> +	  cp_dest = gen_lowpart_SUBREG (GET_MODE (inner), dest);
>>>>>>> +	}
>>>>>>> +      else
>>>>>>> +	{
>>>>>>> +	  cp_src = gen_lowpart_SUBREG (GET_MODE (dest), inner);
>>>>>>> +	  cp_dest = dest;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +      emit_move_insn (cp_dest, cp_src);
>>>>>>> +
>>>>>>> +      seq = get_insns ();
>>>>>>> +      end_sequence ();
>>>>>>> +
>>>>>>> +      /* If the replacement is not supported, bail out.  */
>>>>>>> +      for (one = seq; one != NULL_RTX; one = NEXT_INSN (one))
>>>>>>> +	if (recog_memoized (one) < 0 && GET_CODE (PATTERN (one)) != CLOBBER)
>>>>>>> +	  return;
>>>>>>> +
>>>>>>> +      /* Insert the replacement.  */
>>>>>>> +      emit_insn_before (seq, insn);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Note replacement/removal in the dump.  */
>>>>>>> +  if (dump_file)
>>>>>>> +    {
>>>>>>> +      fprintf (dump_file, "redundant extension %u ", INSN_UID (insn));
>>>>>>> +      if (dest != inner)
>>>>>>> +	fprintf (dump_file, "replaced by %u\n", INSN_UID (seq));
>>>>>>> +      else
>>>>>>> +	fprintf (dump_file, "removed\n");
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Remove the extension.  */
>>>>>>> +  delete_insn (insn);
>>>>>>> +
>>>>>>> +  reset_promoted_subreg (dest);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Setup the variables at the start of the pass.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +init_pass (void)
>>>>>>> +{
>>>>>>> +  int i;
>>>>>>> +
>>>>>>> +  biggest_use = XNEWVEC (int, n_regs);
>>>>>>> +  promoted_subreg = XCNEWVEC (VEC (rtx,heap) *, n_regs);
>>>>>>> +  propagated_size = XNEWVEC (int, n_regs);
>>>>>>> +
>>>>>>> +  /* Initialize biggest_use for all regs to 0.  If a reg is used implicitly, we
>>>>>>> +     handle that reg conservatively and set it to SKIP_REG instead.  */
>>>>>>> +  for (i = 0; i < n_regs; i++)
>>>>>>> +    {
>>>>>>> +      biggest_use[i] = skip_reg_p (i) ? SKIP_REG : 0;
>>>>>>> +      propagated_size[i] = NONE;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  extensions = VEC_alloc (rtx, heap, 10);
>>>>>>> +  redundant_extensions = VEC_alloc (rtx, heap, 10);
>>>>>>> +
>>>>>>> +  wl = VEC_alloc (int, heap, 50);
>>>>>>> +  in_wl = XNEWVEC (bool, n_regs);
>>>>>>> +
>>>>>>> +  uses = XNEWVEC (typeof (*uses), n_regs);
>>>>>>> +  props = XNEWVEC (typeof (*props), n_regs);
>>>>>>> +
>>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>>> +    {
>>>>>>> +      uses[i] = NULL;
>>>>>>> +      props[i] = NULL;
>>>>>>> +      in_wl[i] = false;
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Find redundant extensions and remove or replace them if possible.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +remove_redundant_extensions (void)
>>>>>>> +{
>>>>>>> +  rtx insn, dest, inner;
>>>>>>> +  int preserved_size;
>>>>>>> +  int ix;
>>>>>>> +
>>>>>>> +  if (!find_extensions ())
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  calculate_biggest_use ();
>>>>>>> +
>>>>>>> +  find_redundant_extensions ();
>>>>>>> +
>>>>>>> +  if (!VEC_empty (rtx, extensions))
>>>>>>> +    {
>>>>>>> +      propagate ();
>>>>>>> +
>>>>>>> +      find_redundant_extensions ();
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  gcc_checking_assert (n_regs == max_reg_num ());
>>>>>>> +
>>>>>>> +  FOR_EACH_VEC_ELT (rtx, redundant_extensions, ix, insn)
>>>>>>> +    {
>>>>>>> +      extension_p (insn, &dest, &inner, &preserved_size);
>>>>>>> +      try_remove_or_replace_extension (insn, dest, inner);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  if (dump_file)
>>>>>>> +    fprintf (dump_file, "\n");
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Free the variables at the end of the pass.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +finish_pass (void)
>>>>>>> +{
>>>>>>> +  int i;
>>>>>>> +
>>>>>>> +  XDELETEVEC (propagated_size);
>>>>>>> +
>>>>>>> +  VEC_free (rtx, heap, extensions);
>>>>>>> +  VEC_free (rtx, heap, redundant_extensions);
>>>>>>> +
>>>>>>> +  VEC_free (int, heap, wl);
>>>>>>> +
>>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>>> +    {
>>>>>>> +      if (uses[i] != NULL)
>>>>>>> +	VEC_free (use_type, heap, uses[i]);
>>>>>>> +
>>>>>>> +      if (props[i] != NULL)
>>>>>>> +	VEC_free (prop_type, heap, props[i]);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  XDELETEVEC (uses);
>>>>>>> +  XDELETEVEC (props);
>>>>>>> +  XDELETEVEC (biggest_use);
>>>>>>> +
>>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>>> +    if (promoted_subreg[i] != NULL)
>>>>>>> +      VEC_free (rtx, heap, promoted_subreg[i]);
>>>>>>> +  XDELETEVEC (promoted_subreg);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Remove redundant extensions.  */
>>>>>>> +
>>>>>>> +static unsigned int
>>>>>>> +rest_of_handle_ee (void)
>>>>>>> +{
>>>>>>> +  n_regs = max_reg_num ();
>>>>>>> +
>>>>>>> +  init_pass ();
>>>>>>> +  remove_redundant_extensions ();
>>>>>>> +  finish_pass ();
>>>>>>> +  return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Run ee pass when flag_ee is set at optimization level > 0.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +gate_handle_ee (void)
>>>>>>> +{
>>>>>>> +  return (optimize > 0 && flag_ee);
>>>>>>> +}
>>>>>>> +
>>>>>>> +struct rtl_opt_pass pass_ee =
>>>>>>> +{
>>>>>>> + {
>>>>>>> +  RTL_PASS,
>>>>>>> +  "ee",                                 /* name */
>>>>>>> +  gate_handle_ee,                       /* gate */
>>>>>>> +  rest_of_handle_ee,                    /* execute */
>>>>>>> +  NULL,                                 /* sub */
>>>>>>> +  NULL,                                 /* next */
>>>>>>> +  0,                                    /* static_pass_number */
>>>>>>> +  TV_EE,                                /* tv_id */
>>>>>>> +  0,                                    /* properties_required */
>>>>>>> +  0,                                    /* properties_provided */
>>>>>>> +  0,                                    /* properties_destroyed */
>>>>>>> +  0,                                    /* todo_flags_start */
>>>>>>> +  TODO_ggc_collect |
>>>>>>> +  TODO_verify_rtl_sharing,              /* todo_flags_finish */
>>>>>>> + }
>>>>>>> +};
>>>>>>> Index: gcc/common.opt
>>>>>>> ===================================================================
>>>>>>> --- gcc/common.opt (revision 189409)
>>>>>>> +++ gcc/common.opt (working copy)
>>>>>>> @@ -1067,6 +1067,10 @@ feliminate-dwarf2-dups
>>>>>>> Common Report Var(flag_eliminate_dwarf2_dups)
>>>>>>> Perform DWARF2 duplicate elimination
>>>>>>>
>>>>>>> +fextension-elimination
>>>>>>> +Common Report Var(flag_ee) Init(0) Optimization
>>>>>>> +Perform extension elimination
>>>>>>> +
>>>>>>> fipa-sra
>>>>>>> Common Report Var(flag_ipa_sra) Init(0) Optimization
>>>>>>> Perform interprocedural reduction of aggregates
>>>>>>> Index: gcc/Makefile.in
>>>>>>> ===================================================================
>>>>>>> --- gcc/Makefile.in (revision 189409)
>>>>>>> +++ gcc/Makefile.in (working copy)
>>>>>>> @@ -1218,6 +1218,7 @@ OBJS = \
>>>>>>> 	dwarf2asm.o \
>>>>>>> 	dwarf2cfi.o \
>>>>>>> 	dwarf2out.o \
>>>>>>> +	ee.o \
>>>>>>> 	ebitmap.o \
>>>>>>> 	emit-rtl.o \
>>>>>>> 	et-forest.o \
>>>>>>> @@ -2971,6 +2972,12 @@ cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H
>>>>>>>       $(TM_P_H) $(PARAMS_H) cselib.h $(EXCEPT_H) $(TREE_H) $(TIMEVAR_H) \
>>>>>>>       intl.h $(OBSTACK_H) $(TREE_PASS_H) $(DF_H) $(DBGCNT_H) $(TARGET_H) \
>>>>>>>       $(DF_H) $(CFGLOOP_H)
>>>>>>> +ee.o : ee.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>>>>> +   hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h \
>>>>>>> +   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
>>>>>>> +   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(TOPLEV_H) \
>>>>>>> +   $(DIAGNOSTIC_CORE_H) $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h \
>>>>>>> +   $(PARAMS_H) $(CGRAPH_H)
>>>>>>> gcse.o : gcse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>>>>>       $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>>>>>>>       $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) toplev.h $(DIAGNOSTIC_CORE_H) \
>>>>>>> Index: gcc/passes.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/passes.c (revision 189409)
>>>>>>> +++ gcc/passes.c (working copy)
>>>>>>> @@ -1552,6 +1552,7 @@ init_optimization_passes (void)
>>>>>>>          NEXT_PASS (pass_initialize_regs);
>>>>>>>          NEXT_PASS (pass_ud_rtl_dce);
>>>>>>>          NEXT_PASS (pass_combine);
>>>>>>> +      NEXT_PASS (pass_ee);
>>>>>>>          NEXT_PASS (pass_if_after_combine);
>>>>>>>          NEXT_PASS (pass_partition_blocks);
>>>>>>>          NEXT_PASS (pass_regmove);
>>>
>>
> 
>
Kenneth Zadeck - July 17, 2012, 3:16 p.m.
the pass does not handle induction variables, i.e. variables that feed 
into themselves.

kenny
On 07/13/2012 03:53 AM, Tom de Vries wrote:
> On 12/07/12 14:04, Kenneth Zadeck wrote:
>> you are on the right track with the example but combine will not get
>> this unless everything is in the same bb.
>> the whole point of having a separate pass for doing extension
>> elimination is that it needs to be done over the entire function.
>>
> There is a pass_ree, which does inter-bb combine targeted at extensions.
> However, that pass is currently limited to combining extensions with the
> definitions of the register it extends. The way your example sounds, you want
> the reverse, where extensions are combined with all their uses.
> I would say pass_ree is the natural place to add this and handle the example you
> describe.
>
> Thanks,
> - Tom
>
>> my example is also a little more complex because, since we are talking
>> about induction vars, you have an initial assignment outside of a loop,
>> and increment inside the loop and the test you describe at the bottom of
>> the loop.
>>
>> I would point out that with respect to speed optimizations, the case i
>> am describing is in fact very important because getting code out of
>> loops is were the important gains are.   I believe that the ppc has a
>> some significant performance issues because of this kind of thing.
>>
>> kenny
>>
>>
>> On 07/12/2012 05:20 AM, Tom de Vries wrote:
>>> On 12/07/12 11:05, Tom de Vries wrote:
>>>> On 12/07/12 03:39, Kenneth Zadeck wrote:
>>>>> Tom,
>>>>>
>>>>> I have a problem with the approach that you have taken here.   I believe
>>>>> that this could be a very useful addition to gcc so I am in general very
>>>>> supportive, but i think you are missing an important case.
>>>>>
>>>>> My problem is that it the pass does not actually look at the target and
>>>>> make any decisions based on that target.
>>>>>
>>>>> for instance, we have a llp64 target.   As with many targets, the target
>>>>> has a rich set of compare and branch instructions.  In particular, it
>>>>> can do both 32 and 64 bit comparisons.    We see that many of the
>>>>> upstream optimizations that take int (SI mode) index variables generate
>>>>> extension operations before doing 64 bit compare and branch
>>>>> instructions, even though there are 32 bit comparison and branches on
>>>>> the machine.     There are a lot of machines that can do more than one
>>>>> size of comparison.
>>>>>
>>>> 	 This optimization pass, as it is currently written will not remove those
>>>>> extensions because it believes that the length of the destination is the
>>>>> "final answer" unless it is wrapped in an explicit truncation.
>>>>> Instead it needs to ask the port if there is a shorted compare and
>>>>> branch instruction that does not cost more. in that case, those
>>>>> instructions should be rewritten to use the shorted compare and branch.
>>>>>
>>>>> There are many operations other than compare and branch where the pass
>>>>> should be asking "can i shorten the target for free and therefore get
>>>>> rid of the extension?"
>>>> Kenneth,
>>>>
>>>> I'm not sure I understand the optimization you're talking about, in particular
>>>> I'm confused about whether the branch range of the 32-bit and 64-bit comparison
>>>> is the same.
>>>>
>>>> Assuming it's the same, my understanding is that you're talking about an example
>>>> like this:
>>>> ...
>>>>     (insn (set (reg:DI 5)
>>>>                (zero_extend:DI (reg:SI 4))))
>>>>
>>>>     (jump_insn (set (pc)
>>>>                     (if_then_else (eq (reg:DI 5)
>>>>                                       (const_int 0))
>>>>                                   (label_ref:DI 62)
>>>>                                   (pc))))
>>>>
>>>>     ->
>>>>
>>>>     (jump_insn (set (pc)
>>>>                     (if_then_else (eq (reg:SI 4)
>>>>                                       (const_int 0))
>>>>                                   (label_ref:DI 62)
>>>>                                   (pc))))
>>>>
>>>> ...
>>>> I would expect combine to optimize this.
>>>>
>>>> In case I got the example all backwards or it is a too simple one, please
>>>> provide an rtl example that illustrates the optimization.
>>>>
>>>> Thanks,
>>>> - Tom
>>>>
>>>>
>>>>>    right shifts, rotates, and stores are not in
>>>>> this class, but left shifts are as are all comparisons, compare and
>>>>> branches, conditional moves.   There may even be machines that have this
>>>>> for divide, but i do not know of any off the top of my head.
>>>>>
>>>>> What i am suggesting moves this pass into the target specific set of
>>>>> optimizations rather than target independent set, but at where this pass
>>>>> is to be put this is completely appropriate.    Any dest instruction
>>>>> where all of the operands have been extended should be checked to see if
>>>>> it was really necessary to use the longer form before doing the
>>>>> propagation pass.
>>>>>
>>>>> kenny
>>>>>
>>>>>
>>>>> On 07/11/2012 06:30 AM, Tom de Vries wrote:
>>>>>> On 13/11/10 10:50, Eric Botcazou wrote:
>>>>>>>> I profiled the pass on spec2000:
>>>>>>>>
>>>>>>>>                       -mabi=32     -mabi=64
>>>>>>>> ee-pass (usr time):     0.70         1.16
>>>>>>>> total   (usr time):   919.30       879.26
>>>>>>>> ee-pass        (%):     0.08         0.13
>>>>>>>>
>>>>>>>> The pass takes 0.13% or less of the total usr runtime.
>>>>>>> For how many hits?  What are the numbers with --param ee-max-propagate=0?
>>>>>>>
>>>>>>>> Is it necessary to improve the runtime of this pass?
>>>>>>> I've already given my opinion about the implementation.  The other passes in
>>>>>>> the compiler try hard not to rescan everything when a single bit changes; as
>>>>>>> currently written, yours doesn't.
>>>>>>>
>>>>>> Eric,
>>>>>>
>>>>>> I've done the following:
>>>>>> - refactored the pass such that it now scans at most twice over all
>>>>>>     instructions.
>>>>>> - updated the patch to be applicable to current trunk
>>>>>> - updated the motivating example to a more applicable one (as discussed in
>>>>>>     this thread), and added that one as test-case.
>>>>>> - added a part in the header comment illustrating the working of the pass
>>>>>>     on the motivating example.
>>>>>>
>>>>>> bootstrapped and reg-tested on x86_64 and i686.
>>>>>>
>>>>>> build and reg-tested on mips, mips64, and arm.
>>>>>>
>>>>>> OK for trunk?
>>>>>>
>>>>>> Thanks,
>>>>>> - Tom
>>>>>>
>>>>>> 2012-07-10  Tom de Vries  <tom@codesourcery.com>
>>>>>>
>>>>>> 	* ee.c: New file.
>>>>>> 	* tree-pass.h (pass_ee): Declare.
>>>>>> 	* opts.c ( default_options_table): Set flag_ee at -O2.
>>>>>> 	* timevar.def (TV_EE): New timevar.
>>>>>> 	* common.opt (fextension-elimination): New option.
>>>>>> 	* Makefile.in (ee.o): New rule.
>>>>>> 	* passes.c (pass_ee): Add it.
>>>>>>
>>>>>> 	* gcc.dg/extend-1.c: New test.
>>>>>> 	* gcc.dg/extend-2.c: Same.
>>>>>> 	* gcc.dg/extend-2-64.c: Same.
>>>>>> 	* gcc.dg/extend-3.c: Same.
>>>>>> 	* gcc.dg/extend-4.c: Same.
>>>>>> 	* gcc.dg/extend-5.c: Same.
>>>>>> 	* gcc.target/mips/octeon-bbit-2.c: Make test more robust.
>>>>>> Index: gcc/tree-pass.h
>>>>>> ===================================================================
>>>>>> --- gcc/tree-pass.h (revision 189409)
>>>>>> +++ gcc/tree-pass.h (working copy)
>>>>>> @@ -483,6 +483,7 @@ extern struct gimple_opt_pass pass_fixup
>>>>>>
>>>>>> extern struct rtl_opt_pass pass_expand;
>>>>>> extern struct rtl_opt_pass pass_instantiate_virtual_regs;
>>>>>> +extern struct rtl_opt_pass pass_ee;
>>>>>> extern struct rtl_opt_pass pass_rtl_fwprop;
>>>>>> extern struct rtl_opt_pass pass_rtl_fwprop_addr;
>>>>>> extern struct rtl_opt_pass pass_jump;
>>>>>> Index: gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
>>>>>> ===================================================================
>>>>>> --- gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (revision 189409)
>>>>>> +++ gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (working copy)
>>>>>> @@ -5,19 +5,19 @@
>>>>>> /* { dg-final { scan-assembler "\tbnel\t" } } */
>>>>>> /* { dg-final { scan-assembler-not "\tbne\t" } } */
>>>>>>
>>>>>> -NOMIPS16 int
>>>>>> -f (int n, int i)
>>>>>> +NOMIPS16 long int
>>>>>> +f (long int n, long int i)
>>>>>> {
>>>>>> -  int s = 0;
>>>>>> +  long int s = 0;
>>>>>>      for (; i & 1; i++)
>>>>>>        s += i;
>>>>>>      return s;
>>>>>> }
>>>>>>
>>>>>> -NOMIPS16 int
>>>>>> -g (int n, int i)
>>>>>> +NOMIPS16 long int
>>>>>> +g (long int n, long int i)
>>>>>> {
>>>>>> -  int s = 0;
>>>>>> +  long int s = 0;
>>>>>>      for (i = 0; i < n; i++)
>>>>>>        s += i;
>>>>>>      return s;
>>>>>> Index: gcc/testsuite/gcc.dg/extend-4.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-4.c (revision 0)
>>>>>> @@ -0,0 +1,16 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>> +
>>>>>> +unsigned char f(unsigned int a, int c)
>>>>>> +{
>>>>>> +  unsigned int b = a;
>>>>>> +  if (c)
>>>>>> +    b = a & 0x10ff;
>>>>>> +  return b;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "_extend:" 1 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "and:" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ removed" "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> +
>>>>>> Index: gcc/testsuite/gcc.dg/extend-1.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-1.c (revision 0)
>>>>>> @@ -0,0 +1,13 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>> +
>>>>>> +void f(unsigned char * p, short s, int c, int *z)
>>>>>> +{
>>>>>> +  if (c)
>>>>>> +    *z = 0;
>>>>>> +  *p ^= (unsigned char)s;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 1 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> Index: gcc/testsuite/gcc.dg/extend-5.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-5.c (revision 0)
>>>>>> @@ -0,0 +1,13 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>> +
>>>>>> +void f (short d[2][2])
>>>>>> +{
>>>>>> +  int d0 = d[0][0] + d[0][1];
>>>>>> +  int d1 = d[1][0] + d[1][1];
>>>>>> +  d[0][0] = d0 + d1;
>>>>>> +      d[0][1] = d0 - d1;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> Index: gcc/testsuite/gcc.dg/extend-2.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-2.c (revision 0)
>>>>>> @@ -0,0 +1,20 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>> +/* { dg-require-effective-target ilp32 } */
>>>>>> +
>>>>>> +void f(unsigned char * p, short *s, int c)
>>>>>> +{
>>>>>> +  short or = 0;
>>>>>> +  while (c)
>>>>>> +    {
>>>>>> +      or = or | s[c];
>>>>>> +      c --;
>>>>>> +    }
>>>>>> +  *p = (unsigned char)or;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> +
>>>>>> Index: gcc/testsuite/gcc.dg/extend-2-64.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-2-64.c (revision 0)
>>>>>> @@ -0,0 +1,20 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>>>> +/* { dg-require-effective-target mips64 } */
>>>>>> +
>>>>>> +void f(unsigned char * p, short *s, int c)
>>>>>> +{
>>>>>> +  short or = 0;
>>>>>> +  while (c)
>>>>>> +    {
>>>>>> +      or = or | s[c];
>>>>>> +      c --;
>>>>>> +    }
>>>>>> +  *p = (unsigned char)or;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 1 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> +
>>>>>> Index: gcc/testsuite/gcc.dg/extend-3.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-3.c (revision 0)
>>>>>> @@ -0,0 +1,13 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>>>> +/* { dg-require-effective-target mips64 } */
>>>>>> +
>>>>>> +unsigned int f(unsigned char byte)
>>>>>> +{
>>>>>> +  return byte << 25;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ replaced" "ee" } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> +
>>>>>> Index: gcc/opts.c
>>>>>> ===================================================================
>>>>>> --- gcc/opts.c (revision 189409)
>>>>>> +++ gcc/opts.c (working copy)
>>>>>> @@ -490,6 +490,7 @@ static const struct default_options defa
>>>>>>        { OPT_LEVELS_2_PLUS, OPT_ftree_tail_merge, NULL, 1 },
>>>>>>        { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_foptimize_strlen, NULL, 1 },
>>>>>>        { OPT_LEVELS_2_PLUS, OPT_fhoist_adjacent_loads, NULL, 1 },
>>>>>> +    { OPT_LEVELS_2_PLUS, OPT_fextension_elimination, NULL, 1 },
>>>>>>
>>>>>>        /* -O3 optimizations.  */
>>>>>>        { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
>>>>>> Index: gcc/timevar.def
>>>>>> ===================================================================
>>>>>> --- gcc/timevar.def (revision 189409)
>>>>>> +++ gcc/timevar.def (working copy)
>>>>>> @@ -201,6 +201,7 @@ DEFTIMEVAR (TV_POST_EXPAND	     , "post
>>>>>> DEFTIMEVAR (TV_VARCONST              , "varconst")
>>>>>> DEFTIMEVAR (TV_LOWER_SUBREG	     , "lower subreg")
>>>>>> DEFTIMEVAR (TV_JUMP                  , "jump")
>>>>>> +DEFTIMEVAR (TV_EE                    , "extension elimination")
>>>>>> DEFTIMEVAR (TV_FWPROP                , "forward prop")
>>>>>> DEFTIMEVAR (TV_CSE                   , "CSE")
>>>>>> DEFTIMEVAR (TV_DCE                   , "dead code elimination")
>>>>>> Index: gcc/ee.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/ee.c (revision 0)
>>>>>> @@ -0,0 +1,1190 @@
>>>>>> +/* Redundant extension elimination.
>>>>>> +   Copyright (C) 2010, 2011, 2012 Free Software Foundation, Inc.
>>>>>> +   Contributed by Tom de Vries (tom@codesourcery.com)
>>>>>> +
>>>>>> +This file is part of GCC.
>>>>>> +
>>>>>> +GCC is free software; you can redistribute it and/or modify it under
>>>>>> +the terms of the GNU General Public License as published by the Free
>>>>>> +Software Foundation; either version 3, or (at your option) any later
>>>>>> +version.
>>>>>> +
>>>>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>>>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>>>>> +for more details.
>>>>>> +
>>>>>> +You should have received a copy of the GNU General Public License
>>>>>> +along with GCC; see the file COPYING3.  If not see
>>>>>> +<http://www.gnu.org/licenses/>.  */
>>>>>> +
>>>>>> +/*
>>>>>> +
>>>>>> +  MOTIVATING EXAMPLE
>>>>>> +
>>>>>> +  The motivating example for this pass is the example from PR 40893:
>>>>>> +
>>>>>> +    void f (short d[2][2])
>>>>>> +    {
>>>>>> +      int d0 = d[0][0] + d[0][1];
>>>>>> +      int d1 = d[1][0] + d[1][1];
>>>>>> +      d[0][0] = d0 + d1;
>>>>>> +      d[0][1] = d0 - d1;
>>>>>> +    }
>>>>>> +
>>>>>> +  For MIPS, compilation results in the following insns.
>>>>>> +
>>>>>> +    (set (reg:SI 204)
>>>>>> +         (zero_extend:SI (subreg:HI (reg:SI 213) 2)))
>>>>>> +
>>>>>> +    (set (reg:SI 205)
>>>>>> +         (zero_extend:SI (subreg:HI (reg:SI 216 [ d1 ]) 2)))
>>>>>> +
>>>>>> +    (set (reg:SI 217)
>>>>>> +         (plus:SI (reg:SI 205)
>>>>>> +                  (reg:SI 204)))
>>>>>> +
>>>>>> +    (set (reg:SI 218)
>>>>>> +         (minus:SI (reg:SI 204)
>>>>>> +                   (reg:SI 205)))
>>>>>> +
>>>>>> +    (set (mem:HI (reg/v/f:SI 210))
>>>>>> +         (subreg:HI (reg:SI 217) 2))
>>>>>> +
>>>>>> +    (set (mem:HI (plus:SI (reg/v/f:SI 210)
>>>>>> +                 (const_int 2 [0x2])))
>>>>>> +         (subreg:HI (reg:SI 218) 2))
>>>>>> +
>>>>>> +
>>>>>> +  The pseudos 217 and 218 only use the lower half of pseudos 217 and 218, and
>>>>>> +  are the only uses.  And the plus and minus operators belong to the class of
>>>>>> +  operators where a bit in the result is only influenced by same-or-less
>>>>>> +  significant bitss in the operands, so the plus and minus insns only use the
>>>>>> +  lower halves of pseudos 204 and 205.  Those are also the only uses of pseudos
>>>>>> +  204 and 205, so the zero_extends are redundant.
>>>>>> +
>>>>>> +
>>>>>> +  INTENDED EFFECT
>>>>>> +
>>>>>> +  This pass works by removing sign/zero-extensions, or replacing them with
>>>>>> +  regcopies.  The idea there is that the regcopy might be eliminated by a later
>>>>>> +  pass.  In case the regcopy cannot be eliminated, it might at least be cheaper
>>>>>> +  than the extension.
>>>>>> +
>>>>>> +
>>>>>> +  IMPLEMENTATION
>>>>>> +
>>>>>> +  The pass scans at most two times over all instructions.
>>>>>> +
>>>>>> +  The first scan collects all extensions.  If there are no extensions, we're
>>>>>> +  done.
>>>>>> +
>>>>>> +  The second scan registers all uses of a reg in the biggest_use array.
>>>>>> +  Additionally, it registers how the use size of a pseudo is propagated to the
>>>>>> +  operands of the insns defining the pseudo.
>>>>>> +
>>>>>> +  The biggest_use array now contains the size in bits of the biggest use
>>>>>> +  of each reg, which allows us to find redundant extensions.
>>>>>> +
>>>>>> +  If there are still non-redundant extensions left, we use the propagation
>>>>>> +  information in an iterative fashion to improve the biggest_use array, after
>>>>>> +  which we may find more redundant extensions.
>>>>>> +
>>>>>> +  Finally, redundant extensions are deleted or replaced.
>>>>>> +
>>>>>> +  In case that the src and dest reg of the replacement are not of the same size,
>>>>>> +  we do not replace with a normal regcopy, but with a truncate or with the copy
>>>>>> +  of a paradoxical subreg instead.
>>>>>> +
>>>>>> +
>>>>>> +  ILLUSTRATION OF PASS
>>>>>> +
>>>>>> +  The dump of the pass shows us how the pass works on the motivating example.
>>>>>> +
>>>>>> +  We find the 2 extensions:
>>>>>> +    found extension with preserved size 16 defining reg 204
>>>>>> +    found extension with preserved size 16 defining reg 205
>>>>>> +
>>>>>> +  We calculate the biggests uses of a register:
>>>>>> +    biggest_use
>>>>>> +    reg 204: size 32
>>>>>> +    reg 205: size 32
>>>>>> +    reg 217: size 16
>>>>>> +    reg 218: size 16
>>>>>> +
>>>>>> +  We propagate the biggest uses where possible:
>>>>>> +    propagations
>>>>>> +    205: 32 -> 16
>>>>>> +    204: 32 -> 16
>>>>>> +    214: 32 -> 16
>>>>>> +    215: 32 -> 16
>>>>>> +
>>>>>> +  We conclude that the extensions are redundant:
>>>>>> +    found redundant extension with preserved size 16 defining reg 205
>>>>>> +    found redundant extension with preserved size 16 defining reg 204
>>>>>> +
>>>>>> +  And we replace them with regcopies:
>>>>>> +    (set (reg:SI 204)
>>>>>> +        (reg:SI 213))
>>>>>> +
>>>>>> +    (set (reg:SI 205)
>>>>>> +        (reg:SI 216))
>>>>>> +
>>>>>> +
>>>>>> +  LIMITATIONS
>>>>>> +
>>>>>> +  The scope of the analysis is limited to an extension and its uses.  The other
>>>>>> +  type of analysis (related to the defs of the operand of an extension) is not
>>>>>> +  done.
>>>>>> +
>>>>>> +  Furthermore, we do the analysis of biggest use per reg.  So when determining
>>>>>> +  whether an extension is redundant, we take all uses of a dest reg into
>>>>>> +  account, also the ones that are not uses of the extension.
>>>>>> +  The consideration is that using use-def chains will give a more precise
>>>>>> +  analysis, but is much more expensive in terms of runtime.  */
>>>>>> +
>>>>>> +#include "config.h"
>>>>>> +#include "system.h"
>>>>>> +#include "coretypes.h"
>>>>>> +#include "tm.h"
>>>>>> +#include "rtl.h"
>>>>>> +#include "tree.h"
>>>>>> +#include "tm_p.h"
>>>>>> +#include "flags.h"
>>>>>> +#include "regs.h"
>>>>>> +#include "hard-reg-set.h"
>>>>>> +#include "basic-block.h"
>>>>>> +#include "insn-config.h"
>>>>>> +#include "function.h"
>>>>>> +#include "expr.h"
>>>>>> +#include "insn-attr.h"
>>>>>> +#include "recog.h"
>>>>>> +#include "toplev.h"
>>>>>> +#include "target.h"
>>>>>> +#include "timevar.h"
>>>>>> +#include "optabs.h"
>>>>>> +#include "insn-codes.h"
>>>>>> +#include "rtlhooks-def.h"
>>>>>> +#include "output.h"
>>>>>> +#include "params.h"
>>>>>> +#include "timevar.h"
>>>>>> +#include "tree-pass.h"
>>>>>> +#include "cgraph.h"
>>>>>> +#include "vec.h"
>>>>>> +
>>>>>> +#define SKIP_REG (-1)
>>>>>> +#define NONE (-1)
>>>>>> +
>>>>>> +/* Number of registers at start of pass.  */
>>>>>> +
>>>>>> +static int n_regs;
>>>>>> +
>>>>>> +/* Array to register the biggest use of a reg, in bits.  */
>>>>>> +
>>>>>> +static int *biggest_use;
>>>>>> +
>>>>>> +/* Array to register the promoted subregs.  */
>>>>>> +
>>>>>> +static VEC (rtx,heap) **promoted_subreg;
>>>>>> +
>>>>>> +/* Array to register for a reg what the last propagated size is.  */
>>>>>> +
>>>>>> +static int *propagated_size;
>>>>>> +
>>>>>> +typedef struct use
>>>>>> +{
>>>>>> +  int regno;
>>>>>> +  int size;
>>>>>> +  int offset;
>>>>>> +  rtx *use;
>>>>>> +} use_type;
>>>>>> +
>>>>>> +DEF_VEC_O(use_type);
>>>>>> +DEF_VEC_ALLOC_O(use_type,heap);
>>>>>> +
>>>>>> +/* Vector to register the uses.  */
>>>>>> +
>>>>>> +static VEC (use_type,heap) **uses;
>>>>>> +
>>>>>> +typedef struct prop
>>>>>> +{
>>>>>> +  rtx set;
>>>>>> +  int uses_regno;
>>>>>> +  int uses_index;
>>>>>> +} prop_type;
>>>>>> +
>>>>>> +DEF_VEC_O(prop_type);
>>>>>> +DEF_VEC_ALLOC_O(prop_type,heap);
>>>>>> +
>>>>>> +/* Vector to register the propagations.  */
>>>>>> +
>>>>>> +static VEC (prop_type,heap) **props;
>>>>>> +
>>>>>> +/* Work list for propragation.  */
>>>>>> +
>>>>>> +static VEC (int,heap) *wl;
>>>>>> +
>>>>>> +/* Array to register what regs are in the work list.  */
>>>>>> +
>>>>>> +static bool *in_wl;
>>>>>> +
>>>>>> +/* Vector that contains the extensions in the function.  */
>>>>>> +
>>>>>> +static VEC (rtx,heap) *extensions;
>>>>>> +
>>>>>> +/* Vector that contains the extensions in the function that are going to be
>>>>>> +   removed or replaced.  */
>>>>>> +
>>>>>> +static VEC (rtx,heap) *redundant_extensions;
>>>>>> +
>>>>>> +/* Forward declaration.  */
>>>>>> +
>>>>>> +static void note_use (rtx *x, void *data);
>>>>>> +static bool skip_reg_p (int regno);
>>>>>> +static void register_prop (rtx set, use_type *use);
>>>>>> +
>>>>>> +/* Check whether SUBREG is a promoted subreg.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +promoted_subreg_p (rtx subreg)
>>>>>> +{
>>>>>> +  return (GET_CODE (subreg) == SUBREG
>>>>>> +	  && SUBREG_PROMOTED_VAR_P (subreg));
>>>>>> +}
>>>>>> +
>>>>>> +/* Check whether SUBREG is a promoted subreg for which we cannot reset the
>>>>>> +   promotion.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +fixed_promoted_subreg_p (rtx subreg)
>>>>>> +{
>>>>>> +  int mre;
>>>>>> +
>>>>>> +  if (!promoted_subreg_p (subreg))
>>>>>> +    return false;
>>>>>> +
>>>>>> +  mre = targetm.mode_rep_extended (GET_MODE (subreg),
>>>>>> +				   GET_MODE (SUBREG_REG (subreg)));
>>>>>> +  return mre != UNKNOWN;
>>>>>> +}
>>>>>> +
>>>>>> +/* Attempt to return the size, reg number and offset of USE in SIZE, REGNO and
>>>>>> +   OFFSET.  Return true if successful.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +reg_use_p (rtx use, int *size, unsigned int *regno, int *offset)
>>>>>> +{
>>>>>> +  rtx reg;
>>>>>> +
>>>>>> +  if (REG_P (use))
>>>>>> +    {
>>>>>> +      *regno = REGNO (use);
>>>>>> +      *offset = 0;
>>>>>> +      *size = GET_MODE_BITSIZE (GET_MODE (use));
>>>>>> +      return true;
>>>>>> +    }
>>>>>> +  else if (GET_CODE (use) == SUBREG)
>>>>>> +    {
>>>>>> +      reg = SUBREG_REG (use);
>>>>>> +
>>>>>> +      if (!REG_P (reg))
>>>>>> +	return false;
>>>>>> +
>>>>>> +      *regno = REGNO (reg);
>>>>>> +
>>>>>> +      if (paradoxical_subreg_p (use) || fixed_promoted_subreg_p (use))
>>>>>> +	{
>>>>>> +	  *offset = 0;
>>>>>> +	  *size = GET_MODE_BITSIZE (GET_MODE (reg));
>>>>>> +	}
>>>>>> +      else
>>>>>> +	{
>>>>>> +	  *offset = subreg_lsb (use);
>>>>>> +	  *size = *offset + GET_MODE_BITSIZE (GET_MODE (use));
>>>>>> +	}
>>>>>> +
>>>>>> +      return true;
>>>>>> +    }
>>>>>> +
>>>>>> +  return false;
>>>>>> +}
>>>>>> +
>>>>>> +/* Create a new empty entry in the uses[REGNO] vector.  */
>>>>>> +
>>>>>> +static use_type *
>>>>>> +new_use (unsigned int regno)
>>>>>> +{
>>>>>> +  if (uses[regno] == NULL)
>>>>>> +    uses[regno] = VEC_alloc (use_type, heap, 4);
>>>>>> +
>>>>>> +  VEC_safe_push (use_type, heap, uses[regno], NULL);
>>>>>> +
>>>>>> +  return VEC_last (use_type, uses[regno]);
>>>>>> +}
>>>>>> +
>>>>>> +/* Register a USE of reg REGNO with SIZE and OFFSET.  */
>>>>>> +
>>>>>> +static use_type *
>>>>>> +register_use (int size, unsigned int regno, int offset, rtx *use)
>>>>>> +{
>>>>>> +  int *current;
>>>>>> +  use_type *p;
>>>>>> +
>>>>>> +  gcc_assert (size >= 0);
>>>>>> +  gcc_assert (regno < (unsigned int)n_regs);
>>>>>> +
>>>>>> +  if (skip_reg_p (regno))
>>>>>> +    return NULL;
>>>>>> +
>>>>>> +  p = new_use (regno);
>>>>>> +  p->regno = regno;
>>>>>> +  p->size = size;
>>>>>> +  p->offset = offset;
>>>>>> +  p->use = use;
>>>>>> +
>>>>>> +  /* Update the bigest use.  */
>>>>>> +  current = &biggest_use[regno];
>>>>>> +  *current = MAX (*current, size);
>>>>>> +
>>>>>> +  return p;
>>>>>> +}
>>>>>> +
>>>>>> +/* Handle embedded uses in USE, which is a part of PATTERN.  */
>>>>>> +
>>>>>> +static void
>>>>>> +note_embedded_uses (rtx use, rtx pattern)
>>>>>> +{
>>>>>> +  const char *format_ptr;
>>>>>> +  int i, j;
>>>>>> +
>>>>>> +  format_ptr = GET_RTX_FORMAT (GET_CODE (use));
>>>>>> +  for (i = 0; i < GET_RTX_LENGTH (GET_CODE (use)); i++)
>>>>>> +    if (format_ptr[i] == 'e')
>>>>>> +      note_use (&XEXP (use, i), pattern);
>>>>>> +    else if (format_ptr[i] == 'E')
>>>>>> +      for (j = 0; j < XVECLEN (use, i); j++)
>>>>>> +	note_use (&XVECEXP (use, i, j), pattern);
>>>>>> +}
>>>>>> +
>>>>>> +/* Get the set in PATTERN that has USE as its src operand.  */
>>>>>> +
>>>>>> +static rtx
>>>>>> +get_set (rtx use, rtx pattern)
>>>>>> +{
>>>>>> +  rtx sub;
>>>>>> +  int i;
>>>>>> +
>>>>>> +  if (GET_CODE (pattern) == SET && SET_SRC (pattern) == use)
>>>>>> +    return pattern;
>>>>>> +
>>>>>> +  if (GET_CODE (pattern) == PARALLEL)
>>>>>> +    for (i = 0; i < XVECLEN (pattern, 0); ++i)
>>>>>> +      {
>>>>>> +	sub = XVECEXP (pattern, 0, i);
>>>>>> +	if (GET_CODE (sub) == SET && SET_SRC (sub) == use)
>>>>>> +	  return sub;
>>>>>> +      }
>>>>>> +
>>>>>> +  return NULL_RTX;
>>>>>> +}
>>>>>> +
>>>>>> +/* Handle a restricted op USE with NR_OPERANDS.  USE is a part of SET, which is
>>>>>> +   a part of PATTERN.  In this context restricted means that a bit in
>>>>>> +   an operand influences only the same bit or more significant bits in the
>>>>>> +   result.  The bitwise ops are a subclass, but PLUS is one as well.  */
>>>>>> +
>>>>>> +static void
>>>>>> +note_restricted_op_use (rtx set, rtx use, unsigned int nr_operands, rtx pattern)
>>>>>> +{
>>>>>> +  unsigned int i, smallest;
>>>>>> +  int operand_size[2];
>>>>>> +  int operand_offset[2];
>>>>>> +  int used_size;
>>>>>> +  unsigned int operand_regno[2];
>>>>>> +  bool operand_reg[2];
>>>>>> +  bool operand_ignore[2];
>>>>>> +  use_type *p;
>>>>>> +
>>>>>> +  /* Init operand_reg, operand_size, operand_regno and operand_ignore.  */
>>>>>> +  for (i = 0; i < nr_operands; ++i)
>>>>>> +    {
>>>>>> +      operand_reg[i] = reg_use_p (XEXP (use, i), &operand_size[i],
>>>>>> +				  &operand_regno[i], &operand_offset[i]);
>>>>>> +      operand_ignore[i] = false;
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Handle case of reg and-masked with const.  */
>>>>>> +  if (GET_CODE (use) == AND && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>>>> +    {
>>>>>> +      used_size =
>>>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (UINTVAL (XEXP (use, 1)));
>>>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Handle case of reg or-masked with const.  */
>>>>>> +  if (GET_CODE (use) == IOR && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>>>> +    {
>>>>>> +      used_size =
>>>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (~UINTVAL (XEXP (use, 1)));
>>>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Ignore the use of a in 'a = a + b'.  */
>>>>>> +  /* TODO: handle GET_CODE ((SET_DEST (set))) == SUBREG.  */
>>>>>> +  if (set != NULL_RTX && REG_P (SET_DEST (set)))
>>>>>> +    for (i = 0; i < nr_operands; ++i)
>>>>>> +      operand_ignore[i] = (operand_reg[i]
>>>>>> +			   && (REGNO (SET_DEST (set)) == operand_regno[i]));
>>>>>> +
>>>>>> +  /* Handle the case a reg is combined with don't care bits.  */
>>>>>> +  if (nr_operands == 2 && operand_reg[0] && operand_reg[1]
>>>>>> +      && operand_size[0] != operand_size[1])
>>>>>> +    {
>>>>>> +      smallest = operand_size[0] > operand_size[1];
>>>>>> +
>>>>>> +      if (paradoxical_subreg_p (XEXP (use, smallest)))
>>>>>> +	operand_size[1 - smallest] = operand_size[smallest];
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Register the operand use, if necessary.  */
>>>>>> +  for (i = 0; i < nr_operands; ++i)
>>>>>> +    if (!operand_reg[i])
>>>>>> +      note_use (&XEXP (use, i), pattern);
>>>>>> +    else if (!operand_ignore[i])
>>>>>> +      {
>>>>>> +	p = register_use (operand_size[i], operand_regno[i], operand_offset[i],
>>>>>> +			  &XEXP (use, i));
>>>>>> +	register_prop (set, p);
>>>>>> +      }
>>>>>> +}
>>>>>> +
>>>>>> +/* Register promoted SUBREG in promoted_subreg.  */
>>>>>> +
>>>>>> +static void
>>>>>> +register_promoted_subreg (rtx subreg)
>>>>>> +{
>>>>>> +  int index = REGNO (SUBREG_REG (subreg));
>>>>>> +
>>>>>> +  if (promoted_subreg[index] == NULL)
>>>>>> +    promoted_subreg[index] = VEC_alloc (rtx, heap, 10);
>>>>>> +
>>>>>> +  VEC_safe_push (rtx, heap, promoted_subreg[index], subreg);
>>>>>> +}
>>>>>> +
>>>>>> +/* Note promoted subregs in X.  */
>>>>>> +
>>>>>> +static int
>>>>>> +note_promoted_subreg (rtx *x, void *y ATTRIBUTE_UNUSED)
>>>>>> +{
>>>>>> +  rtx subreg = *x;
>>>>>> +
>>>>>> +  if (promoted_subreg_p (subreg) && !fixed_promoted_subreg_p (subreg)
>>>>>> +      && REG_P (SUBREG_REG (subreg)))
>>>>>> +    register_promoted_subreg (subreg);
>>>>>> +
>>>>>> +  return 0;
>>>>>> +}
>>>>>> +
>>>>>> +/* Handle use X in pattern DATA noted by note_uses.  */
>>>>>> +
>>>>>> +static void
>>>>>> +note_use (rtx *x, void *data)
>>>>>> +{
>>>>>> +  rtx use = *x;
>>>>>> +  rtx pattern = (rtx)data;
>>>>>> +  int use_size, use_offset;
>>>>>> +  unsigned int use_regno;
>>>>>> +  rtx set;
>>>>>> +  use_type *p;
>>>>>> +
>>>>>> +  for_each_rtx (x, note_promoted_subreg, NULL);
>>>>>> +
>>>>>> +  set = get_set (use, pattern);
>>>>>> +
>>>>>> +  switch (GET_CODE (use))
>>>>>> +    {
>>>>>> +    case REG:
>>>>>> +    case SUBREG:
>>>>>> +      if (!reg_use_p (use, &use_size, &use_regno, &use_offset))
>>>>>> +	{
>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>> +	  return;
>>>>>> +	}
>>>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>>>> +      register_prop (set, p);
>>>>>> +      return;
>>>>>> +    case SIGN_EXTEND:
>>>>>> +    case ZERO_EXTEND:
>>>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset))
>>>>>> +	{
>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>> +	  return;
>>>>>> +	}
>>>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>>>> +      register_prop (set, p);
>>>>>> +      return;
>>>>>> +    case IOR:
>>>>>> +    case AND:
>>>>>> +    case XOR:
>>>>>> +    case PLUS:
>>>>>> +    case MINUS:
>>>>>> +      note_restricted_op_use (set, use, 2, pattern);
>>>>>> +      return;
>>>>>> +    case NOT:
>>>>>> +    case NEG:
>>>>>> +      note_restricted_op_use (set, use, 1, pattern);
>>>>>> +      return;
>>>>>> +    case ASHIFT:
>>>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset)
>>>>>> +	  || !CONST_INT_P (XEXP (use, 1))
>>>>>> +	  || INTVAL (XEXP (use, 1)) <= 0
>>>>>> +	  || paradoxical_subreg_p (XEXP (use, 0)))
>>>>>> +	{
>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>> +	  return;
>>>>>> +	}
>>>>>> +      (void)register_use (use_size - INTVAL (XEXP (use, 1)), use_regno,
>>>>>> +			  use_offset, x);
>>>>>> +      return;
>>>>>> +    default:
>>>>>> +      note_embedded_uses (use, pattern);
>>>>>> +      return;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +/* Check whether reg REGNO is implicitly used.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +implicit_use_p (int regno ATTRIBUTE_UNUSED)
>>>>>> +{
>>>>>> +#ifdef EPILOGUE_USES
>>>>>> +  if (EPILOGUE_USES (regno))
>>>>>> +    return true;
>>>>>> +#endif
>>>>>> +
>>>>>> +#ifdef EH_USES
>>>>>> +  if (EH_USES (regno))
>>>>>> +    return true;
>>>>>> +#endif
>>>>>> +
>>>>>> +  return false;
>>>>>> +}
>>>>>> +
>>>>>> +/* Check whether reg REGNO should be skipped in analysis.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +skip_reg_p (int regno)
>>>>>> +{
>>>>>> +  /* TODO: handle hard registers.  The problem with hard registers is that
>>>>>> +     a DI use of r0 can mean a 64bit use of r0 and a 32 bit use of r1.
>>>>>> +     We don't handle that properly.  */
>>>>>> +  return implicit_use_p (regno) || HARD_REGISTER_NUM_P (regno);
>>>>>> +}
>>>>>> +
>>>>>> +/* Note the uses of argument registers in call INSN.  */
>>>>>> +
>>>>>> +static void
>>>>>> +note_call_uses (rtx insn)
>>>>>> +{
>>>>>> +  rtx link, link_expr;
>>>>>> +
>>>>>> +  if (!CALL_P (insn))
>>>>>> +    return;
>>>>>> +
>>>>>> +  for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1))
>>>>>> +    {
>>>>>> +      link_expr = XEXP (link, 0);
>>>>>> +
>>>>>> +      if (GET_CODE (link_expr) == USE)
>>>>>> +	note_use (&XEXP (link_expr, 0), link);
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +/* Dump the biggest uses found.  */
>>>>>> +
>>>>>> +static void
>>>>>> +dump_biggest_use (void)
>>>>>> +{
>>>>>> +  int i;
>>>>>> +
>>>>>> +  if (!dump_file)
>>>>>> +    return;
>>>>>> +
>>>>>> +  fprintf (dump_file, "biggest_use:\n");
>>>>>> +
>>>>>> +  for (i = 0; i < n_regs; i++)
>>>>>> +    if (biggest_use[i] > 0)
>>>>>> +      fprintf (dump_file, "reg %d: size %d\n", i, biggest_use[i]);
>>>>>> +
>>>>>> +  fprintf (dump_file, "\n");
>>>>>> +}
>>>>>> +
>>>>>> +/* Calculate the biggest use mode for all regs.  */
>>>>>> +
>>>>>> +static void
>>>>>> +calculate_biggest_use (void)
>>>>>> +{
>>>>>> +  basic_block bb;
>>>>>> +  rtx insn;
>>>>>> +
>>>>>> +  /* For all insns, call note_use for each use in insn.  */
>>>>>> +  FOR_EACH_BB (bb)
>>>>>> +    FOR_BB_INSNS (bb, insn)
>>>>>> +      {
>>>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>>>> +	  continue;
>>>>>> +
>>>>>> +	note_uses (&PATTERN (insn), note_use, PATTERN (insn));
>>>>>> +
>>>>>> +	if (CALL_P (insn))
>>>>>> +	  note_call_uses (insn);
>>>>>> +      }
>>>>>> +
>>>>>> +  dump_biggest_use ();
>>>>>> +}
>>>>>> +
>>>>>> +/* Register a propagation USE in SET in the props vector.  */
>>>>>> +
>>>>>> +static void
>>>>>> +register_prop (rtx set, use_type *use)
>>>>>> +{
>>>>>> +  prop_type *p;
>>>>>> +  int regno;
>>>>>> +
>>>>>> +  if (set == NULL_RTX || use == NULL)
>>>>>> +    return;
>>>>>> +
>>>>>> +  if (!REG_P (SET_DEST (set)))
>>>>>> +    return;
>>>>>> +
>>>>>> +  regno = REGNO (SET_DEST (set));
>>>>>> +
>>>>>> +  if (skip_reg_p (regno))
>>>>>> +    return;
>>>>>> +
>>>>>> +  if (props[regno] == NULL)
>>>>>> +    props[regno] = VEC_alloc (prop_type, heap, 4);
>>>>>> +
>>>>>> +  VEC_safe_push (prop_type, heap, props[regno], NULL);
>>>>>> +  p = VEC_last (prop_type, props[regno]);
>>>>>> +  p->set = set;
>>>>>> +  p->uses_regno = use->regno;
>>>>>> +  p->uses_index = VEC_length (use_type, uses[use->regno]) - 1;
>>>>>> +}
>>>>>> +
>>>>>> +/* Add REGNO to the worklist.  */
>>>>>> +
>>>>>> +static void
>>>>>> +add_to_wl (int regno)
>>>>>> +{
>>>>>> +  if (in_wl[regno])
>>>>>> +    return;
>>>>>> +
>>>>>> +  if (biggest_use[regno] > 0
>>>>>> +      && biggest_use[regno] == GET_MODE_BITSIZE (PSEUDO_REGNO_MODE (regno)))
>>>>>> +    return;
>>>>>> +
>>>>>> +  if (VEC_empty (prop_type, props[regno]))
>>>>>> +    return;
>>>>>> +
>>>>>> +  if (propagated_size[regno] != NONE
>>>>>> +      && propagated_size[regno] == biggest_use[regno])
>>>>>> +    return;
>>>>>> +
>>>>>> +  VEC_safe_push (int, heap, wl, regno);
>>>>>> +  in_wl[regno] = true;
>>>>>> +}
>>>>>> +
>>>>>> +/* Pop a reg from the worklist and return it.  */
>>>>>> +
>>>>>> +static int
>>>>>> +pop_wl (void)
>>>>>> +{
>>>>>> +  int regno = VEC_pop (int, wl);
>>>>>> +  in_wl[regno] = false;
>>>>>> +  return regno;
>>>>>> +}
>>>>>> +
>>>>>> +/* Propagate the use size DEST_SIZE of a reg to use P.  */
>>>>>> +
>>>>>> +static int
>>>>>> +propagate_size (int dest_size, use_type *p)
>>>>>> +{
>>>>>> +  if (dest_size == 0)
>>>>>> +    return 0;
>>>>>> +
>>>>>> +  return p->offset + MIN (p->size - p->offset, dest_size);
>>>>>> +}
>>>>>> +
>>>>>> +/* Get the biggest use of REGNO from the uses vector.  */
>>>>>> +
>>>>>> +static int
>>>>>> +get_biggest_use (unsigned int regno)
>>>>>> +{
>>>>>> +  int ix;
>>>>>> +  use_type *p;
>>>>>> +  int max = 0;
>>>>>> +
>>>>>> +  gcc_assert (uses[regno] != NULL);
>>>>>> +
>>>>>> +  FOR_EACH_VEC_ELT (use_type, uses[regno], ix, p)
>>>>>> +    max = MAX (max, p->size);
>>>>>> +
>>>>>> +  return max;
>>>>>> +}
>>>>>> +
>>>>>> +/* Propagate the use size DEST_SIZE of a reg to the uses in USE.  */
>>>>>> +
>>>>>> +static void
>>>>>> +propagate_to_use (int dest_size, use_type *use)
>>>>>> +{
>>>>>> +  int new_use_size;
>>>>>> +  int prev_biggest_use;
>>>>>> +  int *current;
>>>>>> +
>>>>>> +  new_use_size = propagate_size (dest_size, use);
>>>>>> +
>>>>>> +  if (new_use_size >= use->size)
>>>>>> +    return;
>>>>>> +
>>>>>> +  use->size = new_use_size;
>>>>>> +
>>>>>> +  current = &biggest_use[use->regno];
>>>>>> +
>>>>>> +  prev_biggest_use = *current;
>>>>>> +  *current = get_biggest_use (use->regno);
>>>>>> +
>>>>>> +  if (*current >= prev_biggest_use)
>>>>>> +    return;
>>>>>> +
>>>>>> +  add_to_wl (use->regno);
>>>>>> +
>>>>>> +  if (dump_file)
>>>>>> +    fprintf (dump_file, "%d: %d -> %d\n", use->regno, prev_biggest_use,
>>>>>> +	     *current);
>>>>>> +
>>>>>> +}
>>>>>> +
>>>>>> +/* Propagate the biggest use of a reg REGNO to all its uses, and note
>>>>>> +   propagations in NR_PROPAGATIONS.  */
>>>>>> +
>>>>>> +static void
>>>>>> +propagate_to_uses (int regno, int *nr_propagations)
>>>>>> +{
>>>>>> +  int ix;
>>>>>> +  prop_type *p;
>>>>>> +
>>>>>> +  gcc_assert (!(propagated_size[regno] == NONE
>>>>>> +		&& propagated_size[regno] == biggest_use[regno]));
>>>>>> +
>>>>>> +  FOR_EACH_VEC_ELT (prop_type, props[regno], ix, p)
>>>>>> +    {
>>>>>> +      use_type *use = VEC_index (use_type, uses[p->uses_regno], p->uses_index);
>>>>>> +      propagate_to_use (biggest_use[regno], use);
>>>>>> +      ++(*nr_propagations);
>>>>>> +    }
>>>>>> +
>>>>>> +  propagated_size[regno] = biggest_use[regno];
>>>>>> +}
>>>>>> +
>>>>>> +/* Improve biggest_use array iteratively.  */
>>>>>> +
>>>>>> +static void
>>>>>> +propagate (void)
>>>>>> +{
>>>>>> +  int i;
>>>>>> +  int nr_propagations = 0;
>>>>>> +
>>>>>> +  /* Initialize work list.  */
>>>>>> +
>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>> +    add_to_wl (i);
>>>>>> +
>>>>>> +  /* Work the work list.  */
>>>>>> +
>>>>>> +  if (dump_file)
>>>>>> +    fprintf (dump_file, "propagations: \n");
>>>>>> +  while (!VEC_empty (int, wl))
>>>>>> +    propagate_to_uses (pop_wl (), &nr_propagations);
>>>>>> +
>>>>>> +  if (dump_file)
>>>>>> +    fprintf (dump_file, "\nnr_propagations: %d\n\n", nr_propagations);
>>>>>> +}
>>>>>> +
>>>>>> +/* Check whether this is a sign/zero extension.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>>>> +{
>>>>>> +  rtx src, op0;
>>>>>> +
>>>>>> +  /* Detect set of reg.  */
>>>>>> +  if (GET_CODE (PATTERN (insn)) != SET)
>>>>>> +    return false;
>>>>>> +
>>>>>> +  src = SET_SRC (PATTERN (insn));
>>>>>> +  *dest = SET_DEST (PATTERN (insn));
>>>>>> +
>>>>>> +  if (!REG_P (*dest))
>>>>>> +    return false;
>>>>>> +
>>>>>> +  /* Detect sign or zero extension.  */
>>>>>> +  if (GET_CODE (src) == ZERO_EXTEND || GET_CODE (src) == SIGN_EXTEND
>>>>>> +      || (GET_CODE (src) == AND && CONST_INT_P (XEXP (src, 1))))
>>>>>> +    {
>>>>>> +      op0 = XEXP (src, 0);
>>>>>> +
>>>>>> +      /* Determine amount of least significant bits preserved by operation.  */
>>>>>> +      if (GET_CODE (src) == AND)
>>>>>> +	*preserved_size = ctz_hwi (~UINTVAL (XEXP (src, 1)));
>>>>>> +      else
>>>>>> +	*preserved_size = GET_MODE_BITSIZE (GET_MODE (op0));
>>>>>> +
>>>>>> +      if (GET_CODE (op0) == SUBREG)
>>>>>> +	{
>>>>>> +	  if (subreg_lsb (op0) != 0)
>>>>>> +	    return false;
>>>>>> +
>>>>>> +	  *inner = SUBREG_REG (op0);
>>>>>> +
>>>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>>>> +	    return false;
>>>>>> +
>>>>>> +	  return true;
>>>>>> +	}
>>>>>> +      else if (REG_P (op0))
>>>>>> +	{
>>>>>> +	  *inner = op0;
>>>>>> +
>>>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>>>> +	    return false;
>>>>>> +
>>>>>> +	  return true;
>>>>>> +	}
>>>>>> +      else if (GET_CODE (op0) == TRUNCATE)
>>>>>> +	{
>>>>>> +	  *inner = XEXP (op0, 0);
>>>>>> +	  return true;
>>>>>> +	}
>>>>>> +    }
>>>>>> +
>>>>>> +  return false;
>>>>>> +}
>>>>>> +
>>>>>> +/* Find extensions and store them in the extensions vector.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +find_extensions (void)
>>>>>> +{
>>>>>> +  basic_block bb;
>>>>>> +  rtx insn, dest, inner;
>>>>>> +  int preserved_size;
>>>>>> +
>>>>>> +  /* For all insns, call note_use for each use in insn.  */
>>>>>> +  FOR_EACH_BB (bb)
>>>>>> +    FOR_BB_INSNS (bb, insn)
>>>>>> +      {
>>>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>>>> +	  continue;
>>>>>> +
>>>>>> +	if (!extension_p (insn, &dest, &inner, &preserved_size))
>>>>>> +	  continue;
>>>>>> +
>>>>>> +	VEC_safe_push (rtx, heap, extensions, insn);
>>>>>> +
>>>>>> +	if (dump_file)
>>>>>> +	  fprintf (dump_file,
>>>>>> +		   "found extension %u with preserved size %d defining"
>>>>>> +		   " reg %d\n",
>>>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>>>> +      }
>>>>>> +
>>>>>> +  if (dump_file)
>>>>>> +    {
>>>>>> +      if (!VEC_empty (rtx, extensions))
>>>>>> +	fprintf (dump_file, "\n");
>>>>>> +      else
>>>>>> +	fprintf (dump_file, "no extensions found.\n");
>>>>>> +    }
>>>>>> +
>>>>>> +  return !VEC_empty (rtx, extensions);
>>>>>> +}
>>>>>> +
>>>>>> +/* Check whether this is a redundant sign/zero extension.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +redundant_extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>>>> +{
>>>>>> +  int biggest_dest_use;
>>>>>> +
>>>>>> +  if (!extension_p (insn, dest, inner, preserved_size))
>>>>>> +    gcc_unreachable ();
>>>>>> +
>>>>>> +  biggest_dest_use = biggest_use[REGNO (*dest)];
>>>>>> +
>>>>>> +  if (biggest_dest_use == SKIP_REG)
>>>>>> +    return false;
>>>>>> +
>>>>>> +  if (*preserved_size < biggest_dest_use)
>>>>>> +    return false;
>>>>>> +
>>>>>> +  return true;
>>>>>> +}
>>>>>> +
>>>>>> +/* Find the redundant extensions in the extensions vector and move them to the
>>>>>> +   redundant_extensions vector.  */
>>>>>> +
>>>>>> +static void
>>>>>> +find_redundant_extensions (void)
>>>>>> +{
>>>>>> +  rtx insn, dest, inner;
>>>>>> +  int ix;
>>>>>> +  bool found = false;
>>>>>> +  int preserved_size;
>>>>>> +
>>>>>> +  FOR_EACH_VEC_ELT_REVERSE (rtx, extensions, ix, insn)
>>>>>> +    if (redundant_extension_p (insn, &dest, &inner, &preserved_size))
>>>>>> +      {
>>>>>> +	VEC_safe_push (rtx, heap, redundant_extensions, insn);
>>>>>> +	VEC_unordered_remove (rtx, extensions, ix);
>>>>>> +
>>>>>> +	if (dump_file)
>>>>>> +	  fprintf (dump_file,
>>>>>> +		   "found redundant extension %u with preserved size %d"
>>>>>> +		   " defining reg %d\n",
>>>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>>>> +	found = true;
>>>>>> +      }
>>>>>> +
>>>>>> +  if (dump_file && found)
>>>>>> +    fprintf (dump_file, "\n");
>>>>>> +}
>>>>>> +
>>>>>> +/* Reset promotion of subregs or REG.  */
>>>>>> +
>>>>>> +static void
>>>>>> +reset_promoted_subreg (rtx reg)
>>>>>> +{
>>>>>> +  int ix;
>>>>>> +  rtx subreg;
>>>>>> +
>>>>>> +  if (promoted_subreg[REGNO (reg)] == NULL)
>>>>>> +    return;
>>>>>> +
>>>>>> +  FOR_EACH_VEC_ELT (rtx, promoted_subreg[REGNO (reg)], ix, subreg)
>>>>>> +    {
>>>>>> +      SUBREG_PROMOTED_UNSIGNED_SET (subreg, 0);
>>>>>> +      SUBREG_PROMOTED_VAR_P (subreg) = 0;
>>>>>> +    }
>>>>>> +
>>>>>> +  VEC_free (rtx, heap, promoted_subreg[REGNO (reg)]);
>>>>>> +}
>>>>>> +
>>>>>> +/* Try to remove or replace the redundant extension INSN which extends INNER and
>>>>>> +   writes to DEST.  */
>>>>>> +
>>>>>> +static void
>>>>>> +try_remove_or_replace_extension (rtx insn, rtx dest, rtx inner)
>>>>>> +{
>>>>>> +  rtx cp_src, cp_dest, seq = NULL_RTX, one;
>>>>>> +
>>>>>> +  /* Check whether replacement is needed.  */
>>>>>> +  if (dest != inner)
>>>>>> +    {
>>>>>> +      start_sequence ();
>>>>>> +
>>>>>> +      /* Determine the proper replacement operation.  */
>>>>>> +      if (GET_MODE (dest) == GET_MODE (inner))
>>>>>> +	{
>>>>>> +	  cp_src = inner;
>>>>>> +	  cp_dest = dest;
>>>>>> +	}
>>>>>> +      else if (GET_MODE_SIZE (GET_MODE (dest))
>>>>>> +	       > GET_MODE_SIZE (GET_MODE (inner)))
>>>>>> +	{
>>>>>> +	  emit_clobber (dest);
>>>>>> +	  cp_src = inner;
>>>>>> +	  cp_dest = gen_lowpart_SUBREG (GET_MODE (inner), dest);
>>>>>> +	}
>>>>>> +      else
>>>>>> +	{
>>>>>> +	  cp_src = gen_lowpart_SUBREG (GET_MODE (dest), inner);
>>>>>> +	  cp_dest = dest;
>>>>>> +	}
>>>>>> +
>>>>>> +      emit_move_insn (cp_dest, cp_src);
>>>>>> +
>>>>>> +      seq = get_insns ();
>>>>>> +      end_sequence ();
>>>>>> +
>>>>>> +      /* If the replacement is not supported, bail out.  */
>>>>>> +      for (one = seq; one != NULL_RTX; one = NEXT_INSN (one))
>>>>>> +	if (recog_memoized (one) < 0 && GET_CODE (PATTERN (one)) != CLOBBER)
>>>>>> +	  return;
>>>>>> +
>>>>>> +      /* Insert the replacement.  */
>>>>>> +      emit_insn_before (seq, insn);
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Note replacement/removal in the dump.  */
>>>>>> +  if (dump_file)
>>>>>> +    {
>>>>>> +      fprintf (dump_file, "redundant extension %u ", INSN_UID (insn));
>>>>>> +      if (dest != inner)
>>>>>> +	fprintf (dump_file, "replaced by %u\n", INSN_UID (seq));
>>>>>> +      else
>>>>>> +	fprintf (dump_file, "removed\n");
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Remove the extension.  */
>>>>>> +  delete_insn (insn);
>>>>>> +
>>>>>> +  reset_promoted_subreg (dest);
>>>>>> +}
>>>>>> +
>>>>>> +/* Setup the variables at the start of the pass.  */
>>>>>> +
>>>>>> +static void
>>>>>> +init_pass (void)
>>>>>> +{
>>>>>> +  int i;
>>>>>> +
>>>>>> +  biggest_use = XNEWVEC (int, n_regs);
>>>>>> +  promoted_subreg = XCNEWVEC (VEC (rtx,heap) *, n_regs);
>>>>>> +  propagated_size = XNEWVEC (int, n_regs);
>>>>>> +
>>>>>> +  /* Initialize biggest_use for all regs to 0.  If a reg is used implicitly, we
>>>>>> +     handle that reg conservatively and set it to SKIP_REG instead.  */
>>>>>> +  for (i = 0; i < n_regs; i++)
>>>>>> +    {
>>>>>> +      biggest_use[i] = skip_reg_p (i) ? SKIP_REG : 0;
>>>>>> +      propagated_size[i] = NONE;
>>>>>> +    }
>>>>>> +
>>>>>> +  extensions = VEC_alloc (rtx, heap, 10);
>>>>>> +  redundant_extensions = VEC_alloc (rtx, heap, 10);
>>>>>> +
>>>>>> +  wl = VEC_alloc (int, heap, 50);
>>>>>> +  in_wl = XNEWVEC (bool, n_regs);
>>>>>> +
>>>>>> +  uses = XNEWVEC (typeof (*uses), n_regs);
>>>>>> +  props = XNEWVEC (typeof (*props), n_regs);
>>>>>> +
>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>> +    {
>>>>>> +      uses[i] = NULL;
>>>>>> +      props[i] = NULL;
>>>>>> +      in_wl[i] = false;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +/* Find redundant extensions and remove or replace them if possible.  */
>>>>>> +
>>>>>> +static void
>>>>>> +remove_redundant_extensions (void)
>>>>>> +{
>>>>>> +  rtx insn, dest, inner;
>>>>>> +  int preserved_size;
>>>>>> +  int ix;
>>>>>> +
>>>>>> +  if (!find_extensions ())
>>>>>> +    return;
>>>>>> +
>>>>>> +  calculate_biggest_use ();
>>>>>> +
>>>>>> +  find_redundant_extensions ();
>>>>>> +
>>>>>> +  if (!VEC_empty (rtx, extensions))
>>>>>> +    {
>>>>>> +      propagate ();
>>>>>> +
>>>>>> +      find_redundant_extensions ();
>>>>>> +    }
>>>>>> +
>>>>>> +  gcc_checking_assert (n_regs == max_reg_num ());
>>>>>> +
>>>>>> +  FOR_EACH_VEC_ELT (rtx, redundant_extensions, ix, insn)
>>>>>> +    {
>>>>>> +      extension_p (insn, &dest, &inner, &preserved_size);
>>>>>> +      try_remove_or_replace_extension (insn, dest, inner);
>>>>>> +    }
>>>>>> +
>>>>>> +  if (dump_file)
>>>>>> +    fprintf (dump_file, "\n");
>>>>>> +}
>>>>>> +
>>>>>> +/* Free the variables at the end of the pass.  */
>>>>>> +
>>>>>> +static void
>>>>>> +finish_pass (void)
>>>>>> +{
>>>>>> +  int i;
>>>>>> +
>>>>>> +  XDELETEVEC (propagated_size);
>>>>>> +
>>>>>> +  VEC_free (rtx, heap, extensions);
>>>>>> +  VEC_free (rtx, heap, redundant_extensions);
>>>>>> +
>>>>>> +  VEC_free (int, heap, wl);
>>>>>> +
>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>> +    {
>>>>>> +      if (uses[i] != NULL)
>>>>>> +	VEC_free (use_type, heap, uses[i]);
>>>>>> +
>>>>>> +      if (props[i] != NULL)
>>>>>> +	VEC_free (prop_type, heap, props[i]);
>>>>>> +    }
>>>>>> +
>>>>>> +  XDELETEVEC (uses);
>>>>>> +  XDELETEVEC (props);
>>>>>> +  XDELETEVEC (biggest_use);
>>>>>> +
>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>> +    if (promoted_subreg[i] != NULL)
>>>>>> +      VEC_free (rtx, heap, promoted_subreg[i]);
>>>>>> +  XDELETEVEC (promoted_subreg);
>>>>>> +}
>>>>>> +
>>>>>> +/* Remove redundant extensions.  */
>>>>>> +
>>>>>> +static unsigned int
>>>>>> +rest_of_handle_ee (void)
>>>>>> +{
>>>>>> +  n_regs = max_reg_num ();
>>>>>> +
>>>>>> +  init_pass ();
>>>>>> +  remove_redundant_extensions ();
>>>>>> +  finish_pass ();
>>>>>> +  return 0;
>>>>>> +}
>>>>>> +
>>>>>> +/* Run ee pass when flag_ee is set at optimization level > 0.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +gate_handle_ee (void)
>>>>>> +{
>>>>>> +  return (optimize > 0 && flag_ee);
>>>>>> +}
>>>>>> +
>>>>>> +struct rtl_opt_pass pass_ee =
>>>>>> +{
>>>>>> + {
>>>>>> +  RTL_PASS,
>>>>>> +  "ee",                                 /* name */
>>>>>> +  gate_handle_ee,                       /* gate */
>>>>>> +  rest_of_handle_ee,                    /* execute */
>>>>>> +  NULL,                                 /* sub */
>>>>>> +  NULL,                                 /* next */
>>>>>> +  0,                                    /* static_pass_number */
>>>>>> +  TV_EE,                                /* tv_id */
>>>>>> +  0,                                    /* properties_required */
>>>>>> +  0,                                    /* properties_provided */
>>>>>> +  0,                                    /* properties_destroyed */
>>>>>> +  0,                                    /* todo_flags_start */
>>>>>> +  TODO_ggc_collect |
>>>>>> +  TODO_verify_rtl_sharing,              /* todo_flags_finish */
>>>>>> + }
>>>>>> +};
>>>>>> Index: gcc/common.opt
>>>>>> ===================================================================
>>>>>> --- gcc/common.opt (revision 189409)
>>>>>> +++ gcc/common.opt (working copy)
>>>>>> @@ -1067,6 +1067,10 @@ feliminate-dwarf2-dups
>>>>>> Common Report Var(flag_eliminate_dwarf2_dups)
>>>>>> Perform DWARF2 duplicate elimination
>>>>>>
>>>>>> +fextension-elimination
>>>>>> +Common Report Var(flag_ee) Init(0) Optimization
>>>>>> +Perform extension elimination
>>>>>> +
>>>>>> fipa-sra
>>>>>> Common Report Var(flag_ipa_sra) Init(0) Optimization
>>>>>> Perform interprocedural reduction of aggregates
>>>>>> Index: gcc/Makefile.in
>>>>>> ===================================================================
>>>>>> --- gcc/Makefile.in (revision 189409)
>>>>>> +++ gcc/Makefile.in (working copy)
>>>>>> @@ -1218,6 +1218,7 @@ OBJS = \
>>>>>> 	dwarf2asm.o \
>>>>>> 	dwarf2cfi.o \
>>>>>> 	dwarf2out.o \
>>>>>> +	ee.o \
>>>>>> 	ebitmap.o \
>>>>>> 	emit-rtl.o \
>>>>>> 	et-forest.o \
>>>>>> @@ -2971,6 +2972,12 @@ cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H
>>>>>>       $(TM_P_H) $(PARAMS_H) cselib.h $(EXCEPT_H) $(TREE_H) $(TIMEVAR_H) \
>>>>>>       intl.h $(OBSTACK_H) $(TREE_PASS_H) $(DF_H) $(DBGCNT_H) $(TARGET_H) \
>>>>>>       $(DF_H) $(CFGLOOP_H)
>>>>>> +ee.o : ee.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>>>> +   hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h \
>>>>>> +   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
>>>>>> +   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(TOPLEV_H) \
>>>>>> +   $(DIAGNOSTIC_CORE_H) $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h \
>>>>>> +   $(PARAMS_H) $(CGRAPH_H)
>>>>>> gcse.o : gcse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>>>>       $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>>>>>>       $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) toplev.h $(DIAGNOSTIC_CORE_H) \
>>>>>> Index: gcc/passes.c
>>>>>> ===================================================================
>>>>>> --- gcc/passes.c (revision 189409)
>>>>>> +++ gcc/passes.c (working copy)
>>>>>> @@ -1552,6 +1552,7 @@ init_optimization_passes (void)
>>>>>>          NEXT_PASS (pass_initialize_regs);
>>>>>>          NEXT_PASS (pass_ud_rtl_dce);
>>>>>>          NEXT_PASS (pass_combine);
>>>>>> +      NEXT_PASS (pass_ee);
>>>>>>          NEXT_PASS (pass_if_after_combine);
>>>>>>          NEXT_PASS (pass_partition_blocks);
>>>>>>          NEXT_PASS (pass_regmove);
>>
>
Tom de Vries - July 20, 2012, 6:41 p.m.
On 17/07/12 17:16, Kenneth Zadeck wrote:
> the pass does not handle induction variables, i.e. variables that feed 
> into themselves.
> 

Kenny,

I know of two types of redundant extensions:
1. extensions that are redundant because the bits they are extending are already
   extended.
2. extensions that are redundant because the bits they are extending are never
   used.

The two types can be detected by different propagations. The first type can be
addressed by doing value-range propagation. The second type can be addressed by
a backward unused-bits propagation.

pass_ee implements such a backward unused-bits propagation.

I've created a simple example, which I hope represents the example you're
concerned about:
...
unsigned int
f (unsigned short n, unsigned int *x)
{
  unsigned int sum = 0;
  unsigned short s;

  for (s = 0; s < n ; s += 1)
    {
      sum += *x;
      x++;
    }

  return sum;
}
...

Assembly for mips:
...
	andi	$4,$4,0xffff
	move	$3,$0
	j	$L2
	move	$2,$0
$L3:
	lw	$7,0($5)
	andi	$3,$6,0xffff
	addu	$2,$2,$7
	addiu	$5,$5,4
$L2:
	bne	$3,$4,$L3
	addiu	$6,$3,1
	j	$31
	nop
...

The extend 'andi $3,$6,0xffff' is redundant, because it zero-extends a value
that is already zero-extended (so the first type of redundancy). This is not
something that value range propagation by itself can figure out though.

In the rtl representation below, vrp can figure out that reg 233 is always zero
extented, as a consequence of it's definitions. But it can't say that for sure
about reg 240.

The only way to figure that out, is by doing induction variable analysis. If we
assume that induction variable analysis is able to figure out the range of regs
233 an 240, then removing the redundant extension can be done by either:
- vrp. In this example, this can be limited to simply applying the value
  range of reg 240 to its use, which doesn't really require a full vrp
  solution. I think that in most test-cases like this, other redundant
  extensions will also be direct uses of the induction variables, and can be
  removed like this.
- canonical induction variable insertion (replacing the induction variables
  computations with a new canonical and simple induction variable), but that
  doesn't works for other extensions which use the induction variable.

Rtl representation for the test-case:
...
(note 46 0 40 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

(insn 42 40 41 2 (set (reg/v/f:SI 238 [ x ])
        (reg:SI 5 $5 [ x ])))

(insn 41 42 43 2 (set (reg/v:SI 236 [ n+-2 ])
        (zero_extend:SI (reg:HI 4 $4 [ n ]))))

(insn 44 43 45 2 (set (reg/v:SI 233 [ s+-2 ])
        (const_int 0 [0])))

(insn 45 44 58 2 (set (reg/v:SI 232 [ sum ])
        (const_int 0 [0])))

(jump_insn 72 45 73 2 (set (pc)
        (label_ref 56))
 -> 56)

(code_label 58 45 50 3 3 "" [1 uses])

(note 50 58 51 3 [bb 3] NOTE_INSN_BASIC_BLOCK)

(insn 51 50 52 3 (set (reg:SI 239)
        (mem:SI (reg/v/f:SI 238 [ x ]) )))

(insn 52 51 53 3 (set (reg/v:SI 232 [ sum ])
        (plus:SI (reg/v:SI 232 [ sum ])
            (reg:SI 239))))

(insn 53 52 54 3 (set (reg/v/f:SI 238 [ x ])
        (plus:SI (reg/v/f:SI 238 [ x ])
            (const_int 4 [0x4]))))

(insn 54 53 55 3 (set (reg:SI 240)
        (plus:SI (reg/v:SI 233 [ s+-2 ])
            (const_int 1 [0x1]))))

(insn 55 54 56 3 (set (reg/v:SI 233 [ s+-2 ])
        (zero_extend:SI (subreg:HI (reg:SI 240) 2))))

(code_label 56 55 57 4 2 "" [1 uses])

(note 57 56 59 4 [bb 4] NOTE_INSN_BASIC_BLOCK)

(jump_insn 59 57 60 4 (set (pc)
	(if_then_else (ne (reg/v:SI 233 [ s+-2 ])
                (reg/v:SI 236 [ n+-2 ]))
            (label_ref 58)
            (pc)))
 -> 58)

(note 60 59 65 5 [bb 5] NOTE_INSN_BASIC_BLOCK)

(insn 65 60 68 5 (set (reg/i:SI 2 $2)
        (reg/v:SI 232 [ sum ])))
...

A tree-level loop index promotion pass was proposed three years back (
http://gcc.gnu.org/ml/gcc-patches/2009-04/msg01860.html ) that does optimize
this example as well. It does induction variable analysis, promotes the mode of
the induction variable and rewrites the uses, which prevents the extend from
being generated at expand.

In conclusion:
- to remove the redundant extension of type 1 in this test-case at rtl level,
  induction variable analysis is needed, in combination with (simple) vrp.
- pass_ee implements backward unused-bits propagation, which targets the
  redundant extension of type 2. This example, and more generally extensions
  which are redundant because they use range-restricted induction variables are
  out of scope for backward unused-bits propagation.

Thanks,
- Tom

> kenny
> On 07/13/2012 03:53 AM, Tom de Vries wrote:
>> On 12/07/12 14:04, Kenneth Zadeck wrote:
>>> you are on the right track with the example but combine will not get
>>> this unless everything is in the same bb.
>>> the whole point of having a separate pass for doing extension
>>> elimination is that it needs to be done over the entire function.
>>>
>> There is a pass_ree, which does inter-bb combine targeted at extensions.
>> However, that pass is currently limited to combining extensions with the
>> definitions of the register it extends. The way your example sounds, you want
>> the reverse, where extensions are combined with all their uses.
>> I would say pass_ree is the natural place to add this and handle the example you
>> describe.
>>
>> Thanks,
>> - Tom
>>
>>> my example is also a little more complex because, since we are talking
>>> about induction vars, you have an initial assignment outside of a loop,
>>> and increment inside the loop and the test you describe at the bottom of
>>> the loop.
>>>
>>> I would point out that with respect to speed optimizations, the case i
>>> am describing is in fact very important because getting code out of
>>> loops is were the important gains are.   I believe that the ppc has a
>>> some significant performance issues because of this kind of thing.
>>>
>>> kenny
>>>
>>>
>>> On 07/12/2012 05:20 AM, Tom de Vries wrote:
>>>> On 12/07/12 11:05, Tom de Vries wrote:
>>>>> On 12/07/12 03:39, Kenneth Zadeck wrote:
>>>>>> Tom,
>>>>>>
>>>>>> I have a problem with the approach that you have taken here.   I believe
>>>>>> that this could be a very useful addition to gcc so I am in general very
>>>>>> supportive, but i think you are missing an important case.
>>>>>>
>>>>>> My problem is that it the pass does not actually look at the target and
>>>>>> make any decisions based on that target.
>>>>>>
>>>>>> for instance, we have a llp64 target.   As with many targets, the target
>>>>>> has a rich set of compare and branch instructions.  In particular, it
>>>>>> can do both 32 and 64 bit comparisons.    We see that many of the
>>>>>> upstream optimizations that take int (SI mode) index variables generate
>>>>>> extension operations before doing 64 bit compare and branch
>>>>>> instructions, even though there are 32 bit comparison and branches on
>>>>>> the machine.     There are a lot of machines that can do more than one
>>>>>> size of comparison.
>>>>>>
>>>>> 	 This optimization pass, as it is currently written will not remove those
>>>>>> extensions because it believes that the length of the destination is the
>>>>>> "final answer" unless it is wrapped in an explicit truncation.
>>>>>> Instead it needs to ask the port if there is a shorted compare and
>>>>>> branch instruction that does not cost more. in that case, those
>>>>>> instructions should be rewritten to use the shorted compare and branch.
>>>>>>
>>>>>> There are many operations other than compare and branch where the pass
>>>>>> should be asking "can i shorten the target for free and therefore get
>>>>>> rid of the extension?"
>>>>> Kenneth,
>>>>>
>>>>> I'm not sure I understand the optimization you're talking about, in particular
>>>>> I'm confused about whether the branch range of the 32-bit and 64-bit comparison
>>>>> is the same.
>>>>>
>>>>> Assuming it's the same, my understanding is that you're talking about an example
>>>>> like this:
>>>>> ...
>>>>>     (insn (set (reg:DI 5)
>>>>>                (zero_extend:DI (reg:SI 4))))
>>>>>
>>>>>     (jump_insn (set (pc)
>>>>>                     (if_then_else (eq (reg:DI 5)
>>>>>                                       (const_int 0))
>>>>>                                   (label_ref:DI 62)
>>>>>                                   (pc))))
>>>>>
>>>>>     ->
>>>>>
>>>>>     (jump_insn (set (pc)
>>>>>                     (if_then_else (eq (reg:SI 4)
>>>>>                                       (const_int 0))
>>>>>                                   (label_ref:DI 62)
>>>>>                                   (pc))))
>>>>>
>>>>> ...
>>>>> I would expect combine to optimize this.
>>>>>
>>>>> In case I got the example all backwards or it is a too simple one, please
>>>>> provide an rtl example that illustrates the optimization.
>>>>>
>>>>> Thanks,
>>>>> - Tom
>>>>>
>>>>>
>>>>>>    right shifts, rotates, and stores are not in
>>>>>> this class, but left shifts are as are all comparisons, compare and
>>>>>> branches, conditional moves.   There may even be machines that have this
>>>>>> for divide, but i do not know of any off the top of my head.
>>>>>>
>>>>>> What i am suggesting moves this pass into the target specific set of
>>>>>> optimizations rather than target independent set, but at where this pass
>>>>>> is to be put this is completely appropriate.    Any dest instruction
>>>>>> where all of the operands have been extended should be checked to see if
>>>>>> it was really necessary to use the longer form before doing the
>>>>>> propagation pass.
>>>>>>
>>>>>> kenny
>>>>>>
>>>>>>
>>>>>> On 07/11/2012 06:30 AM, Tom de Vries wrote:
>>>>>>> On 13/11/10 10:50, Eric Botcazou wrote:
>>>>>>>>> I profiled the pass on spec2000:
>>>>>>>>>
>>>>>>>>>                       -mabi=32     -mabi=64
>>>>>>>>> ee-pass (usr time):     0.70         1.16
>>>>>>>>> total   (usr time):   919.30       879.26
>>>>>>>>> ee-pass        (%):     0.08         0.13
>>>>>>>>>
>>>>>>>>> The pass takes 0.13% or less of the total usr runtime.
>>>>>>>> For how many hits?  What are the numbers with --param ee-max-propagate=0?
>>>>>>>>
>>>>>>>>> Is it necessary to improve the runtime of this pass?
>>>>>>>> I've already given my opinion about the implementation.  The other passes in
>>>>>>>> the compiler try hard not to rescan everything when a single bit changes; as
>>>>>>>> currently written, yours doesn't.
>>>>>>>>
>>>>>>> Eric,
>>>>>>>
>>>>>>> I've done the following:
>>>>>>> - refactored the pass such that it now scans at most twice over all
>>>>>>>     instructions.
>>>>>>> - updated the patch to be applicable to current trunk
>>>>>>> - updated the motivating example to a more applicable one (as discussed in
>>>>>>>     this thread), and added that one as test-case.
>>>>>>> - added a part in the header comment illustrating the working of the pass
>>>>>>>     on the motivating example.
>>>>>>>
>>>>>>> bootstrapped and reg-tested on x86_64 and i686.
>>>>>>>
>>>>>>> build and reg-tested on mips, mips64, and arm.
>>>>>>>
>>>>>>> OK for trunk?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> - Tom
>>>>>>>
>>>>>>> 2012-07-10  Tom de Vries  <tom@codesourcery.com>
>>>>>>>
>>>>>>> 	* ee.c: New file.
>>>>>>> 	* tree-pass.h (pass_ee): Declare.
>>>>>>> 	* opts.c ( default_options_table): Set flag_ee at -O2.
>>>>>>> 	* timevar.def (TV_EE): New timevar.
>>>>>>> 	* common.opt (fextension-elimination): New option.
>>>>>>> 	* Makefile.in (ee.o): New rule.
>>>>>>> 	* passes.c (pass_ee): Add it.
>>>>>>>
>>>>>>> 	* gcc.dg/extend-1.c: New test.
>>>>>>> 	* gcc.dg/extend-2.c: Same.
>>>>>>> 	* gcc.dg/extend-2-64.c: Same.
>>>>>>> 	* gcc.dg/extend-3.c: Same.
>>>>>>> 	* gcc.dg/extend-4.c: Same.
>>>>>>> 	* gcc.dg/extend-5.c: Same.
>>>>>>> 	* gcc.target/mips/octeon-bbit-2.c: Make test more robust.
>>>>>>> Index: gcc/tree-pass.h
>>>>>>> ===================================================================
>>>>>>> --- gcc/tree-pass.h (revision 189409)
>>>>>>> +++ gcc/tree-pass.h (working copy)
>>>>>>> @@ -483,6 +483,7 @@ extern struct gimple_opt_pass pass_fixup
>>>>>>>
>>>>>>> extern struct rtl_opt_pass pass_expand;
>>>>>>> extern struct rtl_opt_pass pass_instantiate_virtual_regs;
>>>>>>> +extern struct rtl_opt_pass pass_ee;
>>>>>>> extern struct rtl_opt_pass pass_rtl_fwprop;
>>>>>>> extern struct rtl_opt_pass pass_rtl_fwprop_addr;
>>>>>>> extern struct rtl_opt_pass pass_jump;
>>>>>>> Index: gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (revision 189409)
>>>>>>> +++ gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (working copy)
>>>>>>> @@ -5,19 +5,19 @@
>>>>>>> /* { dg-final { scan-assembler "\tbnel\t" } } */
>>>>>>> /* { dg-final { scan-assembler-not "\tbne\t" } } */
>>>>>>>
>>>>>>> -NOMIPS16 int
>>>>>>> -f (int n, int i)
>>>>>>> +NOMIPS16 long int
>>>>>>> +f (long int n, long int i)
>>>>>>> {
>>>>>>> -  int s = 0;
>>>>>>> +  long int s = 0;
>>>>>>>      for (; i & 1; i++)
>>>>>>>        s += i;
>>>>>>>      return s;
>>>>>>> }
>>>>>>>
>>>>>>> -NOMIPS16 int
>>>>>>> -g (int n, int i)
>>>>>>> +NOMIPS16 long int
>>>>>>> +g (long int n, long int i)
>>>>>>> {
>>>>>>> -  int s = 0;
>>>>>>> +  long int s = 0;
>>>>>>>      for (i = 0; i < n; i++)
>>>>>>>        s += i;
>>>>>>>      return s;
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-4.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-4.c (revision 0)
>>>>>>> @@ -0,0 +1,16 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>>> +
>>>>>>> +unsigned char f(unsigned int a, int c)
>>>>>>> +{
>>>>>>> +  unsigned int b = a;
>>>>>>> +  if (c)
>>>>>>> +    b = a & 0x10ff;
>>>>>>> +  return b;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "_extend:" 1 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "and:" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ removed" "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> +
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-1.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-1.c (revision 0)
>>>>>>> @@ -0,0 +1,13 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>>> +
>>>>>>> +void f(unsigned char * p, short s, int c, int *z)
>>>>>>> +{
>>>>>>> +  if (c)
>>>>>>> +    *z = 0;
>>>>>>> +  *p ^= (unsigned char)s;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 1 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-5.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-5.c (revision 0)
>>>>>>> @@ -0,0 +1,13 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>>> +
>>>>>>> +void f (short d[2][2])
>>>>>>> +{
>>>>>>> +  int d0 = d[0][0] + d[0][1];
>>>>>>> +  int d1 = d[1][0] + d[1][1];
>>>>>>> +  d[0][0] = d0 + d1;
>>>>>>> +      d[0][1] = d0 - d1;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-2.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-2.c (revision 0)
>>>>>>> @@ -0,0 +1,20 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>>> +/* { dg-require-effective-target ilp32 } */
>>>>>>> +
>>>>>>> +void f(unsigned char * p, short *s, int c)
>>>>>>> +{
>>>>>>> +  short or = 0;
>>>>>>> +  while (c)
>>>>>>> +    {
>>>>>>> +      or = or | s[c];
>>>>>>> +      c --;
>>>>>>> +    }
>>>>>>> +  *p = (unsigned char)or;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> +
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-2-64.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-2-64.c (revision 0)
>>>>>>> @@ -0,0 +1,20 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>>>>> +/* { dg-require-effective-target mips64 } */
>>>>>>> +
>>>>>>> +void f(unsigned char * p, short *s, int c)
>>>>>>> +{
>>>>>>> +  short or = 0;
>>>>>>> +  while (c)
>>>>>>> +    {
>>>>>>> +      or = or | s[c];
>>>>>>> +      c --;
>>>>>>> +    }
>>>>>>> +  *p = (unsigned char)or;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 1 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> +
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-3.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-3.c (revision 0)
>>>>>>> @@ -0,0 +1,13 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>>>>> +/* { dg-require-effective-target mips64 } */
>>>>>>> +
>>>>>>> +unsigned int f(unsigned char byte)
>>>>>>> +{
>>>>>>> +  return byte << 25;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ replaced" "ee" } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> +
>>>>>>> Index: gcc/opts.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/opts.c (revision 189409)
>>>>>>> +++ gcc/opts.c (working copy)
>>>>>>> @@ -490,6 +490,7 @@ static const struct default_options defa
>>>>>>>        { OPT_LEVELS_2_PLUS, OPT_ftree_tail_merge, NULL, 1 },
>>>>>>>        { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_foptimize_strlen, NULL, 1 },
>>>>>>>        { OPT_LEVELS_2_PLUS, OPT_fhoist_adjacent_loads, NULL, 1 },
>>>>>>> +    { OPT_LEVELS_2_PLUS, OPT_fextension_elimination, NULL, 1 },
>>>>>>>
>>>>>>>        /* -O3 optimizations.  */
>>>>>>>        { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
>>>>>>> Index: gcc/timevar.def
>>>>>>> ===================================================================
>>>>>>> --- gcc/timevar.def (revision 189409)
>>>>>>> +++ gcc/timevar.def (working copy)
>>>>>>> @@ -201,6 +201,7 @@ DEFTIMEVAR (TV_POST_EXPAND	     , "post
>>>>>>> DEFTIMEVAR (TV_VARCONST              , "varconst")
>>>>>>> DEFTIMEVAR (TV_LOWER_SUBREG	     , "lower subreg")
>>>>>>> DEFTIMEVAR (TV_JUMP                  , "jump")
>>>>>>> +DEFTIMEVAR (TV_EE                    , "extension elimination")
>>>>>>> DEFTIMEVAR (TV_FWPROP                , "forward prop")
>>>>>>> DEFTIMEVAR (TV_CSE                   , "CSE")
>>>>>>> DEFTIMEVAR (TV_DCE                   , "dead code elimination")
>>>>>>> Index: gcc/ee.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/ee.c (revision 0)
>>>>>>> @@ -0,0 +1,1190 @@
>>>>>>> +/* Redundant extension elimination.
>>>>>>> +   Copyright (C) 2010, 2011, 2012 Free Software Foundation, Inc.
>>>>>>> +   Contributed by Tom de Vries (tom@codesourcery.com)
>>>>>>> +
>>>>>>> +This file is part of GCC.
>>>>>>> +
>>>>>>> +GCC is free software; you can redistribute it and/or modify it under
>>>>>>> +the terms of the GNU General Public License as published by the Free
>>>>>>> +Software Foundation; either version 3, or (at your option) any later
>>>>>>> +version.
>>>>>>> +
>>>>>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>>>>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>>>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>>>>>> +for more details.
>>>>>>> +
>>>>>>> +You should have received a copy of the GNU General Public License
>>>>>>> +along with GCC; see the file COPYING3.  If not see
>>>>>>> +<http://www.gnu.org/licenses/>.  */
>>>>>>> +
>>>>>>> +/*
>>>>>>> +
>>>>>>> +  MOTIVATING EXAMPLE
>>>>>>> +
>>>>>>> +  The motivating example for this pass is the example from PR 40893:
>>>>>>> +
>>>>>>> +    void f (short d[2][2])
>>>>>>> +    {
>>>>>>> +      int d0 = d[0][0] + d[0][1];
>>>>>>> +      int d1 = d[1][0] + d[1][1];
>>>>>>> +      d[0][0] = d0 + d1;
>>>>>>> +      d[0][1] = d0 - d1;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  For MIPS, compilation results in the following insns.
>>>>>>> +
>>>>>>> +    (set (reg:SI 204)
>>>>>>> +         (zero_extend:SI (subreg:HI (reg:SI 213) 2)))
>>>>>>> +
>>>>>>> +    (set (reg:SI 205)
>>>>>>> +         (zero_extend:SI (subreg:HI (reg:SI 216 [ d1 ]) 2)))
>>>>>>> +
>>>>>>> +    (set (reg:SI 217)
>>>>>>> +         (plus:SI (reg:SI 205)
>>>>>>> +                  (reg:SI 204)))
>>>>>>> +
>>>>>>> +    (set (reg:SI 218)
>>>>>>> +         (minus:SI (reg:SI 204)
>>>>>>> +                   (reg:SI 205)))
>>>>>>> +
>>>>>>> +    (set (mem:HI (reg/v/f:SI 210))
>>>>>>> +         (subreg:HI (reg:SI 217) 2))
>>>>>>> +
>>>>>>> +    (set (mem:HI (plus:SI (reg/v/f:SI 210)
>>>>>>> +                 (const_int 2 [0x2])))
>>>>>>> +         (subreg:HI (reg:SI 218) 2))
>>>>>>> +
>>>>>>> +
>>>>>>> +  The pseudos 217 and 218 only use the lower half of pseudos 217 and 218, and
>>>>>>> +  are the only uses.  And the plus and minus operators belong to the class of
>>>>>>> +  operators where a bit in the result is only influenced by same-or-less
>>>>>>> +  significant bitss in the operands, so the plus and minus insns only use the
>>>>>>> +  lower halves of pseudos 204 and 205.  Those are also the only uses of pseudos
>>>>>>> +  204 and 205, so the zero_extends are redundant.
>>>>>>> +
>>>>>>> +
>>>>>>> +  INTENDED EFFECT
>>>>>>> +
>>>>>>> +  This pass works by removing sign/zero-extensions, or replacing them with
>>>>>>> +  regcopies.  The idea there is that the regcopy might be eliminated by a later
>>>>>>> +  pass.  In case the regcopy cannot be eliminated, it might at least be cheaper
>>>>>>> +  than the extension.
>>>>>>> +
>>>>>>> +
>>>>>>> +  IMPLEMENTATION
>>>>>>> +
>>>>>>> +  The pass scans at most two times over all instructions.
>>>>>>> +
>>>>>>> +  The first scan collects all extensions.  If there are no extensions, we're
>>>>>>> +  done.
>>>>>>> +
>>>>>>> +  The second scan registers all uses of a reg in the biggest_use array.
>>>>>>> +  Additionally, it registers how the use size of a pseudo is propagated to the
>>>>>>> +  operands of the insns defining the pseudo.
>>>>>>> +
>>>>>>> +  The biggest_use array now contains the size in bits of the biggest use
>>>>>>> +  of each reg, which allows us to find redundant extensions.
>>>>>>> +
>>>>>>> +  If there are still non-redundant extensions left, we use the propagation
>>>>>>> +  information in an iterative fashion to improve the biggest_use array, after
>>>>>>> +  which we may find more redundant extensions.
>>>>>>> +
>>>>>>> +  Finally, redundant extensions are deleted or replaced.
>>>>>>> +
>>>>>>> +  In case that the src and dest reg of the replacement are not of the same size,
>>>>>>> +  we do not replace with a normal regcopy, but with a truncate or with the copy
>>>>>>> +  of a paradoxical subreg instead.
>>>>>>> +
>>>>>>> +
>>>>>>> +  ILLUSTRATION OF PASS
>>>>>>> +
>>>>>>> +  The dump of the pass shows us how the pass works on the motivating example.
>>>>>>> +
>>>>>>> +  We find the 2 extensions:
>>>>>>> +    found extension with preserved size 16 defining reg 204
>>>>>>> +    found extension with preserved size 16 defining reg 205
>>>>>>> +
>>>>>>> +  We calculate the biggests uses of a register:
>>>>>>> +    biggest_use
>>>>>>> +    reg 204: size 32
>>>>>>> +    reg 205: size 32
>>>>>>> +    reg 217: size 16
>>>>>>> +    reg 218: size 16
>>>>>>> +
>>>>>>> +  We propagate the biggest uses where possible:
>>>>>>> +    propagations
>>>>>>> +    205: 32 -> 16
>>>>>>> +    204: 32 -> 16
>>>>>>> +    214: 32 -> 16
>>>>>>> +    215: 32 -> 16
>>>>>>> +
>>>>>>> +  We conclude that the extensions are redundant:
>>>>>>> +    found redundant extension with preserved size 16 defining reg 205
>>>>>>> +    found redundant extension with preserved size 16 defining reg 204
>>>>>>> +
>>>>>>> +  And we replace them with regcopies:
>>>>>>> +    (set (reg:SI 204)
>>>>>>> +        (reg:SI 213))
>>>>>>> +
>>>>>>> +    (set (reg:SI 205)
>>>>>>> +        (reg:SI 216))
>>>>>>> +
>>>>>>> +
>>>>>>> +  LIMITATIONS
>>>>>>> +
>>>>>>> +  The scope of the analysis is limited to an extension and its uses.  The other
>>>>>>> +  type of analysis (related to the defs of the operand of an extension) is not
>>>>>>> +  done.
>>>>>>> +
>>>>>>> +  Furthermore, we do the analysis of biggest use per reg.  So when determining
>>>>>>> +  whether an extension is redundant, we take all uses of a dest reg into
>>>>>>> +  account, also the ones that are not uses of the extension.
>>>>>>> +  The consideration is that using use-def chains will give a more precise
>>>>>>> +  analysis, but is much more expensive in terms of runtime.  */
>>>>>>> +
>>>>>>> +#include "config.h"
>>>>>>> +#include "system.h"
>>>>>>> +#include "coretypes.h"
>>>>>>> +#include "tm.h"
>>>>>>> +#include "rtl.h"
>>>>>>> +#include "tree.h"
>>>>>>> +#include "tm_p.h"
>>>>>>> +#include "flags.h"
>>>>>>> +#include "regs.h"
>>>>>>> +#include "hard-reg-set.h"
>>>>>>> +#include "basic-block.h"
>>>>>>> +#include "insn-config.h"
>>>>>>> +#include "function.h"
>>>>>>> +#include "expr.h"
>>>>>>> +#include "insn-attr.h"
>>>>>>> +#include "recog.h"
>>>>>>> +#include "toplev.h"
>>>>>>> +#include "target.h"
>>>>>>> +#include "timevar.h"
>>>>>>> +#include "optabs.h"
>>>>>>> +#include "insn-codes.h"
>>>>>>> +#include "rtlhooks-def.h"
>>>>>>> +#include "output.h"
>>>>>>> +#include "params.h"
>>>>>>> +#include "timevar.h"
>>>>>>> +#include "tree-pass.h"
>>>>>>> +#include "cgraph.h"
>>>>>>> +#include "vec.h"
>>>>>>> +
>>>>>>> +#define SKIP_REG (-1)
>>>>>>> +#define NONE (-1)
>>>>>>> +
>>>>>>> +/* Number of registers at start of pass.  */
>>>>>>> +
>>>>>>> +static int n_regs;
>>>>>>> +
>>>>>>> +/* Array to register the biggest use of a reg, in bits.  */
>>>>>>> +
>>>>>>> +static int *biggest_use;
>>>>>>> +
>>>>>>> +/* Array to register the promoted subregs.  */
>>>>>>> +
>>>>>>> +static VEC (rtx,heap) **promoted_subreg;
>>>>>>> +
>>>>>>> +/* Array to register for a reg what the last propagated size is.  */
>>>>>>> +
>>>>>>> +static int *propagated_size;
>>>>>>> +
>>>>>>> +typedef struct use
>>>>>>> +{
>>>>>>> +  int regno;
>>>>>>> +  int size;
>>>>>>> +  int offset;
>>>>>>> +  rtx *use;
>>>>>>> +} use_type;
>>>>>>> +
>>>>>>> +DEF_VEC_O(use_type);
>>>>>>> +DEF_VEC_ALLOC_O(use_type,heap);
>>>>>>> +
>>>>>>> +/* Vector to register the uses.  */
>>>>>>> +
>>>>>>> +static VEC (use_type,heap) **uses;
>>>>>>> +
>>>>>>> +typedef struct prop
>>>>>>> +{
>>>>>>> +  rtx set;
>>>>>>> +  int uses_regno;
>>>>>>> +  int uses_index;
>>>>>>> +} prop_type;
>>>>>>> +
>>>>>>> +DEF_VEC_O(prop_type);
>>>>>>> +DEF_VEC_ALLOC_O(prop_type,heap);
>>>>>>> +
>>>>>>> +/* Vector to register the propagations.  */
>>>>>>> +
>>>>>>> +static VEC (prop_type,heap) **props;
>>>>>>> +
>>>>>>> +/* Work list for propragation.  */
>>>>>>> +
>>>>>>> +static VEC (int,heap) *wl;
>>>>>>> +
>>>>>>> +/* Array to register what regs are in the work list.  */
>>>>>>> +
>>>>>>> +static bool *in_wl;
>>>>>>> +
>>>>>>> +/* Vector that contains the extensions in the function.  */
>>>>>>> +
>>>>>>> +static VEC (rtx,heap) *extensions;
>>>>>>> +
>>>>>>> +/* Vector that contains the extensions in the function that are going to be
>>>>>>> +   removed or replaced.  */
>>>>>>> +
>>>>>>> +static VEC (rtx,heap) *redundant_extensions;
>>>>>>> +
>>>>>>> +/* Forward declaration.  */
>>>>>>> +
>>>>>>> +static void note_use (rtx *x, void *data);
>>>>>>> +static bool skip_reg_p (int regno);
>>>>>>> +static void register_prop (rtx set, use_type *use);
>>>>>>> +
>>>>>>> +/* Check whether SUBREG is a promoted subreg.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +promoted_subreg_p (rtx subreg)
>>>>>>> +{
>>>>>>> +  return (GET_CODE (subreg) == SUBREG
>>>>>>> +	  && SUBREG_PROMOTED_VAR_P (subreg));
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Check whether SUBREG is a promoted subreg for which we cannot reset the
>>>>>>> +   promotion.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +fixed_promoted_subreg_p (rtx subreg)
>>>>>>> +{
>>>>>>> +  int mre;
>>>>>>> +
>>>>>>> +  if (!promoted_subreg_p (subreg))
>>>>>>> +    return false;
>>>>>>> +
>>>>>>> +  mre = targetm.mode_rep_extended (GET_MODE (subreg),
>>>>>>> +				   GET_MODE (SUBREG_REG (subreg)));
>>>>>>> +  return mre != UNKNOWN;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Attempt to return the size, reg number and offset of USE in SIZE, REGNO and
>>>>>>> +   OFFSET.  Return true if successful.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +reg_use_p (rtx use, int *size, unsigned int *regno, int *offset)
>>>>>>> +{
>>>>>>> +  rtx reg;
>>>>>>> +
>>>>>>> +  if (REG_P (use))
>>>>>>> +    {
>>>>>>> +      *regno = REGNO (use);
>>>>>>> +      *offset = 0;
>>>>>>> +      *size = GET_MODE_BITSIZE (GET_MODE (use));
>>>>>>> +      return true;
>>>>>>> +    }
>>>>>>> +  else if (GET_CODE (use) == SUBREG)
>>>>>>> +    {
>>>>>>> +      reg = SUBREG_REG (use);
>>>>>>> +
>>>>>>> +      if (!REG_P (reg))
>>>>>>> +	return false;
>>>>>>> +
>>>>>>> +      *regno = REGNO (reg);
>>>>>>> +
>>>>>>> +      if (paradoxical_subreg_p (use) || fixed_promoted_subreg_p (use))
>>>>>>> +	{
>>>>>>> +	  *offset = 0;
>>>>>>> +	  *size = GET_MODE_BITSIZE (GET_MODE (reg));
>>>>>>> +	}
>>>>>>> +      else
>>>>>>> +	{
>>>>>>> +	  *offset = subreg_lsb (use);
>>>>>>> +	  *size = *offset + GET_MODE_BITSIZE (GET_MODE (use));
>>>>>>> +	}
>>>>>>> +
>>>>>>> +      return true;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  return false;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Create a new empty entry in the uses[REGNO] vector.  */
>>>>>>> +
>>>>>>> +static use_type *
>>>>>>> +new_use (unsigned int regno)
>>>>>>> +{
>>>>>>> +  if (uses[regno] == NULL)
>>>>>>> +    uses[regno] = VEC_alloc (use_type, heap, 4);
>>>>>>> +
>>>>>>> +  VEC_safe_push (use_type, heap, uses[regno], NULL);
>>>>>>> +
>>>>>>> +  return VEC_last (use_type, uses[regno]);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Register a USE of reg REGNO with SIZE and OFFSET.  */
>>>>>>> +
>>>>>>> +static use_type *
>>>>>>> +register_use (int size, unsigned int regno, int offset, rtx *use)
>>>>>>> +{
>>>>>>> +  int *current;
>>>>>>> +  use_type *p;
>>>>>>> +
>>>>>>> +  gcc_assert (size >= 0);
>>>>>>> +  gcc_assert (regno < (unsigned int)n_regs);
>>>>>>> +
>>>>>>> +  if (skip_reg_p (regno))
>>>>>>> +    return NULL;
>>>>>>> +
>>>>>>> +  p = new_use (regno);
>>>>>>> +  p->regno = regno;
>>>>>>> +  p->size = size;
>>>>>>> +  p->offset = offset;
>>>>>>> +  p->use = use;
>>>>>>> +
>>>>>>> +  /* Update the bigest use.  */
>>>>>>> +  current = &biggest_use[regno];
>>>>>>> +  *current = MAX (*current, size);
>>>>>>> +
>>>>>>> +  return p;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Handle embedded uses in USE, which is a part of PATTERN.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +note_embedded_uses (rtx use, rtx pattern)
>>>>>>> +{
>>>>>>> +  const char *format_ptr;
>>>>>>> +  int i, j;
>>>>>>> +
>>>>>>> +  format_ptr = GET_RTX_FORMAT (GET_CODE (use));
>>>>>>> +  for (i = 0; i < GET_RTX_LENGTH (GET_CODE (use)); i++)
>>>>>>> +    if (format_ptr[i] == 'e')
>>>>>>> +      note_use (&XEXP (use, i), pattern);
>>>>>>> +    else if (format_ptr[i] == 'E')
>>>>>>> +      for (j = 0; j < XVECLEN (use, i); j++)
>>>>>>> +	note_use (&XVECEXP (use, i, j), pattern);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Get the set in PATTERN that has USE as its src operand.  */
>>>>>>> +
>>>>>>> +static rtx
>>>>>>> +get_set (rtx use, rtx pattern)
>>>>>>> +{
>>>>>>> +  rtx sub;
>>>>>>> +  int i;
>>>>>>> +
>>>>>>> +  if (GET_CODE (pattern) == SET && SET_SRC (pattern) == use)
>>>>>>> +    return pattern;
>>>>>>> +
>>>>>>> +  if (GET_CODE (pattern) == PARALLEL)
>>>>>>> +    for (i = 0; i < XVECLEN (pattern, 0); ++i)
>>>>>>> +      {
>>>>>>> +	sub = XVECEXP (pattern, 0, i);
>>>>>>> +	if (GET_CODE (sub) == SET && SET_SRC (sub) == use)
>>>>>>> +	  return sub;
>>>>>>> +      }
>>>>>>> +
>>>>>>> +  return NULL_RTX;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Handle a restricted op USE with NR_OPERANDS.  USE is a part of SET, which is
>>>>>>> +   a part of PATTERN.  In this context restricted means that a bit in
>>>>>>> +   an operand influences only the same bit or more significant bits in the
>>>>>>> +   result.  The bitwise ops are a subclass, but PLUS is one as well.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +note_restricted_op_use (rtx set, rtx use, unsigned int nr_operands, rtx pattern)
>>>>>>> +{
>>>>>>> +  unsigned int i, smallest;
>>>>>>> +  int operand_size[2];
>>>>>>> +  int operand_offset[2];
>>>>>>> +  int used_size;
>>>>>>> +  unsigned int operand_regno[2];
>>>>>>> +  bool operand_reg[2];
>>>>>>> +  bool operand_ignore[2];
>>>>>>> +  use_type *p;
>>>>>>> +
>>>>>>> +  /* Init operand_reg, operand_size, operand_regno and operand_ignore.  */
>>>>>>> +  for (i = 0; i < nr_operands; ++i)
>>>>>>> +    {
>>>>>>> +      operand_reg[i] = reg_use_p (XEXP (use, i), &operand_size[i],
>>>>>>> +				  &operand_regno[i], &operand_offset[i]);
>>>>>>> +      operand_ignore[i] = false;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Handle case of reg and-masked with const.  */
>>>>>>> +  if (GET_CODE (use) == AND && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>>>>> +    {
>>>>>>> +      used_size =
>>>>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (UINTVAL (XEXP (use, 1)));
>>>>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Handle case of reg or-masked with const.  */
>>>>>>> +  if (GET_CODE (use) == IOR && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>>>>> +    {
>>>>>>> +      used_size =
>>>>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (~UINTVAL (XEXP (use, 1)));
>>>>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Ignore the use of a in 'a = a + b'.  */
>>>>>>> +  /* TODO: handle GET_CODE ((SET_DEST (set))) == SUBREG.  */
>>>>>>> +  if (set != NULL_RTX && REG_P (SET_DEST (set)))
>>>>>>> +    for (i = 0; i < nr_operands; ++i)
>>>>>>> +      operand_ignore[i] = (operand_reg[i]
>>>>>>> +			   && (REGNO (SET_DEST (set)) == operand_regno[i]));
>>>>>>> +
>>>>>>> +  /* Handle the case a reg is combined with don't care bits.  */
>>>>>>> +  if (nr_operands == 2 && operand_reg[0] && operand_reg[1]
>>>>>>> +      && operand_size[0] != operand_size[1])
>>>>>>> +    {
>>>>>>> +      smallest = operand_size[0] > operand_size[1];
>>>>>>> +
>>>>>>> +      if (paradoxical_subreg_p (XEXP (use, smallest)))
>>>>>>> +	operand_size[1 - smallest] = operand_size[smallest];
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Register the operand use, if necessary.  */
>>>>>>> +  for (i = 0; i < nr_operands; ++i)
>>>>>>> +    if (!operand_reg[i])
>>>>>>> +      note_use (&XEXP (use, i), pattern);
>>>>>>> +    else if (!operand_ignore[i])
>>>>>>> +      {
>>>>>>> +	p = register_use (operand_size[i], operand_regno[i], operand_offset[i],
>>>>>>> +			  &XEXP (use, i));
>>>>>>> +	register_prop (set, p);
>>>>>>> +      }
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Register promoted SUBREG in promoted_subreg.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +register_promoted_subreg (rtx subreg)
>>>>>>> +{
>>>>>>> +  int index = REGNO (SUBREG_REG (subreg));
>>>>>>> +
>>>>>>> +  if (promoted_subreg[index] == NULL)
>>>>>>> +    promoted_subreg[index] = VEC_alloc (rtx, heap, 10);
>>>>>>> +
>>>>>>> +  VEC_safe_push (rtx, heap, promoted_subreg[index], subreg);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Note promoted subregs in X.  */
>>>>>>> +
>>>>>>> +static int
>>>>>>> +note_promoted_subreg (rtx *x, void *y ATTRIBUTE_UNUSED)
>>>>>>> +{
>>>>>>> +  rtx subreg = *x;
>>>>>>> +
>>>>>>> +  if (promoted_subreg_p (subreg) && !fixed_promoted_subreg_p (subreg)
>>>>>>> +      && REG_P (SUBREG_REG (subreg)))
>>>>>>> +    register_promoted_subreg (subreg);
>>>>>>> +
>>>>>>> +  return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Handle use X in pattern DATA noted by note_uses.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +note_use (rtx *x, void *data)
>>>>>>> +{
>>>>>>> +  rtx use = *x;
>>>>>>> +  rtx pattern = (rtx)data;
>>>>>>> +  int use_size, use_offset;
>>>>>>> +  unsigned int use_regno;
>>>>>>> +  rtx set;
>>>>>>> +  use_type *p;
>>>>>>> +
>>>>>>> +  for_each_rtx (x, note_promoted_subreg, NULL);
>>>>>>> +
>>>>>>> +  set = get_set (use, pattern);
>>>>>>> +
>>>>>>> +  switch (GET_CODE (use))
>>>>>>> +    {
>>>>>>> +    case REG:
>>>>>>> +    case SUBREG:
>>>>>>> +      if (!reg_use_p (use, &use_size, &use_regno, &use_offset))
>>>>>>> +	{
>>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>>> +	  return;
>>>>>>> +	}
>>>>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>>>>> +      register_prop (set, p);
>>>>>>> +      return;
>>>>>>> +    case SIGN_EXTEND:
>>>>>>> +    case ZERO_EXTEND:
>>>>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset))
>>>>>>> +	{
>>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>>> +	  return;
>>>>>>> +	}
>>>>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>>>>> +      register_prop (set, p);
>>>>>>> +      return;
>>>>>>> +    case IOR:
>>>>>>> +    case AND:
>>>>>>> +    case XOR:
>>>>>>> +    case PLUS:
>>>>>>> +    case MINUS:
>>>>>>> +      note_restricted_op_use (set, use, 2, pattern);
>>>>>>> +      return;
>>>>>>> +    case NOT:
>>>>>>> +    case NEG:
>>>>>>> +      note_restricted_op_use (set, use, 1, pattern);
>>>>>>> +      return;
>>>>>>> +    case ASHIFT:
>>>>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset)
>>>>>>> +	  || !CONST_INT_P (XEXP (use, 1))
>>>>>>> +	  || INTVAL (XEXP (use, 1)) <= 0
>>>>>>> +	  || paradoxical_subreg_p (XEXP (use, 0)))
>>>>>>> +	{
>>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>>> +	  return;
>>>>>>> +	}
>>>>>>> +      (void)register_use (use_size - INTVAL (XEXP (use, 1)), use_regno,
>>>>>>> +			  use_offset, x);
>>>>>>> +      return;
>>>>>>> +    default:
>>>>>>> +      note_embedded_uses (use, pattern);
>>>>>>> +      return;
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Check whether reg REGNO is implicitly used.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +implicit_use_p (int regno ATTRIBUTE_UNUSED)
>>>>>>> +{
>>>>>>> +#ifdef EPILOGUE_USES
>>>>>>> +  if (EPILOGUE_USES (regno))
>>>>>>> +    return true;
>>>>>>> +#endif
>>>>>>> +
>>>>>>> +#ifdef EH_USES
>>>>>>> +  if (EH_USES (regno))
>>>>>>> +    return true;
>>>>>>> +#endif
>>>>>>> +
>>>>>>> +  return false;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Check whether reg REGNO should be skipped in analysis.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +skip_reg_p (int regno)
>>>>>>> +{
>>>>>>> +  /* TODO: handle hard registers.  The problem with hard registers is that
>>>>>>> +     a DI use of r0 can mean a 64bit use of r0 and a 32 bit use of r1.
>>>>>>> +     We don't handle that properly.  */
>>>>>>> +  return implicit_use_p (regno) || HARD_REGISTER_NUM_P (regno);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Note the uses of argument registers in call INSN.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +note_call_uses (rtx insn)
>>>>>>> +{
>>>>>>> +  rtx link, link_expr;
>>>>>>> +
>>>>>>> +  if (!CALL_P (insn))
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1))
>>>>>>> +    {
>>>>>>> +      link_expr = XEXP (link, 0);
>>>>>>> +
>>>>>>> +      if (GET_CODE (link_expr) == USE)
>>>>>>> +	note_use (&XEXP (link_expr, 0), link);
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Dump the biggest uses found.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +dump_biggest_use (void)
>>>>>>> +{
>>>>>>> +  int i;
>>>>>>> +
>>>>>>> +  if (!dump_file)
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  fprintf (dump_file, "biggest_use:\n");
>>>>>>> +
>>>>>>> +  for (i = 0; i < n_regs; i++)
>>>>>>> +    if (biggest_use[i] > 0)
>>>>>>> +      fprintf (dump_file, "reg %d: size %d\n", i, biggest_use[i]);
>>>>>>> +
>>>>>>> +  fprintf (dump_file, "\n");
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Calculate the biggest use mode for all regs.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +calculate_biggest_use (void)
>>>>>>> +{
>>>>>>> +  basic_block bb;
>>>>>>> +  rtx insn;
>>>>>>> +
>>>>>>> +  /* For all insns, call note_use for each use in insn.  */
>>>>>>> +  FOR_EACH_BB (bb)
>>>>>>> +    FOR_BB_INSNS (bb, insn)
>>>>>>> +      {
>>>>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>>>>> +	  continue;
>>>>>>> +
>>>>>>> +	note_uses (&PATTERN (insn), note_use, PATTERN (insn));
>>>>>>> +
>>>>>>> +	if (CALL_P (insn))
>>>>>>> +	  note_call_uses (insn);
>>>>>>> +      }
>>>>>>> +
>>>>>>> +  dump_biggest_use ();
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Register a propagation USE in SET in the props vector.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +register_prop (rtx set, use_type *use)
>>>>>>> +{
>>>>>>> +  prop_type *p;
>>>>>>> +  int regno;
>>>>>>> +
>>>>>>> +  if (set == NULL_RTX || use == NULL)
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  if (!REG_P (SET_DEST (set)))
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  regno = REGNO (SET_DEST (set));
>>>>>>> +
>>>>>>> +  if (skip_reg_p (regno))
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  if (props[regno] == NULL)
>>>>>>> +    props[regno] = VEC_alloc (prop_type, heap, 4);
>>>>>>> +
>>>>>>> +  VEC_safe_push (prop_type, heap, props[regno], NULL);
>>>>>>> +  p = VEC_last (prop_type, props[regno]);
>>>>>>> +  p->set = set;
>>>>>>> +  p->uses_regno = use->regno;
>>>>>>> +  p->uses_index = VEC_length (use_type, uses[use->regno]) - 1;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Add REGNO to the worklist.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +add_to_wl (int regno)
>>>>>>> +{
>>>>>>> +  if (in_wl[regno])
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  if (biggest_use[regno] > 0
>>>>>>> +      && biggest_use[regno] == GET_MODE_BITSIZE (PSEUDO_REGNO_MODE (regno)))
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  if (VEC_empty (prop_type, props[regno]))
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  if (propagated_size[regno] != NONE
>>>>>>> +      && propagated_size[regno] == biggest_use[regno])
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  VEC_safe_push (int, heap, wl, regno);
>>>>>>> +  in_wl[regno] = true;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Pop a reg from the worklist and return it.  */
>>>>>>> +
>>>>>>> +static int
>>>>>>> +pop_wl (void)
>>>>>>> +{
>>>>>>> +  int regno = VEC_pop (int, wl);
>>>>>>> +  in_wl[regno] = false;
>>>>>>> +  return regno;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Propagate the use size DEST_SIZE of a reg to use P.  */
>>>>>>> +
>>>>>>> +static int
>>>>>>> +propagate_size (int dest_size, use_type *p)
>>>>>>> +{
>>>>>>> +  if (dest_size == 0)
>>>>>>> +    return 0;
>>>>>>> +
>>>>>>> +  return p->offset + MIN (p->size - p->offset, dest_size);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Get the biggest use of REGNO from the uses vector.  */
>>>>>>> +
>>>>>>> +static int
>>>>>>> +get_biggest_use (unsigned int regno)
>>>>>>> +{
>>>>>>> +  int ix;
>>>>>>> +  use_type *p;
>>>>>>> +  int max = 0;
>>>>>>> +
>>>>>>> +  gcc_assert (uses[regno] != NULL);
>>>>>>> +
>>>>>>> +  FOR_EACH_VEC_ELT (use_type, uses[regno], ix, p)
>>>>>>> +    max = MAX (max, p->size);
>>>>>>> +
>>>>>>> +  return max;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Propagate the use size DEST_SIZE of a reg to the uses in USE.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +propagate_to_use (int dest_size, use_type *use)
>>>>>>> +{
>>>>>>> +  int new_use_size;
>>>>>>> +  int prev_biggest_use;
>>>>>>> +  int *current;
>>>>>>> +
>>>>>>> +  new_use_size = propagate_size (dest_size, use);
>>>>>>> +
>>>>>>> +  if (new_use_size >= use->size)
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  use->size = new_use_size;
>>>>>>> +
>>>>>>> +  current = &biggest_use[use->regno];
>>>>>>> +
>>>>>>> +  prev_biggest_use = *current;
>>>>>>> +  *current = get_biggest_use (use->regno);
>>>>>>> +
>>>>>>> +  if (*current >= prev_biggest_use)
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  add_to_wl (use->regno);
>>>>>>> +
>>>>>>> +  if (dump_file)
>>>>>>> +    fprintf (dump_file, "%d: %d -> %d\n", use->regno, prev_biggest_use,
>>>>>>> +	     *current);
>>>>>>> +
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Propagate the biggest use of a reg REGNO to all its uses, and note
>>>>>>> +   propagations in NR_PROPAGATIONS.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +propagate_to_uses (int regno, int *nr_propagations)
>>>>>>> +{
>>>>>>> +  int ix;
>>>>>>> +  prop_type *p;
>>>>>>> +
>>>>>>> +  gcc_assert (!(propagated_size[regno] == NONE
>>>>>>> +		&& propagated_size[regno] == biggest_use[regno]));
>>>>>>> +
>>>>>>> +  FOR_EACH_VEC_ELT (prop_type, props[regno], ix, p)
>>>>>>> +    {
>>>>>>> +      use_type *use = VEC_index (use_type, uses[p->uses_regno], p->uses_index);
>>>>>>> +      propagate_to_use (biggest_use[regno], use);
>>>>>>> +      ++(*nr_propagations);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  propagated_size[regno] = biggest_use[regno];
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Improve biggest_use array iteratively.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +propagate (void)
>>>>>>> +{
>>>>>>> +  int i;
>>>>>>> +  int nr_propagations = 0;
>>>>>>> +
>>>>>>> +  /* Initialize work list.  */
>>>>>>> +
>>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>>> +    add_to_wl (i);
>>>>>>> +
>>>>>>> +  /* Work the work list.  */
>>>>>>> +
>>>>>>> +  if (dump_file)
>>>>>>> +    fprintf (dump_file, "propagations: \n");
>>>>>>> +  while (!VEC_empty (int, wl))
>>>>>>> +    propagate_to_uses (pop_wl (), &nr_propagations);
>>>>>>> +
>>>>>>> +  if (dump_file)
>>>>>>> +    fprintf (dump_file, "\nnr_propagations: %d\n\n", nr_propagations);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Check whether this is a sign/zero extension.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>>>>> +{
>>>>>>> +  rtx src, op0;
>>>>>>> +
>>>>>>> +  /* Detect set of reg.  */
>>>>>>> +  if (GET_CODE (PATTERN (insn)) != SET)
>>>>>>> +    return false;
>>>>>>> +
>>>>>>> +  src = SET_SRC (PATTERN (insn));
>>>>>>> +  *dest = SET_DEST (PATTERN (insn));
>>>>>>> +
>>>>>>> +  if (!REG_P (*dest))
>>>>>>> +    return false;
>>>>>>> +
>>>>>>> +  /* Detect sign or zero extension.  */
>>>>>>> +  if (GET_CODE (src) == ZERO_EXTEND || GET_CODE (src) == SIGN_EXTEND
>>>>>>> +      || (GET_CODE (src) == AND && CONST_INT_P (XEXP (src, 1))))
>>>>>>> +    {
>>>>>>> +      op0 = XEXP (src, 0);
>>>>>>> +
>>>>>>> +      /* Determine amount of least significant bits preserved by operation.  */
>>>>>>> +      if (GET_CODE (src) == AND)
>>>>>>> +	*preserved_size = ctz_hwi (~UINTVAL (XEXP (src, 1)));
>>>>>>> +      else
>>>>>>> +	*preserved_size = GET_MODE_BITSIZE (GET_MODE (op0));
>>>>>>> +
>>>>>>> +      if (GET_CODE (op0) == SUBREG)
>>>>>>> +	{
>>>>>>> +	  if (subreg_lsb (op0) != 0)
>>>>>>> +	    return false;
>>>>>>> +
>>>>>>> +	  *inner = SUBREG_REG (op0);
>>>>>>> +
>>>>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>>>>> +	    return false;
>>>>>>> +
>>>>>>> +	  return true;
>>>>>>> +	}
>>>>>>> +      else if (REG_P (op0))
>>>>>>> +	{
>>>>>>> +	  *inner = op0;
>>>>>>> +
>>>>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>>>>> +	    return false;
>>>>>>> +
>>>>>>> +	  return true;
>>>>>>> +	}
>>>>>>> +      else if (GET_CODE (op0) == TRUNCATE)
>>>>>>> +	{
>>>>>>> +	  *inner = XEXP (op0, 0);
>>>>>>> +	  return true;
>>>>>>> +	}
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  return false;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Find extensions and store them in the extensions vector.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +find_extensions (void)
>>>>>>> +{
>>>>>>> +  basic_block bb;
>>>>>>> +  rtx insn, dest, inner;
>>>>>>> +  int preserved_size;
>>>>>>> +
>>>>>>> +  /* For all insns, call note_use for each use in insn.  */
>>>>>>> +  FOR_EACH_BB (bb)
>>>>>>> +    FOR_BB_INSNS (bb, insn)
>>>>>>> +      {
>>>>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>>>>> +	  continue;
>>>>>>> +
>>>>>>> +	if (!extension_p (insn, &dest, &inner, &preserved_size))
>>>>>>> +	  continue;
>>>>>>> +
>>>>>>> +	VEC_safe_push (rtx, heap, extensions, insn);
>>>>>>> +
>>>>>>> +	if (dump_file)
>>>>>>> +	  fprintf (dump_file,
>>>>>>> +		   "found extension %u with preserved size %d defining"
>>>>>>> +		   " reg %d\n",
>>>>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>>>>> +      }
>>>>>>> +
>>>>>>> +  if (dump_file)
>>>>>>> +    {
>>>>>>> +      if (!VEC_empty (rtx, extensions))
>>>>>>> +	fprintf (dump_file, "\n");
>>>>>>> +      else
>>>>>>> +	fprintf (dump_file, "no extensions found.\n");
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  return !VEC_empty (rtx, extensions);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Check whether this is a redundant sign/zero extension.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +redundant_extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>>>>> +{
>>>>>>> +  int biggest_dest_use;
>>>>>>> +
>>>>>>> +  if (!extension_p (insn, dest, inner, preserved_size))
>>>>>>> +    gcc_unreachable ();
>>>>>>> +
>>>>>>> +  biggest_dest_use = biggest_use[REGNO (*dest)];
>>>>>>> +
>>>>>>> +  if (biggest_dest_use == SKIP_REG)
>>>>>>> +    return false;
>>>>>>> +
>>>>>>> +  if (*preserved_size < biggest_dest_use)
>>>>>>> +    return false;
>>>>>>> +
>>>>>>> +  return true;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Find the redundant extensions in the extensions vector and move them to the
>>>>>>> +   redundant_extensions vector.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +find_redundant_extensions (void)
>>>>>>> +{
>>>>>>> +  rtx insn, dest, inner;
>>>>>>> +  int ix;
>>>>>>> +  bool found = false;
>>>>>>> +  int preserved_size;
>>>>>>> +
>>>>>>> +  FOR_EACH_VEC_ELT_REVERSE (rtx, extensions, ix, insn)
>>>>>>> +    if (redundant_extension_p (insn, &dest, &inner, &preserved_size))
>>>>>>> +      {
>>>>>>> +	VEC_safe_push (rtx, heap, redundant_extensions, insn);
>>>>>>> +	VEC_unordered_remove (rtx, extensions, ix);
>>>>>>> +
>>>>>>> +	if (dump_file)
>>>>>>> +	  fprintf (dump_file,
>>>>>>> +		   "found redundant extension %u with preserved size %d"
>>>>>>> +		   " defining reg %d\n",
>>>>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>>>>> +	found = true;
>>>>>>> +      }
>>>>>>> +
>>>>>>> +  if (dump_file && found)
>>>>>>> +    fprintf (dump_file, "\n");
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Reset promotion of subregs or REG.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +reset_promoted_subreg (rtx reg)
>>>>>>> +{
>>>>>>> +  int ix;
>>>>>>> +  rtx subreg;
>>>>>>> +
>>>>>>> +  if (promoted_subreg[REGNO (reg)] == NULL)
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  FOR_EACH_VEC_ELT (rtx, promoted_subreg[REGNO (reg)], ix, subreg)
>>>>>>> +    {
>>>>>>> +      SUBREG_PROMOTED_UNSIGNED_SET (subreg, 0);
>>>>>>> +      SUBREG_PROMOTED_VAR_P (subreg) = 0;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  VEC_free (rtx, heap, promoted_subreg[REGNO (reg)]);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Try to remove or replace the redundant extension INSN which extends INNER and
>>>>>>> +   writes to DEST.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +try_remove_or_replace_extension (rtx insn, rtx dest, rtx inner)
>>>>>>> +{
>>>>>>> +  rtx cp_src, cp_dest, seq = NULL_RTX, one;
>>>>>>> +
>>>>>>> +  /* Check whether replacement is needed.  */
>>>>>>> +  if (dest != inner)
>>>>>>> +    {
>>>>>>> +      start_sequence ();
>>>>>>> +
>>>>>>> +      /* Determine the proper replacement operation.  */
>>>>>>> +      if (GET_MODE (dest) == GET_MODE (inner))
>>>>>>> +	{
>>>>>>> +	  cp_src = inner;
>>>>>>> +	  cp_dest = dest;
>>>>>>> +	}
>>>>>>> +      else if (GET_MODE_SIZE (GET_MODE (dest))
>>>>>>> +	       > GET_MODE_SIZE (GET_MODE (inner)))
>>>>>>> +	{
>>>>>>> +	  emit_clobber (dest);
>>>>>>> +	  cp_src = inner;
>>>>>>> +	  cp_dest = gen_lowpart_SUBREG (GET_MODE (inner), dest);
>>>>>>> +	}
>>>>>>> +      else
>>>>>>> +	{
>>>>>>> +	  cp_src = gen_lowpart_SUBREG (GET_MODE (dest), inner);
>>>>>>> +	  cp_dest = dest;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +      emit_move_insn (cp_dest, cp_src);
>>>>>>> +
>>>>>>> +      seq = get_insns ();
>>>>>>> +      end_sequence ();
>>>>>>> +
>>>>>>> +      /* If the replacement is not supported, bail out.  */
>>>>>>> +      for (one = seq; one != NULL_RTX; one = NEXT_INSN (one))
>>>>>>> +	if (recog_memoized (one) < 0 && GET_CODE (PATTERN (one)) != CLOBBER)
>>>>>>> +	  return;
>>>>>>> +
>>>>>>> +      /* Insert the replacement.  */
>>>>>>> +      emit_insn_before (seq, insn);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Note replacement/removal in the dump.  */
>>>>>>> +  if (dump_file)
>>>>>>> +    {
>>>>>>> +      fprintf (dump_file, "redundant extension %u ", INSN_UID (insn));
>>>>>>> +      if (dest != inner)
>>>>>>> +	fprintf (dump_file, "replaced by %u\n", INSN_UID (seq));
>>>>>>> +      else
>>>>>>> +	fprintf (dump_file, "removed\n");
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Remove the extension.  */
>>>>>>> +  delete_insn (insn);
>>>>>>> +
>>>>>>> +  reset_promoted_subreg (dest);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Setup the variables at the start of the pass.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +init_pass (void)
>>>>>>> +{
>>>>>>> +  int i;
>>>>>>> +
>>>>>>> +  biggest_use = XNEWVEC (int, n_regs);
>>>>>>> +  promoted_subreg = XCNEWVEC (VEC (rtx,heap) *, n_regs);
>>>>>>> +  propagated_size = XNEWVEC (int, n_regs);
>>>>>>> +
>>>>>>> +  /* Initialize biggest_use for all regs to 0.  If a reg is used implicitly, we
>>>>>>> +     handle that reg conservatively and set it to SKIP_REG instead.  */
>>>>>>> +  for (i = 0; i < n_regs; i++)
>>>>>>> +    {
>>>>>>> +      biggest_use[i] = skip_reg_p (i) ? SKIP_REG : 0;
>>>>>>> +      propagated_size[i] = NONE;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  extensions = VEC_alloc (rtx, heap, 10);
>>>>>>> +  redundant_extensions = VEC_alloc (rtx, heap, 10);
>>>>>>> +
>>>>>>> +  wl = VEC_alloc (int, heap, 50);
>>>>>>> +  in_wl = XNEWVEC (bool, n_regs);
>>>>>>> +
>>>>>>> +  uses = XNEWVEC (typeof (*uses), n_regs);
>>>>>>> +  props = XNEWVEC (typeof (*props), n_regs);
>>>>>>> +
>>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>>> +    {
>>>>>>> +      uses[i] = NULL;
>>>>>>> +      props[i] = NULL;
>>>>>>> +      in_wl[i] = false;
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Find redundant extensions and remove or replace them if possible.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +remove_redundant_extensions (void)
>>>>>>> +{
>>>>>>> +  rtx insn, dest, inner;
>>>>>>> +  int preserved_size;
>>>>>>> +  int ix;
>>>>>>> +
>>>>>>> +  if (!find_extensions ())
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  calculate_biggest_use ();
>>>>>>> +
>>>>>>> +  find_redundant_extensions ();
>>>>>>> +
>>>>>>> +  if (!VEC_empty (rtx, extensions))
>>>>>>> +    {
>>>>>>> +      propagate ();
>>>>>>> +
>>>>>>> +      find_redundant_extensions ();
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  gcc_checking_assert (n_regs == max_reg_num ());
>>>>>>> +
>>>>>>> +  FOR_EACH_VEC_ELT (rtx, redundant_extensions, ix, insn)
>>>>>>> +    {
>>>>>>> +      extension_p (insn, &dest, &inner, &preserved_size);
>>>>>>> +      try_remove_or_replace_extension (insn, dest, inner);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  if (dump_file)
>>>>>>> +    fprintf (dump_file, "\n");
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Free the variables at the end of the pass.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +finish_pass (void)
>>>>>>> +{
>>>>>>> +  int i;
>>>>>>> +
>>>>>>> +  XDELETEVEC (propagated_size);
>>>>>>> +
>>>>>>> +  VEC_free (rtx, heap, extensions);
>>>>>>> +  VEC_free (rtx, heap, redundant_extensions);
>>>>>>> +
>>>>>>> +  VEC_free (int, heap, wl);
>>>>>>> +
>>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>>> +    {
>>>>>>> +      if (uses[i] != NULL)
>>>>>>> +	VEC_free (use_type, heap, uses[i]);
>>>>>>> +
>>>>>>> +      if (props[i] != NULL)
>>>>>>> +	VEC_free (prop_type, heap, props[i]);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  XDELETEVEC (uses);
>>>>>>> +  XDELETEVEC (props);
>>>>>>> +  XDELETEVEC (biggest_use);
>>>>>>> +
>>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>>> +    if (promoted_subreg[i] != NULL)
>>>>>>> +      VEC_free (rtx, heap, promoted_subreg[i]);
>>>>>>> +  XDELETEVEC (promoted_subreg);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Remove redundant extensions.  */
>>>>>>> +
>>>>>>> +static unsigned int
>>>>>>> +rest_of_handle_ee (void)
>>>>>>> +{
>>>>>>> +  n_regs = max_reg_num ();
>>>>>>> +
>>>>>>> +  init_pass ();
>>>>>>> +  remove_redundant_extensions ();
>>>>>>> +  finish_pass ();
>>>>>>> +  return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Run ee pass when flag_ee is set at optimization level > 0.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +gate_handle_ee (void)
>>>>>>> +{
>>>>>>> +  return (optimize > 0 && flag_ee);
>>>>>>> +}
>>>>>>> +
>>>>>>> +struct rtl_opt_pass pass_ee =
>>>>>>> +{
>>>>>>> + {
>>>>>>> +  RTL_PASS,
>>>>>>> +  "ee",                                 /* name */
>>>>>>> +  gate_handle_ee,                       /* gate */
>>>>>>> +  rest_of_handle_ee,                    /* execute */
>>>>>>> +  NULL,                                 /* sub */
>>>>>>> +  NULL,                                 /* next */
>>>>>>> +  0,                                    /* static_pass_number */
>>>>>>> +  TV_EE,                                /* tv_id */
>>>>>>> +  0,                                    /* properties_required */
>>>>>>> +  0,                                    /* properties_provided */
>>>>>>> +  0,                                    /* properties_destroyed */
>>>>>>> +  0,                                    /* todo_flags_start */
>>>>>>> +  TODO_ggc_collect |
>>>>>>> +  TODO_verify_rtl_sharing,              /* todo_flags_finish */
>>>>>>> + }
>>>>>>> +};
>>>>>>> Index: gcc/common.opt
>>>>>>> ===================================================================
>>>>>>> --- gcc/common.opt (revision 189409)
>>>>>>> +++ gcc/common.opt (working copy)
>>>>>>> @@ -1067,6 +1067,10 @@ feliminate-dwarf2-dups
>>>>>>> Common Report Var(flag_eliminate_dwarf2_dups)
>>>>>>> Perform DWARF2 duplicate elimination
>>>>>>>
>>>>>>> +fextension-elimination
>>>>>>> +Common Report Var(flag_ee) Init(0) Optimization
>>>>>>> +Perform extension elimination
>>>>>>> +
>>>>>>> fipa-sra
>>>>>>> Common Report Var(flag_ipa_sra) Init(0) Optimization
>>>>>>> Perform interprocedural reduction of aggregates
>>>>>>> Index: gcc/Makefile.in
>>>>>>> ===================================================================
>>>>>>> --- gcc/Makefile.in (revision 189409)
>>>>>>> +++ gcc/Makefile.in (working copy)
>>>>>>> @@ -1218,6 +1218,7 @@ OBJS = \
>>>>>>> 	dwarf2asm.o \
>>>>>>> 	dwarf2cfi.o \
>>>>>>> 	dwarf2out.o \
>>>>>>> +	ee.o \
>>>>>>> 	ebitmap.o \
>>>>>>> 	emit-rtl.o \
>>>>>>> 	et-forest.o \
>>>>>>> @@ -2971,6 +2972,12 @@ cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H
>>>>>>>       $(TM_P_H) $(PARAMS_H) cselib.h $(EXCEPT_H) $(TREE_H) $(TIMEVAR_H) \
>>>>>>>       intl.h $(OBSTACK_H) $(TREE_PASS_H) $(DF_H) $(DBGCNT_H) $(TARGET_H) \
>>>>>>>       $(DF_H) $(CFGLOOP_H)
>>>>>>> +ee.o : ee.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>>>>> +   hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h \
>>>>>>> +   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
>>>>>>> +   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(TOPLEV_H) \
>>>>>>> +   $(DIAGNOSTIC_CORE_H) $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h \
>>>>>>> +   $(PARAMS_H) $(CGRAPH_H)
>>>>>>> gcse.o : gcse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>>>>>       $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>>>>>>>       $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) toplev.h $(DIAGNOSTIC_CORE_H) \
>>>>>>> Index: gcc/passes.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/passes.c (revision 189409)
>>>>>>> +++ gcc/passes.c (working copy)
>>>>>>> @@ -1552,6 +1552,7 @@ init_optimization_passes (void)
>>>>>>>          NEXT_PASS (pass_initialize_regs);
>>>>>>>          NEXT_PASS (pass_ud_rtl_dce);
>>>>>>>          NEXT_PASS (pass_combine);
>>>>>>> +      NEXT_PASS (pass_ee);
>>>>>>>          NEXT_PASS (pass_if_after_combine);
>>>>>>>          NEXT_PASS (pass_partition_blocks);
>>>>>>>          NEXT_PASS (pass_regmove);
>>>
>>
> 
>
Tom de Vries - Aug. 20, 2012, 1:40 p.m.
On 11/07/12 12:30, Tom de Vries wrote:
> On 13/11/10 10:50, Eric Botcazou wrote:
>>> I profiled the pass on spec2000:
>>>
>>>                     -mabi=32     -mabi=64
>>> ee-pass (usr time):     0.70         1.16
>>> total   (usr time):   919.30       879.26
>>> ee-pass        (%):     0.08         0.13
>>>
>>> The pass takes 0.13% or less of the total usr runtime.
>>
>> For how many hits?  What are the numbers with --param ee-max-propagate=0?
>>
>>> Is it necessary to improve the runtime of this pass?
>>
>> I've already given my opinion about the implementation.  The other passes in 
>> the compiler try hard not to rescan everything when a single bit changes; as 
>> currently written, yours doesn't.
>>
> 
> Eric,
> 
> I've done the following:
> - refactored the pass such that it now scans at most twice over all
>   instructions.
> - updated the patch to be applicable to current trunk
> - updated the motivating example to a more applicable one (as discussed in
>   this thread), and added that one as test-case.
> - added a part in the header comment illustrating the working of the pass
>   on the motivating example.
> 
> bootstrapped and reg-tested on x86_64 and i686.
> 
> build and reg-tested on mips, mips64, and arm.
> 
> OK for trunk?
> 

Eric,

does the new patch meet your concerns related to rescanning?

If so, OK for trunk?

Thanks,
- Tom


> Thanks,
> - Tom
> 
> 2012-07-10  Tom de Vries  <tom@codesourcery.com>
> 
> 	* ee.c: New file.
> 	* tree-pass.h (pass_ee): Declare.
> 	* opts.c ( default_options_table): Set flag_ee at -O2.
> 	* timevar.def (TV_EE): New timevar.
> 	* common.opt (fextension-elimination): New option.
> 	* Makefile.in (ee.o): New rule.
> 	* passes.c (pass_ee): Add it.
> 
> 	* gcc.dg/extend-1.c: New test.
> 	* gcc.dg/extend-2.c: Same.
> 	* gcc.dg/extend-2-64.c: Same.
> 	* gcc.dg/extend-3.c: Same.
> 	* gcc.dg/extend-4.c: Same.
> 	* gcc.dg/extend-5.c: Same.
> 	* gcc.target/mips/octeon-bbit-2.c: Make test more robust.
>

Patch

Index: gcc/tree-pass.h
===================================================================
--- gcc/tree-pass.h (revision 189409)
+++ gcc/tree-pass.h (working copy)
@@ -483,6 +483,7 @@  extern struct gimple_opt_pass pass_fixup
 
 extern struct rtl_opt_pass pass_expand;
 extern struct rtl_opt_pass pass_instantiate_virtual_regs;
+extern struct rtl_opt_pass pass_ee;
 extern struct rtl_opt_pass pass_rtl_fwprop;
 extern struct rtl_opt_pass pass_rtl_fwprop_addr;
 extern struct rtl_opt_pass pass_jump;
Index: gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
===================================================================
--- gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (revision 189409)
+++ gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (working copy)
@@ -5,19 +5,19 @@ 
 /* { dg-final { scan-assembler "\tbnel\t" } } */
 /* { dg-final { scan-assembler-not "\tbne\t" } } */
 
-NOMIPS16 int
-f (int n, int i)
+NOMIPS16 long int
+f (long int n, long int i)
 {
-  int s = 0;
+  long int s = 0;
   for (; i & 1; i++)
     s += i;
   return s;
 }
 
-NOMIPS16 int
-g (int n, int i)
+NOMIPS16 long int
+g (long int n, long int i)
 {
-  int s = 0;
+  long int s = 0;
   for (i = 0; i < n; i++)
     s += i;
   return s;
Index: gcc/testsuite/gcc.dg/extend-4.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/extend-4.c (revision 0)
@@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ee" } */
+
+unsigned char f(unsigned int a, int c)
+{
+  unsigned int b = a;
+  if (c)
+    b = a & 0x10ff;
+  return b;
+}
+
+/* { dg-final { scan-rtl-dump-times "_extend:" 1 "ee" { target mips*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times "and:" 0 "ee" { target mips*-*-* } } } */
+/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ removed" "ee" { target mips*-*-* } } } */
+/* { dg-final { cleanup-rtl-dump "ee" } } */
+
Index: gcc/testsuite/gcc.dg/extend-1.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/extend-1.c (revision 0)
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ee" } */
+
+void f(unsigned char * p, short s, int c, int *z)
+{
+  if (c)
+    *z = 0;
+  *p ^= (unsigned char)s;
+}
+
+/* { dg-final { scan-rtl-dump-times "sign_extend:" 0 "ee" { target mips*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 1 "ee" { target mips*-*-* } } } */
+/* { dg-final { cleanup-rtl-dump "ee" } } */
Index: gcc/testsuite/gcc.dg/extend-5.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/extend-5.c (revision 0)
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ee" } */
+
+void f (short d[2][2])
+{
+  int d0 = d[0][0] + d[0][1];
+  int d1 = d[1][0] + d[1][1];
+  d[0][0] = d0 + d1;
+      d[0][1] = d0 - d1;
+}
+
+/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
+/* { dg-final { cleanup-rtl-dump "ee" } } */
Index: gcc/testsuite/gcc.dg/extend-2.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/extend-2.c (revision 0)
@@ -0,0 +1,20 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ee" } */
+/* { dg-require-effective-target ilp32 } */
+
+void f(unsigned char * p, short *s, int c)
+{
+  short or = 0;
+  while (c)
+    {
+      or = or | s[c];
+      c --;
+    }
+  *p = (unsigned char)or;
+}
+
+/* { dg-final { scan-rtl-dump-times "zero_extend" 0 "ee" { target mips*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times "sign_extend" 0 "ee" { target mips*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
+/* { dg-final { cleanup-rtl-dump "ee" } } */
+
Index: gcc/testsuite/gcc.dg/extend-2-64.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/extend-2-64.c (revision 0)
@@ -0,0 +1,20 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
+/* { dg-require-effective-target mips64 } */
+
+void f(unsigned char * p, short *s, int c)
+{
+  short or = 0;
+  while (c)
+    {
+      or = or | s[c];
+      c --;
+    }
+  *p = (unsigned char)or;
+}
+
+/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times "sign_extend:" 1 "ee" { target mips*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
+/* { dg-final { cleanup-rtl-dump "ee" } } */
+
Index: gcc/testsuite/gcc.dg/extend-3.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/extend-3.c (revision 0)
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
+/* { dg-require-effective-target mips64 } */
+
+unsigned int f(unsigned char byte)
+{
+  return byte << 25;
+}
+
+/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
+/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ replaced" "ee" } } */
+/* { dg-final { cleanup-rtl-dump "ee" } } */
+
Index: gcc/opts.c
===================================================================
--- gcc/opts.c (revision 189409)
+++ gcc/opts.c (working copy)
@@ -490,6 +490,7 @@  static const struct default_options defa
     { OPT_LEVELS_2_PLUS, OPT_ftree_tail_merge, NULL, 1 },
     { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_foptimize_strlen, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_fhoist_adjacent_loads, NULL, 1 },
+    { OPT_LEVELS_2_PLUS, OPT_fextension_elimination, NULL, 1 },
 
     /* -O3 optimizations.  */
     { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
Index: gcc/timevar.def
===================================================================
--- gcc/timevar.def (revision 189409)
+++ gcc/timevar.def (working copy)
@@ -201,6 +201,7 @@  DEFTIMEVAR (TV_POST_EXPAND	     , "post 
 DEFTIMEVAR (TV_VARCONST              , "varconst")
 DEFTIMEVAR (TV_LOWER_SUBREG	     , "lower subreg")
 DEFTIMEVAR (TV_JUMP                  , "jump")
+DEFTIMEVAR (TV_EE                    , "extension elimination")
 DEFTIMEVAR (TV_FWPROP                , "forward prop")
 DEFTIMEVAR (TV_CSE                   , "CSE")
 DEFTIMEVAR (TV_DCE                   , "dead code elimination")
Index: gcc/ee.c
===================================================================
--- /dev/null (new file)
+++ gcc/ee.c (revision 0)
@@ -0,0 +1,1190 @@ 
+/* Redundant extension elimination.
+   Copyright (C) 2010, 2011, 2012 Free Software Foundation, Inc.
+   Contributed by Tom de Vries (tom@codesourcery.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/*
+
+  MOTIVATING EXAMPLE
+
+  The motivating example for this pass is the example from PR 40893:
+
+    void f (short d[2][2])
+    {
+      int d0 = d[0][0] + d[0][1];
+      int d1 = d[1][0] + d[1][1];
+      d[0][0] = d0 + d1;
+      d[0][1] = d0 - d1;
+    }
+
+  For MIPS, compilation results in the following insns.
+
+    (set (reg:SI 204)
+         (zero_extend:SI (subreg:HI (reg:SI 213) 2)))
+
+    (set (reg:SI 205)
+         (zero_extend:SI (subreg:HI (reg:SI 216 [ d1 ]) 2)))
+
+    (set (reg:SI 217)
+         (plus:SI (reg:SI 205)
+                  (reg:SI 204)))
+
+    (set (reg:SI 218)
+         (minus:SI (reg:SI 204)
+                   (reg:SI 205)))
+
+    (set (mem:HI (reg/v/f:SI 210))
+         (subreg:HI (reg:SI 217) 2))
+
+    (set (mem:HI (plus:SI (reg/v/f:SI 210)
+                 (const_int 2 [0x2])))
+         (subreg:HI (reg:SI 218) 2))
+
+
+  The pseudos 217 and 218 only use the lower half of pseudos 217 and 218, and
+  are the only uses.  And the plus and minus operators belong to the class of
+  operators where a bit in the result is only influenced by same-or-less
+  significant bitss in the operands, so the plus and minus insns only use the
+  lower halves of pseudos 204 and 205.  Those are also the only uses of pseudos
+  204 and 205, so the zero_extends are redundant.
+
+
+  INTENDED EFFECT
+
+  This pass works by removing sign/zero-extensions, or replacing them with
+  regcopies.  The idea there is that the regcopy might be eliminated by a later
+  pass.  In case the regcopy cannot be eliminated, it might at least be cheaper
+  than the extension.
+
+
+  IMPLEMENTATION
+
+  The pass scans at most two times over all instructions.
+
+  The first scan collects all extensions.  If there are no extensions, we're
+  done.
+
+  The second scan registers all uses of a reg in the biggest_use array.
+  Additionally, it registers how the use size of a pseudo is propagated to the
+  operands of the insns defining the pseudo.
+
+  The biggest_use array now contains the size in bits of the biggest use
+  of each reg, which allows us to find redundant extensions.
+
+  If there are still non-redundant extensions left, we use the propagation
+  information in an iterative fashion to improve the biggest_use array, after
+  which we may find more redundant extensions.
+
+  Finally, redundant extensions are deleted or replaced.
+
+  In case that the src and dest reg of the replacement are not of the same size,
+  we do not replace with a normal regcopy, but with a truncate or with the copy
+  of a paradoxical subreg instead.
+
+
+  ILLUSTRATION OF PASS
+
+  The dump of the pass shows us how the pass works on the motivating example.
+
+  We find the 2 extensions:
+    found extension with preserved size 16 defining reg 204
+    found extension with preserved size 16 defining reg 205
+
+  We calculate the biggests uses of a register:
+    biggest_use
+    reg 204: size 32
+    reg 205: size 32
+    reg 217: size 16
+    reg 218: size 16
+
+  We propagate the biggest uses where possible:
+    propagations
+    205: 32 -> 16
+    204: 32 -> 16
+    214: 32 -> 16
+    215: 32 -> 16
+
+  We conclude that the extensions are redundant:
+    found redundant extension with preserved size 16 defining reg 205
+    found redundant extension with preserved size 16 defining reg 204
+
+  And we replace them with regcopies:
+    (set (reg:SI 204)
+        (reg:SI 213))
+
+    (set (reg:SI 205)
+        (reg:SI 216))
+
+
+  LIMITATIONS
+
+  The scope of the analysis is limited to an extension and its uses.  The other
+  type of analysis (related to the defs of the operand of an extension) is not
+  done.
+
+  Furthermore, we do the analysis of biggest use per reg.  So when determining
+  whether an extension is redundant, we take all uses of a dest reg into
+  account, also the ones that are not uses of the extension.
+  The consideration is that using use-def chains will give a more precise
+  analysis, but is much more expensive in terms of runtime.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "tree.h"
+#include "tm_p.h"
+#include "flags.h"
+#include "regs.h"
+#include "hard-reg-set.h"
+#include "basic-block.h"
+#include "insn-config.h"
+#include "function.h"
+#include "expr.h"
+#include "insn-attr.h"
+#include "recog.h"
+#include "toplev.h"
+#include "target.h"
+#include "timevar.h"
+#include "optabs.h"
+#include "insn-codes.h"
+#include "rtlhooks-def.h"
+#include "output.h"
+#include "params.h"
+#include "timevar.h"
+#include "tree-pass.h"
+#include "cgraph.h"
+#include "vec.h"
+
+#define SKIP_REG (-1)
+#define NONE (-1)
+
+/* Number of registers at start of pass.  */
+
+static int n_regs;
+
+/* Array to register the biggest use of a reg, in bits.  */
+
+static int *biggest_use;
+
+/* Array to register the promoted subregs.  */
+
+static VEC (rtx,heap) **promoted_subreg;
+
+/* Array to register for a reg what the last propagated size is.  */
+
+static int *propagated_size;
+
+typedef struct use
+{
+  int regno;
+  int size;
+  int offset;
+  rtx *use;
+} use_type;
+
+DEF_VEC_O(use_type);
+DEF_VEC_ALLOC_O(use_type,heap);
+
+/* Vector to register the uses.  */
+
+static VEC (use_type,heap) **uses;
+
+typedef struct prop
+{
+  rtx set;
+  int uses_regno;
+  int uses_index;
+} prop_type;
+
+DEF_VEC_O(prop_type);
+DEF_VEC_ALLOC_O(prop_type,heap);
+
+/* Vector to register the propagations.  */
+
+static VEC (prop_type,heap) **props;
+
+/* Work list for propragation.  */
+
+static VEC (int,heap) *wl;
+
+/* Array to register what regs are in the work list.  */
+
+static bool *in_wl;
+
+/* Vector that contains the extensions in the function.  */
+
+static VEC (rtx,heap) *extensions;
+
+/* Vector that contains the extensions in the function that are going to be
+   removed or replaced.  */
+
+static VEC (rtx,heap) *redundant_extensions;
+
+/* Forward declaration.  */
+
+static void note_use (rtx *x, void *data);
+static bool skip_reg_p (int regno);
+static void register_prop (rtx set, use_type *use);
+
+/* Check whether SUBREG is a promoted subreg.  */
+
+static bool
+promoted_subreg_p (rtx subreg)
+{
+  return (GET_CODE (subreg) == SUBREG
+	  && SUBREG_PROMOTED_VAR_P (subreg));
+}
+
+/* Check whether SUBREG is a promoted subreg for which we cannot reset the
+   promotion.  */
+
+static bool
+fixed_promoted_subreg_p (rtx subreg)
+{
+  int mre;
+
+  if (!promoted_subreg_p (subreg))
+    return false;
+
+  mre = targetm.mode_rep_extended (GET_MODE (subreg),
+				   GET_MODE (SUBREG_REG (subreg)));
+  return mre != UNKNOWN;
+}
+
+/* Attempt to return the size, reg number and offset of USE in SIZE, REGNO and
+   OFFSET.  Return true if successful.  */
+
+static bool
+reg_use_p (rtx use, int *size, unsigned int *regno, int *offset)
+{
+  rtx reg;
+
+  if (REG_P (use))
+    {
+      *regno = REGNO (use);
+      *offset = 0;
+      *size = GET_MODE_BITSIZE (GET_MODE (use));
+      return true;
+    }
+  else if (GET_CODE (use) == SUBREG)
+    {
+      reg = SUBREG_REG (use);
+
+      if (!REG_P (reg))
+	return false;
+
+      *regno = REGNO (reg);
+
+      if (paradoxical_subreg_p (use) || fixed_promoted_subreg_p (use))
+	{
+	  *offset = 0;
+	  *size = GET_MODE_BITSIZE (GET_MODE (reg));
+	}
+      else
+	{
+	  *offset = subreg_lsb (use);
+	  *size = *offset + GET_MODE_BITSIZE (GET_MODE (use));
+	}
+
+      return true;
+    }
+
+  return false;
+}
+
+/* Create a new empty entry in the uses[REGNO] vector.  */
+
+static use_type *
+new_use (unsigned int regno)
+{
+  if (uses[regno] == NULL)
+    uses[regno] = VEC_alloc (use_type, heap, 4);
+
+  VEC_safe_push (use_type, heap, uses[regno], NULL);
+
+  return VEC_last (use_type, uses[regno]);
+}
+
+/* Register a USE of reg REGNO with SIZE and OFFSET.  */
+
+static use_type *
+register_use (int size, unsigned int regno, int offset, rtx *use)
+{
+  int *current;
+  use_type *p;
+
+  gcc_assert (size >= 0);
+  gcc_assert (regno < (unsigned int)n_regs);
+
+  if (skip_reg_p (regno))
+    return NULL;
+
+  p = new_use (regno);
+  p->regno = regno;
+  p->size = size;
+  p->offset = offset;
+  p->use = use;
+
+  /* Update the bigest use.  */
+  current = &biggest_use[regno];
+  *current = MAX (*current, size);
+
+  return p;
+}
+
+/* Handle embedded uses in USE, which is a part of PATTERN.  */
+
+static void
+note_embedded_uses (rtx use, rtx pattern)
+{
+  const char *format_ptr;
+  int i, j;
+
+  format_ptr = GET_RTX_FORMAT (GET_CODE (use));
+  for (i = 0; i < GET_RTX_LENGTH (GET_CODE (use)); i++)
+    if (format_ptr[i] == 'e')
+      note_use (&XEXP (use, i), pattern);
+    else if (format_ptr[i] == 'E')
+      for (j = 0; j < XVECLEN (use, i); j++)
+	note_use (&XVECEXP (use, i, j), pattern);
+}
+
+/* Get the set in PATTERN that has USE as its src operand.  */
+
+static rtx
+get_set (rtx use, rtx pattern)
+{
+  rtx sub;
+  int i;
+
+  if (GET_CODE (pattern) == SET && SET_SRC (pattern) == use)
+    return pattern;
+
+  if (GET_CODE (pattern) == PARALLEL)
+    for (i = 0; i < XVECLEN (pattern, 0); ++i)
+      {
+	sub = XVECEXP (pattern, 0, i);
+	if (GET_CODE (sub) == SET && SET_SRC (sub) == use)
+	  return sub;
+      }
+
+  return NULL_RTX;
+}
+
+/* Handle a restricted op USE with NR_OPERANDS.  USE is a part of SET, which is
+   a part of PATTERN.  In this context restricted means that a bit in
+   an operand influences only the same bit or more significant bits in the
+   result.  The bitwise ops are a subclass, but PLUS is one as well.  */
+
+static void
+note_restricted_op_use (rtx set, rtx use, unsigned int nr_operands, rtx pattern)
+{
+  unsigned int i, smallest;
+  int operand_size[2];
+  int operand_offset[2];
+  int used_size;
+  unsigned int operand_regno[2];
+  bool operand_reg[2];
+  bool operand_ignore[2];
+  use_type *p;
+
+  /* Init operand_reg, operand_size, operand_regno and operand_ignore.  */
+  for (i = 0; i < nr_operands; ++i)
+    {
+      operand_reg[i] = reg_use_p (XEXP (use, i), &operand_size[i],
+				  &operand_regno[i], &operand_offset[i]);
+      operand_ignore[i] = false;
+    }
+
+  /* Handle case of reg and-masked with const.  */
+  if (GET_CODE (use) == AND && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
+    {
+      used_size =
+	HOST_BITS_PER_WIDE_INT - clz_hwi (UINTVAL (XEXP (use, 1)));
+      operand_size[0] = MIN (operand_size[0], used_size);
+    }
+
+  /* Handle case of reg or-masked with const.  */
+  if (GET_CODE (use) == IOR && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
+    {
+      used_size =
+	HOST_BITS_PER_WIDE_INT - clz_hwi (~UINTVAL (XEXP (use, 1)));
+      operand_size[0] = MIN (operand_size[0], used_size);
+    }
+
+  /* Ignore the use of a in 'a = a + b'.  */
+  /* TODO: handle GET_CODE ((SET_DEST (set))) == SUBREG.  */
+  if (set != NULL_RTX && REG_P (SET_DEST (set)))
+    for (i = 0; i < nr_operands; ++i)
+      operand_ignore[i] = (operand_reg[i]
+			   && (REGNO (SET_DEST (set)) == operand_regno[i]));
+
+  /* Handle the case a reg is combined with don't care bits.  */
+  if (nr_operands == 2 && operand_reg[0] && operand_reg[1]
+      && operand_size[0] != operand_size[1])
+    {
+      smallest = operand_size[0] > operand_size[1];
+
+      if (paradoxical_subreg_p (XEXP (use, smallest)))
+	operand_size[1 - smallest] = operand_size[smallest];
+    }
+
+  /* Register the operand use, if necessary.  */
+  for (i = 0; i < nr_operands; ++i)
+    if (!operand_reg[i])
+      note_use (&XEXP (use, i), pattern);
+    else if (!operand_ignore[i])
+      {
+	p = register_use (operand_size[i], operand_regno[i], operand_offset[i],
+			  &XEXP (use, i));
+	register_prop (set, p);
+      }
+}
+
+/* Register promoted SUBREG in promoted_subreg.  */
+
+static void
+register_promoted_subreg (rtx subreg)
+{
+  int index = REGNO (SUBREG_REG (subreg));
+
+  if (promoted_subreg[index] == NULL)
+    promoted_subreg[index] = VEC_alloc (rtx, heap, 10);
+
+  VEC_safe_push (rtx, heap, promoted_subreg[index], subreg);
+}
+
+/* Note promoted subregs in X.  */
+
+static int
+note_promoted_subreg (rtx *x, void *y ATTRIBUTE_UNUSED)
+{
+  rtx subreg = *x;
+
+  if (promoted_subreg_p (subreg) && !fixed_promoted_subreg_p (subreg)
+      && REG_P (SUBREG_REG (subreg)))
+    register_promoted_subreg (subreg);
+
+  return 0;
+}
+
+/* Handle use X in pattern DATA noted by note_uses.  */
+
+static void
+note_use (rtx *x, void *data)
+{
+  rtx use = *x;
+  rtx pattern = (rtx)data;
+  int use_size, use_offset;
+  unsigned int use_regno;
+  rtx set;
+  use_type *p;
+
+  for_each_rtx (x, note_promoted_subreg, NULL);
+
+  set = get_set (use, pattern);
+
+  switch (GET_CODE (use))
+    {
+    case REG:
+    case SUBREG:
+      if (!reg_use_p (use, &use_size, &use_regno, &use_offset))
+	{
+	  note_embedded_uses (use, pattern);
+	  return;
+	}
+      p = register_use (use_size, use_regno, use_offset, x);
+      register_prop (set, p);
+      return;
+    case SIGN_EXTEND:
+    case ZERO_EXTEND:
+      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset))
+	{
+	  note_embedded_uses (use, pattern);
+	  return;
+	}
+      p = register_use (use_size, use_regno, use_offset, x);
+      register_prop (set, p);
+      return;
+    case IOR:
+    case AND:
+    case XOR:
+    case PLUS:
+    case MINUS:
+      note_restricted_op_use (set, use, 2, pattern);
+      return;
+    case NOT:
+    case NEG:
+      note_restricted_op_use (set, use, 1, pattern);
+      return;
+    case ASHIFT:
+      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset)
+	  || !CONST_INT_P (XEXP (use, 1))
+	  || INTVAL (XEXP (use, 1)) <= 0
+	  || paradoxical_subreg_p (XEXP (use, 0)))
+	{
+	  note_embedded_uses (use, pattern);
+	  return;
+	}
+      (void)register_use (use_size - INTVAL (XEXP (use, 1)), use_regno,
+			  use_offset, x);
+      return;
+    default:
+      note_embedded_uses (use, pattern);
+      return;
+    }
+}
+
+/* Check whether reg REGNO is implicitly used.  */
+
+static bool
+implicit_use_p (int regno ATTRIBUTE_UNUSED)
+{
+#ifdef EPILOGUE_USES
+  if (EPILOGUE_USES (regno))
+    return true;
+#endif
+
+#ifdef EH_USES
+  if (EH_USES (regno))
+    return true;
+#endif
+
+  return false;
+}
+
+/* Check whether reg REGNO should be skipped in analysis.  */
+
+static bool
+skip_reg_p (int regno)
+{
+  /* TODO: handle hard registers.  The problem with hard registers is that
+     a DI use of r0 can mean a 64bit use of r0 and a 32 bit use of r1.
+     We don't handle that properly.  */
+  return implicit_use_p (regno) || HARD_REGISTER_NUM_P (regno);
+}
+
+/* Note the uses of argument registers in call INSN.  */
+
+static void
+note_call_uses (rtx insn)
+{
+  rtx link, link_expr;
+
+  if (!CALL_P (insn))
+    return;
+
+  for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1))
+    {
+      link_expr = XEXP (link, 0);
+
+      if (GET_CODE (link_expr) == USE)
+	note_use (&XEXP (link_expr, 0), link);
+    }
+}
+
+/* Dump the biggest uses found.  */
+
+static void
+dump_biggest_use (void)
+{
+  int i;
+
+  if (!dump_file)
+    return;
+
+  fprintf (dump_file, "biggest_use:\n");
+
+  for (i = 0; i < n_regs; i++)
+    if (biggest_use[i] > 0)
+      fprintf (dump_file, "reg %d: size %d\n", i, biggest_use[i]);
+
+  fprintf (dump_file, "\n");
+}
+
+/* Calculate the biggest use mode for all regs.  */
+
+static void
+calculate_biggest_use (void)
+{
+  basic_block bb;
+  rtx insn;
+
+  /* For all insns, call note_use for each use in insn.  */
+  FOR_EACH_BB (bb)
+    FOR_BB_INSNS (bb, insn)
+      {
+	if (!NONDEBUG_INSN_P (insn))
+	  continue;
+
+	note_uses (&PATTERN (insn), note_use, PATTERN (insn));
+
+	if (CALL_P (insn))
+	  note_call_uses (insn);
+      }
+
+  dump_biggest_use ();
+}
+
+/* Register a propagation USE in SET in the props vector.  */
+
+static void
+register_prop (rtx set, use_type *use)
+{
+  prop_type *p;
+  int regno;
+
+  if (set == NULL_RTX || use == NULL)
+    return;
+
+  if (!REG_P (SET_DEST (set)))
+    return;
+
+  regno = REGNO (SET_DEST (set));
+
+  if (skip_reg_p (regno))
+    return;
+
+  if (props[regno] == NULL)
+    props[regno] = VEC_alloc (prop_type, heap, 4);
+
+  VEC_safe_push (prop_type, heap, props[regno], NULL);
+  p = VEC_last (prop_type, props[regno]);
+  p->set = set;
+  p->uses_regno = use->regno;
+  p->uses_index = VEC_length (use_type, uses[use->regno]) - 1;
+}
+
+/* Add REGNO to the worklist.  */
+
+static void
+add_to_wl (int regno)
+{
+  if (in_wl[regno])
+    return;
+
+  if (biggest_use[regno] > 0
+      && biggest_use[regno] == GET_MODE_BITSIZE (PSEUDO_REGNO_MODE (regno)))
+    return;
+
+  if (VEC_empty (prop_type, props[regno]))
+    return;
+
+  if (propagated_size[regno] != NONE
+      && propagated_size[regno] == biggest_use[regno])
+    return;
+
+  VEC_safe_push (int, heap, wl, regno);
+  in_wl[regno] = true;
+}
+
+/* Pop a reg from the worklist and return it.  */
+
+static int
+pop_wl (void)
+{
+  int regno = VEC_pop (int, wl);
+  in_wl[regno] = false;
+  return regno;
+}
+
+/* Propagate the use size DEST_SIZE of a reg to use P.  */
+
+static int
+propagate_size (int dest_size, use_type *p)
+{
+  if (dest_size == 0)
+    return 0;
+
+  return p->offset + MIN (p->size - p->offset, dest_size);
+}
+
+/* Get the biggest use of REGNO from the uses vector.  */
+
+static int
+get_biggest_use (unsigned int regno)
+{
+  int ix;
+  use_type *p;
+  int max = 0;
+
+  gcc_assert (uses[regno] != NULL);
+
+  FOR_EACH_VEC_ELT (use_type, uses[regno], ix, p)
+    max = MAX (max, p->size);
+
+  return max;
+}
+
+/* Propagate the use size DEST_SIZE of a reg to the uses in USE.  */
+
+static void
+propagate_to_use (int dest_size, use_type *use)
+{
+  int new_use_size;
+  int prev_biggest_use;
+  int *current;
+
+  new_use_size = propagate_size (dest_size, use);
+
+  if (new_use_size >= use->size)
+    return;
+
+  use->size = new_use_size;
+
+  current = &biggest_use[use->regno];
+
+  prev_biggest_use = *current;
+  *current = get_biggest_use (use->regno);
+
+  if (*current >= prev_biggest_use)
+    return;
+
+  add_to_wl (use->regno);
+
+  if (dump_file)
+    fprintf (dump_file, "%d: %d -> %d\n", use->regno, prev_biggest_use,
+	     *current);
+
+}
+
+/* Propagate the biggest use of a reg REGNO to all its uses, and note
+   propagations in NR_PROPAGATIONS.  */
+
+static void
+propagate_to_uses (int regno, int *nr_propagations)
+{
+  int ix;
+  prop_type *p;
+
+  gcc_assert (!(propagated_size[regno] == NONE
+		&& propagated_size[regno] == biggest_use[regno]));
+
+  FOR_EACH_VEC_ELT (prop_type, props[regno], ix, p)
+    {
+      use_type *use = VEC_index (use_type, uses[p->uses_regno], p->uses_index);
+      propagate_to_use (biggest_use[regno], use);
+      ++(*nr_propagations);
+    }
+
+  propagated_size[regno] = biggest_use[regno];
+}
+
+/* Improve biggest_use array iteratively.  */
+
+static void
+propagate (void)
+{
+  int i;
+  int nr_propagations = 0;
+
+  /* Initialize work list.  */
+
+  for (i = 0; i < n_regs; ++i)
+    add_to_wl (i);
+
+  /* Work the work list.  */
+
+  if (dump_file)
+    fprintf (dump_file, "propagations: \n");
+  while (!VEC_empty (int, wl))
+    propagate_to_uses (pop_wl (), &nr_propagations);
+
+  if (dump_file)
+    fprintf (dump_file, "\nnr_propagations: %d\n\n", nr_propagations);
+}
+
+/* Check whether this is a sign/zero extension.  */
+
+static bool
+extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
+{
+  rtx src, op0;
+
+  /* Detect set of reg.  */
+  if (GET_CODE (PATTERN (insn)) != SET)
+    return false;
+
+  src = SET_SRC (PATTERN (insn));
+  *dest = SET_DEST (PATTERN (insn));
+
+  if (!REG_P (*dest))
+    return false;
+
+  /* Detect sign or zero extension.  */
+  if (GET_CODE (src) == ZERO_EXTEND || GET_CODE (src) == SIGN_EXTEND
+      || (GET_CODE (src) == AND && CONST_INT_P (XEXP (src, 1))))
+    {
+      op0 = XEXP (src, 0);
+
+      /* Determine amount of least significant bits preserved by operation.  */
+      if (GET_CODE (src) == AND)
+	*preserved_size = ctz_hwi (~UINTVAL (XEXP (src, 1)));
+      else
+	*preserved_size = GET_MODE_BITSIZE (GET_MODE (op0));
+
+      if (GET_CODE (op0) == SUBREG)
+	{
+	  if (subreg_lsb (op0) != 0)
+	    return false;
+
+	  *inner = SUBREG_REG (op0);
+
+	  if (GET_MODE_CLASS (GET_MODE (*dest))
+	      != GET_MODE_CLASS (GET_MODE (*inner)))
+	    return false;
+
+	  return true;
+	}
+      else if (REG_P (op0))
+	{
+	  *inner = op0;
+
+	  if (GET_MODE_CLASS (GET_MODE (*dest))
+	      != GET_MODE_CLASS (GET_MODE (*inner)))
+	    return false;
+
+	  return true;
+	}
+      else if (GET_CODE (op0) == TRUNCATE)
+	{
+	  *inner = XEXP (op0, 0);
+	  return true;
+	}
+    }
+
+  return false;
+}
+
+/* Find extensions and store them in the extensions vector.  */
+
+static bool
+find_extensions (void)
+{
+  basic_block bb;
+  rtx insn, dest, inner;
+  int preserved_size;
+
+  /* For all insns, call note_use for each use in insn.  */
+  FOR_EACH_BB (bb)
+    FOR_BB_INSNS (bb, insn)
+      {
+	if (!NONDEBUG_INSN_P (insn))
+	  continue;
+
+	if (!extension_p (insn, &dest, &inner, &preserved_size))
+	  continue;
+
+	VEC_safe_push (rtx, heap, extensions, insn);
+
+	if (dump_file)
+	  fprintf (dump_file,
+		   "found extension %u with preserved size %d defining"
+		   " reg %d\n",
+		   INSN_UID (insn), preserved_size, REGNO (dest));
+      }
+
+  if (dump_file)
+    {
+      if (!VEC_empty (rtx, extensions))
+	fprintf (dump_file, "\n");
+      else
+	fprintf (dump_file, "no extensions found.\n");
+    }
+
+  return !VEC_empty (rtx, extensions);
+}
+
+/* Check whether this is a redundant sign/zero extension.  */
+
+static bool
+redundant_extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
+{
+  int biggest_dest_use;
+
+  if (!extension_p (insn, dest, inner, preserved_size))
+    gcc_unreachable ();
+
+  biggest_dest_use = biggest_use[REGNO (*dest)];
+
+  if (biggest_dest_use == SKIP_REG)
+    return false;
+
+  if (*preserved_size < biggest_dest_use)
+    return false;
+
+  return true;
+}
+
+/* Find the redundant extensions in the extensions vector and move them to the
+   redundant_extensions vector.  */
+
+static void
+find_redundant_extensions (void)
+{
+  rtx insn, dest, inner;
+  int ix;
+  bool found = false;
+  int preserved_size;
+
+  FOR_EACH_VEC_ELT_REVERSE (rtx, extensions, ix, insn)
+    if (redundant_extension_p (insn, &dest, &inner, &preserved_size))
+      {
+	VEC_safe_push (rtx, heap, redundant_extensions, insn);
+	VEC_unordered_remove (rtx, extensions, ix);
+
+	if (dump_file)
+	  fprintf (dump_file,
+		   "found redundant extension %u with preserved size %d"
+		   " defining reg %d\n",
+		   INSN_UID (insn), preserved_size, REGNO (dest));
+	found = true;
+      }
+
+  if (dump_file && found)
+    fprintf (dump_file, "\n");
+}
+
+/* Reset promotion of subregs or REG.  */
+
+static void
+reset_promoted_subreg (rtx reg)
+{
+  int ix;
+  rtx subreg;
+
+  if (promoted_subreg[REGNO (reg)] == NULL)
+    return;
+
+  FOR_EACH_VEC_ELT (rtx, promoted_subreg[REGNO (reg)], ix, subreg)
+    {
+      SUBREG_PROMOTED_UNSIGNED_SET (subreg, 0);
+      SUBREG_PROMOTED_VAR_P (subreg) = 0;
+    }
+
+  VEC_free (rtx, heap, promoted_subreg[REGNO (reg)]);
+}
+
+/* Try to remove or replace the redundant extension INSN which extends INNER and
+   writes to DEST.  */
+
+static void
+try_remove_or_replace_extension (rtx insn, rtx dest, rtx inner)
+{
+  rtx cp_src, cp_dest, seq = NULL_RTX, one;
+
+  /* Check whether replacement is needed.  */
+  if (dest != inner)
+    {
+      start_sequence ();
+
+      /* Determine the proper replacement operation.  */
+      if (GET_MODE (dest) == GET_MODE (inner))
+	{
+	  cp_src = inner;
+	  cp_dest = dest;
+	}
+      else if (GET_MODE_SIZE (GET_MODE (dest))
+	       > GET_MODE_SIZE (GET_MODE (inner)))
+	{
+	  emit_clobber (dest);
+	  cp_src = inner;
+	  cp_dest = gen_lowpart_SUBREG (GET_MODE (inner), dest);
+	}
+      else
+	{
+	  cp_src = gen_lowpart_SUBREG (GET_MODE (dest), inner);
+	  cp_dest = dest;
+	}
+
+      emit_move_insn (cp_dest, cp_src);
+
+      seq = get_insns ();
+      end_sequence ();
+
+      /* If the replacement is not supported, bail out.  */
+      for (one = seq; one != NULL_RTX; one = NEXT_INSN (one))
+	if (recog_memoized (on