Patchwork [RFC] Expose target addressing modes before expand

login
register
mail settings
Submitter William J. Schmidt
Date Feb. 4, 2011, 10:02 p.m.
Message ID <1296856964.3135.25.camel@gnopaine>
Download mbox | patch
Permalink /patch/81958/
State New
Headers show

Comments

William J. Schmidt - Feb. 4, 2011, 10:02 p.m.
Problem report 46556 (http://gcc.gnu.org/PR46556) identifies a code-size
regression for powerpc targets.  The root of the problem is changes in
address canonicalization a couple of years back.  Discussion in the PR
suggests some form of exposure of target addressing modes prior to the
expand phase.  Steven Bosscher (CCed) suggested this be done in a late
GIMPLE pass, using logic similar to what IVopts does for loops.

Following is an early prototype of such a pass.  I've verified that it
restores the regressed code to its earlier, more compact form, and most
of the addressing sequences on powerpc appear to be better or the same
as before.  There are at least some complex expressions where early
exposure produces worse code in the form of extra multiplies; I have
some ideas for how to tune that down so the shift-add machinery for
multiplies isn't derailed.  There are still 16 test cases that regress
on rev 169834 for powerpc, and I haven't examined other targets yet.
Clearly this is not yet ready, but before investing time into further
refinement, I'd like to get commentary on the general approach, as well
as any specific concerns.  Please bear in mind that I am new to GCC, so
I'm still familiarizing myself with the infrastructure and am likely to
miss some things that are obvious to the initiated...

The general approach is to rely on existing machinery used by
tree-ssa-loop-ivopts.c, simplifying out the parts that have to do with
induction variable optimizations.  The new logic is pretty simple, but
the interactions with later phases may not be.  For now, I've placed the
new pass in passes.c as the last -O1 tree-ssa pass.

In addition to general comments, I'd be interested in answers to some
specific questions:

1) Are there concerns with early exposure of target addressing nodes on
other targets?  Should the pass be gated to only operate on a subset of
targets?
2) Is the pass placement appropriate?
3) Have I missed any necessary fix-ups after modifying the GIMPLE code?

Thanks very much for your insights!

Bill


[gcc]
2011-02-04  William Schmidt  <wschmidt@linux.vnet.ibm.com>

        * tree.h (copy_ref_info): New prototype of existing function.
        * tree-pass.h (pass_tree_addr_mode): New declaration.
        * tree-ssa-loop-ivopts.c (may_be_unaligned_p, copy_ref_info):
        Remove static keyword.
        * timevar.def (TV_TREE_ADDR_MODE): New timevar definition.
        * tree-ssa-address.c (addr_to_parts): Change to handle null
        iv_cand parameter.
        * tree-ssa-addr-mode.c:  New.
        * tree-flow.h (may_be_unaligned_p): New prototype of existing
        function.
        * Makefile.in: Add build step for tree-ssa-addr-mode.o.
        * passes.c: Add pass for pass_tree_addr_mode.
Steven Bosscher - Feb. 7, 2011, 12:18 p.m.
Hello Bill,

> Following is an early prototype of such a pass.

Looks quite interesting. It seems to me that this is not the right
time for GCC 4.6, but it looks like a step in the right direction for
GCC 4.7.


> 1) Are there concerns with early exposure of target addressing nodes on
> other targets?  Should the pass be gated to only operate on a subset of
> targets?

In general, it would be best if the pass Just Works (tm) for all
targets, just like IVopts does.


> 2) Is the pass placement appropriate?

It depends on how the combination of statements ends up producing dead
code. This pass should be "as late as possible" in the pipeline but it
should not leave a lot of dead code around.


>        * tree-ssa-loop-ivopts.c (may_be_unaligned_p, copy_ref_info):
>        Remove static keyword.

Maybe those functions belong in tree.c or tree-dfa.c now?


>        * tree-ssa-address.c (addr_to_parts): Change to handle null
>        iv_cand parameter.

You should add a comment about this, expaining why that case should be handled.


> +extern void copy_ref_info (tree, tree);
> +

No need for this extern declaration.

> +static bool
> +tree_contains_mem_ref_p (tree base)

There must always be a comment before each function, explaining the
purpose, inputs and outputs.


> +/* Expose target addressing modes in EXPR.  Logic extracted and
> +   simplified from tree-ssa-loop-ivopts.c to handle non-ivar
> +   opportunities. */
> +
> +static bool
> +tree_ssa_addr_mode_tree (tree *expr, gimple_stmt_iterator *gsi, bool speed)
> +{
> +  tree base = *expr, mem_ref;
> +  struct pointer_map_t *cache;
> +  aff_tree aff;
> +
> +  if (REFERENCE_CLASS_P (*expr))
> +    {
> +      /* If the expression is a virtual method table reference, don't attempt
> +        this.  Incorrect code generation occurs; an example test case is
> +        testsuite/g++.dg/opt/pr47366.C. */
> +      if (TREE_CODE (base) == COMPONENT_REF
> +         && DECL_VIRTUAL_P (TREE_OPERAND (base, 1)))
> +       return false;

Can you explain how this leads to wrong code?


> +
> +      base = unshare_expr (base);

You may want to postpone unsharing until you know you can make a
useful transformation, otherwise you may create unnecessary garbage.


> +
> +      /*
> +      if (TREE_CODE (base) == TARGET_MEM_REF)
> +       {
> +         tree type = build_pointer_type (TREE_TYPE (base));
> +         base = tree_mem_ref_addr (type, base);
> +       }
> +      */
> +      /* If the addressing mode is already exposed, let it be.
> +         #### I am not certain this doesn't miss out on some
> +         opportunities, so just in case I'm leaving the above
> +         code commented out in place for now.  But I think all this
> +         does is make things more explicit that CSE can catch
> +         anyhow, and in some cases it produces expressions that
> +         the may-aliasing infrastructure can't handle. */
> +      if (tree_contains_mem_ref_p (base))
> +       return false;

I think you should not touch already existing TARGET_MEM_REFs, because
we have to assume some other pass (IVopts) already tried hard to find
a good addressing mode. If that TARGET_MEM_REF is not the best choice,
then this should be fixed elsewhere.

Even if there is a MEM_REF, I think you want to expose TARGET_MEM_REF.
If you are going to expose memory addressing modes before expand for
all cases, you may as well do so for all memory references so that
expand only has to deal with TARGET_MEM_REFs.


> +      else
> +       {
> +         /* Check that the base expression is addressable. */
> +         if (may_be_nonaddressable_p (base))
> +           return false;

When can the address of a MEM be non-addressable??
Note, two spaces before the "*/".


> +         /* On strict alignment platforms, check that the base expression
> +            is sufficiently aligned.  There is no additional "step"
> +             information required, so specify single-byte alignment
> +             for that parameter.  */
> +         if (STRICT_ALIGNMENT && may_be_unaligned_p (base, size_one_node))
> +           return false;

This case should somehow be expressed in a TARGET_MEM_REF also, if
"expand" can handle it. So I would try to rewrite this kind of memory
reference to a TARGET_MEM_REF too.


> +      /* Ensure the memory reference was not built with a base expression
> +        of zero, which confuses the may-aliasing in subsequent dead
> +        store elimination.  Assume this isn't a useful replacement when
> +        this occurs. */
> +      if (integer_zerop (TREE_OPERAND (mem_ref, 0)))
> +       return false;

Same here: Any replacement in this pass would be a useful replacement.


> +  /* Bitfields are being ignored for now, as they are in induction
> +     variable optimization. */
> +  if (subcode == BIT_FIELD_REF)
> +    return;

Right. BIT_FIELD_REFs are a very special case that GCC has trouble
enough with even without TARGET_MEM_REFs. But a nice TODO or FIXME
here would be OK :-)


> +  lhs = gimple_assign_lhs_ptr (stmt);
> +  changed = tree_ssa_addr_mode_tree (lhs, gsi, speed);
> +
> +  rhs1 = gimple_assign_rhs1_ptr (stmt);
> +  changed = tree_ssa_addr_mode_tree (rhs1, gsi, speed) || changed;
> +
> +  if (rhs_class == GIMPLE_BINARY_RHS || rhs_class == GIMPLE_TERNARY_RHS)
> +    {
> +      rhs2 = gimple_assign_rhs2_ptr (stmt);
> +      changed = tree_ssa_addr_mode_tree (rhs2, gsi, speed) || changed;
> +    }
> +
> +  if (rhs_class == GIMPLE_TERNARY_RHS)
> +    {
> +      rhs3 = gimple_assign_rhs3_ptr (stmt);
> +      changed = tree_ssa_addr_mode_tree (rhs3, gsi, speed) || changed;
> +    }

Perhaps iterate over gimple_num_ops instead?


> +static unsigned int
> +tree_ssa_addr_mode (void)
> +{
> +  basic_block bb;
> +  bool chicken_switch = false;
> +
> +  /* #### Temporary debug aid; set to true to disable this phase in debugger. */
> +  if (chicken_switch)
> +    return 0;

Are you familiar with debug counters (dbgcnt.*)?


> +  FOR_EACH_BB (bb)
> +    tree_ssa_addr_mode_block (bb);

Perhaps ignore basic blocks that are part of a loop? Otherwise perhaps
you undo some of the choices made by IVopts. I don't know how IVopts
handles memory references that do not depend on an IV, though, so
maybe you have to look at loop blocks also. On the other hand, if a
memory reference in a loop does not depend on an IV, it's probably
already hoisted out of the loop...  Another knob to play with if/when
you're going to tune this pass :-)


> +      /* Note: pass_tree_addr_mode should be as close to pass_expand as
> +        possible.  Particularly must be after DCE since DCE has
> +         trouble seeing through some of the generated mem_refs. */
> +      NEXT_PASS (pass_tree_addr_mode);

...generated TARGET_MEM_REFs.  */
Note, two spaces before the "*/" :-)

Is it still true that DCE has problems with TARGET_MEM_REFs? And even
if this is so: Is it a problem here? As far as I understand, the pass
combines outputs of different non-memory statements that together
compute an address. If outputs of some statements are combined, those
statements may become dead code. But if there is no dead code before
the new pass_tree_addr_mode, the only dead code afterwards should be
writes to unused SSA names. Perhaps under those conditions, DCE will
just work??

Hoping this will make it into GCC 4.7 in one form or another,

Ciao!
Steven
Richard Guenther - Feb. 7, 2011, 1:21 p.m.
On Fri, 4 Feb 2011, William J. Schmidt wrote:

> Problem report 46556 (http://gcc.gnu.org/PR46556) identifies a code-size
> regression for powerpc targets.  The root of the problem is changes in
> address canonicalization a couple of years back.  Discussion in the PR
> suggests some form of exposure of target addressing modes prior to the
> expand phase.  Steven Bosscher (CCed) suggested this be done in a late
> GIMPLE pass, using logic similar to what IVopts does for loops.
> 
> Following is an early prototype of such a pass.  I've verified that it
> restores the regressed code to its earlier, more compact form, and most
> of the addressing sequences on powerpc appear to be better or the same
> as before.  There are at least some complex expressions where early
> exposure produces worse code in the form of extra multiplies; I have
> some ideas for how to tune that down so the shift-add machinery for
> multiplies isn't derailed.  There are still 16 test cases that regress
> on rev 169834 for powerpc, and I haven't examined other targets yet.
> Clearly this is not yet ready, but before investing time into further
> refinement, I'd like to get commentary on the general approach, as well
> as any specific concerns.  Please bear in mind that I am new to GCC, so
> I'm still familiarizing myself with the infrastructure and am likely to
> miss some things that are obvious to the initiated...
> 
> The general approach is to rely on existing machinery used by
> tree-ssa-loop-ivopts.c, simplifying out the parts that have to do with
> induction variable optimizations.  The new logic is pretty simple, but
> the interactions with later phases may not be.  For now, I've placed the
> new pass in passes.c as the last -O1 tree-ssa pass.
> 
> In addition to general comments, I'd be interested in answers to some
> specific questions:
> 
> 1) Are there concerns with early exposure of target addressing nodes on
> other targets?  Should the pass be gated to only operate on a subset of
> targets?
> 2) Is the pass placement appropriate?
> 3) Have I missed any necessary fix-ups after modifying the GIMPLE code?
> 
> Thanks very much for your insights!

Comments inline.

> Bill
> 
> 
> [gcc]
> 2011-02-04  William Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
>         * tree.h (copy_ref_info): New prototype of existing function.
>         * tree-pass.h (pass_tree_addr_mode): New declaration.
>         * tree-ssa-loop-ivopts.c (may_be_unaligned_p, copy_ref_info):
>         Remove static keyword.
>         * timevar.def (TV_TREE_ADDR_MODE): New timevar definition.
>         * tree-ssa-address.c (addr_to_parts): Change to handle null
>         iv_cand parameter.
>         * tree-ssa-addr-mode.c:  New.
>         * tree-flow.h (may_be_unaligned_p): New prototype of existing
>         function.
>         * Makefile.in: Add build step for tree-ssa-addr-mode.o.
>         * passes.c: Add pass for pass_tree_addr_mode.
> 
> Index: gcc/tree.h
> ===================================================================
> --- gcc/tree.h  (revision 169834)
> +++ gcc/tree.h  (working copy)
> @@ -5580,6 +5580,9 @@
>  extern tree tree_mem_ref_addr (tree, tree);
>  extern void copy_mem_ref_info (tree, tree);
>  
> +/* In tree-ssa-loop-ivopts.c */
> +extern void copy_ref_info (tree, tree);
> +
>  /* In tree-vrp.c */
>  extern bool ssa_name_nonnegative_p (const_tree);
>  
> Index: gcc/tree-pass.h
> ===================================================================
> --- gcc/tree-pass.h     (revision 169834)
> +++ gcc/tree-pass.h     (working copy)
> @@ -385,6 +385,7 @@
>  extern struct gimple_opt_pass pass_parallelize_loops;
>  extern struct gimple_opt_pass pass_loop_prefetch;
>  extern struct gimple_opt_pass pass_iv_optimize;
> +extern struct gimple_opt_pass pass_tree_addr_mode;
>  extern struct gimple_opt_pass pass_tree_loop_done;
>  extern struct gimple_opt_pass pass_ch;
>  extern struct gimple_opt_pass pass_ccp;
> Index: gcc/tree-ssa-loop-ivopts.c
> ===================================================================
> --- gcc/tree-ssa-loop-ivopts.c  (revision 169834)
> +++ gcc/tree-ssa-loop-ivopts.c  (working copy)
> @@ -1604,7 +1604,7 @@
>  
>  /* Returns true if memory reference REF with step STEP may be unaligned.  */
>  
> -static bool
> +bool
>  may_be_unaligned_p (tree ref, tree step)
>  {
>    tree base;
> @@ -5916,7 +5916,7 @@
>  
>  /* Copies the reference information from OLD_REF to NEW_REF.  */
>  
> -static void
> +void
>  copy_ref_info (tree new_ref, tree old_ref)
>  {
>    tree new_ptr_base = NULL_TREE;
> Index: gcc/timevar.def
> ===================================================================
> --- gcc/timevar.def     (revision 169834)
> +++ gcc/timevar.def     (working copy)
> @@ -236,6 +236,7 @@
>  DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
>  DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
>  DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
> +DEFTIMEVAR (TV_TREE_ADDR_MODE        , "expose addressing modes")

I think we should name this pass lower-address or similar, it is
really doing target specific lowering of memory accesses.

>  /* Everything else in rest_of_compilation not included above.  */
>  DEFTIMEVAR (TV_EARLY_LOCAL          , "early local passes")
> Index: gcc/tree-ssa-address.c
> ===================================================================
> --- gcc/tree-ssa-address.c      (revision 169834)
> +++ gcc/tree-ssa-address.c      (working copy)
> @@ -631,7 +631,7 @@
>       is <= 2 -- in that case, no loop invariant code motion can be
>       exposed.  */
>  
> -  if (!base_hint && (addr->n > 2))
> +  if (!base_hint && (addr->n > 2) && iv_cand)
>      move_variant_to_index (parts, addr, iv_cand);
>  
>    /* First move the most expensive feasible multiplication
> Index: gcc/tree-ssa-addr-mode.c
> ===================================================================
> --- gcc/tree-ssa-addr-mode.c    (revision 0)
> +++ gcc/tree-ssa-addr-mode.c    (revision 0)
> @@ -0,0 +1,251 @@
> +/* Expose target addressing modes in gimple representation.
> +   Copyright (C) 2011
> +   Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it
> +under the terms of the GNU General Public License as published by the
> +Free Software Foundation; either version 3, or (at your option) any
> +later version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT
> +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +/* This pass performs a scan over the gimple representation to expose
> +   target machine addressing in reference-class expressions.  This is
> +   necessary for some targets for which the current address
> +   canonicalization is suboptimal.  Induction variable optimizations
> +   already expose target addressing for many (but not all) expressions
> +   in loops, so this has most benefit in non-loop code.  Algorithms
> +   used here are simplifications of those used in the induction
> +   variable optimizations. */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tree.h"
> +#include "basic-block.h"
> +#include "tree-dump.h"
> +#include "tree-pass.h"
> +#include "timevar.h"
> +#include "cfgloop.h"
> +#include "gimple.h"
> +#include "tree-flow.h"
> +#include "tree-affine.h"
> +
> +extern void copy_ref_info (tree, tree);
> +
> +static bool
> +tree_contains_mem_ref_p (tree base)
> +{
> +  unsigned int i;
> +  enum tree_code code;
> +
> +  if (!base)
> +    return false;
> +
> +  code = TREE_CODE (base);
> +  if (code == TARGET_MEM_REF || code == MEM_REF)
> +    return true;
> +
> +  for (i = 0; i < TREE_CODE_LENGTH (code); i++)
> +    if (tree_contains_mem_ref_p (TREE_OPERAND (base, i)))
> +      return true;
> +
> +  return false;
> +}

I think you don't need this function, but see below.

> +/* Expose target addressing modes in EXPR.  Logic extracted and
> +   simplified from tree-ssa-loop-ivopts.c to handle non-ivar
> +   opportunities. */
> +
> +static bool
> +tree_ssa_addr_mode_tree (tree *expr, gimple_stmt_iterator *gsi, bool speed)
> +{
> +  tree base = *expr, mem_ref;
> +  struct pointer_map_t *cache;
> +  aff_tree aff;
> +
> +  if (REFERENCE_CLASS_P (*expr))

I personally prefer indentation like

     if (!REFERENCE_CLASS_P (*expr))
       return false;

> +    {
> +      /* If the expression is a virtual method table reference, don't attempt
> +        this.  Incorrect code generation occurs; an example test case is
> +        testsuite/g++.dg/opt/pr47366.C. */
> +      if (TREE_CODE (base) == COMPONENT_REF
> +         && DECL_VIRTUAL_P (TREE_OPERAND (base, 1)))
> +       return false;

;)  It probably confuses Martins devirtualization code.

> +      base = unshare_expr (base);

Ick you don't need this I think.

> +      /*
> +      if (TREE_CODE (base) == TARGET_MEM_REF)
> +       {
> +         tree type = build_pointer_type (TREE_TYPE (base));
> +         base = tree_mem_ref_addr (type, base);
> +       }
> +      */
> +      /* If the addressing mode is already exposed, let it be.
> +         #### I am not certain this doesn't miss out on some 
> +         opportunities, so just in case I'm leaving the above
> +         code commented out in place for now.  But I think all this
> +         does is make things more explicit that CSE can catch
> +         anyhow, and in some cases it produces expressions that
> +         the may-aliasing infrastructure can't handle. */
> +      if (tree_contains_mem_ref_p (base))

TREE_CODE (base) == TARGET_MEM_REF || TREE_CODE (base) == MEM_REF

sub-operands can never be complex.  I _think_ you want to not
bail out for MEM_REFs, though.

> +       return false;
> +      else

no need for else {

> +       {
> +         /* Check that the base expression is addressable. */
> +         if (may_be_nonaddressable_p (base))
> +           return false;

Correct (the access can include a V_C_E or a bitfield-ref, in which
case we cannot compute the address of the base of the outermost
access -- the variable name "base" is probably misleading here).

What we want to do during memory access lowering is decompose the
access into a part that we can always take the address of,
like for a bitfield access a.b.c into a.b and .c, eventually
re-write a.b and re-attach .c.  I have plans to re-surrect a lowering
pass I developed some years ago for 4.7 (it's on the mem-ref branch).

> +         /* On strict alignment platforms, check that the base expression
> +            is sufficiently aligned.  There is no additional "step"
> +             information required, so specify single-byte alignment
> +             for that parameter.  */
> +         if (STRICT_ALIGNMENT && may_be_unaligned_p (base, size_one_node))
> +           return false;
> +
> +         base = build_fold_addr_expr (base);

This is probably where you require unshare_expr ...

> +       }
> +
> +      /* Express the address of the original expression as an affine
> +         combination of trees, looking through SSA defs. */
> +      cache = pointer_map_create ();
> +      tree_to_aff_combination_expand (base, TREE_TYPE (base), &aff, &cache);
> +      pointer_map_destroy (cache);

... but here the result shouldn't require it anymore.  It's better
to make a wrapper/worker of tree_to_aff_combination_expand that
can handle non-addresses.  Probably generally useful (did you run
into actual problems w/o unshare_expr?).

> +
> +      /* It is possible for aff to have no elements, in which case do
> +        nothing.  Example: lhs = MEM[(struct foo *) 0B].bar does not
> +         produce an affine combination. */
> +      if (!aff.n)
> +       return false;
> +
> +      /* Replace the input expression with an equivalent memory reference
> +        legal on the target machine. */
> +      mem_ref = create_mem_ref (gsi, TREE_TYPE (*expr), &aff, 
> +                                reference_alias_ptr_type (*expr),
> +                               NULL_TREE, NULL_TREE, speed);

Doing the above will un-CSE a lot of dependent computations.  I think
you want to limit the SSA walking of tree_to_aff_combination_expand
or invent some mechanism to re-use existing values from the IL.

That said - the above needs lots of work ;)  Like it misses a
cost function - we should try to re-use base or index registers
across memory accesses, like IVOPTs does in loops.  It probably
is enough to collect candidates on a per-BB basis for this.

> +      /* Ensure the memory reference was not built with a base expression
> +        of zero, which confuses the may-aliasing in subsequent dead
> +        store elimination.  Assume this isn't a useful replacement when
> +        this occurs. */
> +      if (integer_zerop (TREE_OPERAND (mem_ref, 0)))
> +       return false;

Huh, that would be a bug in create_mem_ref I suppose.  Or in alias 
analysis.

> +      /* Finish the replacement. */
> +      copy_ref_info (mem_ref, *expr);
> +      *expr = mem_ref;
> +      return true;
> +    }
> +  return false;
> +}
> +
> +/* Expose target addressing modes in STMT, which must be an assignment. */
> +
> +static void
> +tree_ssa_addr_mode_stmt (gimple stmt, gimple_stmt_iterator *gsi, bool speed)
> +{
> +  tree *lhs, *rhs1, *rhs2, *rhs3;
> +  enum tree_code subcode = gimple_assign_rhs_code (stmt);
> +  enum gimple_rhs_class rhs_class = get_gimple_rhs_class (subcode);
> +  bool changed;
> +
> +  /* Bitfields are being ignored for now, as they are in induction
> +     variable optimization. */
> +  if (subcode == BIT_FIELD_REF)
> +    return;

You can have BIT_FIELD_REFs on the lhs as well.

Note that COMPONENT_REFs with a DECL_BIT_FIELD FIELD_DECL are also
bit-field accesses.

It's probably easier to move the checks into tree_ssa_addr_mode_tree
and operate on TREE_CODE of the reference.

> +  lhs = gimple_assign_lhs_ptr (stmt);
> +  changed = tree_ssa_addr_mode_tree (lhs, gsi, speed);
> +
> +  rhs1 = gimple_assign_rhs1_ptr (stmt);
> +  changed = tree_ssa_addr_mode_tree (rhs1, gsi, speed) || changed;

No need for all the code below - there are never multiple memory
accesses (you are interested in) in a single statement.

> +  if (rhs_class == GIMPLE_BINARY_RHS || rhs_class == GIMPLE_TERNARY_RHS)
> +    {
> +      rhs2 = gimple_assign_rhs2_ptr (stmt);
> +      changed = tree_ssa_addr_mode_tree (rhs2, gsi, speed) || changed;
> +    }
> +
> +  if (rhs_class == GIMPLE_TERNARY_RHS)
> +    {
> +      rhs3 = gimple_assign_rhs3_ptr (stmt);
> +      changed = tree_ssa_addr_mode_tree (rhs3, gsi, speed) || changed;
> +    }
> +
> +  if (changed)
> +    update_stmt (stmt);
> +}
> +
> +/* Process the statements in block BB. */
> +
> +static void
> +tree_ssa_addr_mode_block (basic_block bb)
> +{
> +  gimple_stmt_iterator gsi;
> +
> +  bool speed = optimize_bb_for_speed_p (bb);
> +
> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +    {
> +      gimple stmt = gsi_stmt (gsi);
> +
> +      /* #### Avoiding volatile memory refs like tree-ssa-loop-ivopts.c
> +         does, but need to revisit whether that's necessary. */

Do you want to iterate over only memory accesses or also over
address computations?  I suppose you are only interested in
scalar memory accesses (memory accesses with some non-BLKmode).
In that case you get all relevant stms with

         if (gimple_vuse (stmt)
             && gimple_assign_single_p (stmt))

> +      if (is_gimple_assign (stmt) && !gimple_has_volatile_ops (stmt))
> +       {
> +         tree_ssa_addr_mode_stmt (stmt, &gsi, speed);
> +       }
> +    }
> +}

I think you got the easy pieces done - now the hard part of
creating target dependent addressing that actually helps is missing.
And to not do too much un-CSE that will never be cleaned up afterwards.

Richard.

> +/* Main entry point for exposing target addressing modes not caught
> +   by tree-ssa-loop-ivopts.c. */
> +
> +static unsigned int
> +tree_ssa_addr_mode (void)
> +{
> +  basic_block bb;
> +  bool chicken_switch = false;
> +
> +  /* #### Temporary debug aid; set to true to disable this phase in debugger. */
> +  if (chicken_switch)
> +    return 0;
> +
> +  FOR_EACH_BB (bb)
> +    tree_ssa_addr_mode_block (bb);
> +
> +  return 0;
> +}
> +
> +struct gimple_opt_pass pass_tree_addr_mode =
> +{
> + {
> +  GIMPLE_PASS,
> +  "addrmode",                          /* name */
> +  NULL,                                        /* gate */
> +  tree_ssa_addr_mode,                  /* execute */
> +  NULL,                                        /* sub */
> +  NULL,                                        /* next */
> +  0,                                   /* static_pass_number */
> +  TV_TREE_ADDR_MODE,                   /* tv_id */
> +  PROP_cfg,                            /* properties_required */
> +  0,                                   /* properties_provided */
> +  0,                                   /* properties_destroyed */
> +  0,                                   /* todo_flags_start */
> +  /* #### Probably can remove TODO_rebuild_alias. */
> +  TODO_dump_func | TODO_update_ssa |
> +    TODO_rebuild_alias                         /* todo_flags_finish */
> + }
> +};
> Index: gcc/tree-flow.h
> ===================================================================
> --- gcc/tree-flow.h     (revision 169834)
> +++ gcc/tree-flow.h     (working copy)
> @@ -808,6 +808,7 @@
>                                       addr_space_t);
>  unsigned multiply_by_cost (HOST_WIDE_INT, enum machine_mode, bool);
>  bool may_be_nonaddressable_p (tree expr);
> +bool may_be_unaligned_p (tree ref, tree step);
>  
>  /* In tree-ssa-threadupdate.c.  */
>  extern bool thread_through_all_blocks (bool);
> Index: gcc/Makefile.in
> ===================================================================
> --- gcc/Makefile.in     (revision 169834)
> +++ gcc/Makefile.in     (working copy)
> @@ -1389,6 +1389,7 @@
>         tree-sra.o \
>         tree-switch-conversion.o \
>         tree-ssa-address.o \
> +       tree-ssa-addr-mode.o \
>         tree-ssa-alias.o \
>         tree-ssa-ccp.o \
>         tree-ssa-coalesce.o \
> @@ -2556,6 +2557,9 @@
>     $(TREE_PASS_H) $(FLAGS_H) $(TREE_INLINE_H) $(RECOG_H) insn-config.h \
>     $(EXPR_H) gt-tree-ssa-address.h $(GGC_H) tree-affine.h $(TARGET_H) \
>     tree-pretty-print.h
> +tree-ssa-addr-mode.o : tree-ssa-addr-mode.c $(CONFIG_H) $(SYSTEM_H) \
> +   coretypes.h $(TREE_H) $(BASIC_BLOCK_H) $(TREE_DUMP_H) $(TREE_PASS_H) \
> +   $(TIMEVAR_H) $(CFGLOOP_H) $(GIMPLE_H) $(TREE_FLOW_H) $(TREE_AFFINE_H)
>  tree-ssa-loop-niter.o : tree-ssa-loop-niter.c $(TREE_FLOW_H) $(CONFIG_H) \
>     $(SYSTEM_H) $(TREE_H) $(TM_P_H) $(CFGLOOP_H) $(PARAMS_H) \
>     $(TREE_INLINE_H) output.h $(DIAGNOSTIC_H) $(TM_H) coretypes.h $(TREE_DUMP_H) \
> Index: gcc/passes.c
> ===================================================================
> --- gcc/passes.c        (revision 169834)
> +++ gcc/passes.c        (working copy)
> @@ -944,6 +944,10 @@
>        NEXT_PASS (pass_rename_ssa_copies);
>        NEXT_PASS (pass_uncprop);
>        NEXT_PASS (pass_local_pure_const);
> +      /* Note: pass_tree_addr_mode should be as close to pass_expand as
> +        possible.  Particularly must be after DCE since DCE has
> +         trouble seeing through some of the generated mem_refs. */
> +      NEXT_PASS (pass_tree_addr_mode);
>      }
>    NEXT_PASS (pass_lower_complex_O0);
>    NEXT_PASS (pass_cleanup_eh);
> 
> 
>
Jeff Law - Feb. 7, 2011, 4:17 p.m.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 02/04/11 15:02, William J. Schmidt wrote:
> Problem report 46556 (http://gcc.gnu.org/PR46556) identifies a code-size
> regression for powerpc targets.  The root of the problem is changes in
> address canonicalization a couple of years back.  Discussion in the PR
> suggests some form of exposure of target addressing modes prior to the
> expand phase.  Steven Bosscher (CCed) suggested this be done in a late
> GIMPLE pass, using logic similar to what IVopts does for loops.
> 
> Following is an early prototype of such a pass.  I've verified that it
> restores the regressed code to its earlier, more compact form, and most
> of the addressing sequences on powerpc appear to be better or the same
> as before.  There are at least some complex expressions where early
> exposure produces worse code in the form of extra multiplies; I have
> some ideas for how to tune that down so the shift-add machinery for
> multiplies isn't derailed.  There are still 16 test cases that regress
> on rev 169834 for powerpc, and I haven't examined other targets yet.
> Clearly this is not yet ready, but before investing time into further
> refinement, I'd like to get commentary on the general approach, as well
> as any specific concerns.  Please bear in mind that I am new to GCC, so
> I'm still familiarizing myself with the infrastructure and am likely to
> miss some things that are obvious to the initiated...
> 
> The general approach is to rely on existing machinery used by
> tree-ssa-loop-ivopts.c, simplifying out the parts that have to do with
> induction variable optimizations.  The new logic is pretty simple, but
> the interactions with later phases may not be.  For now, I've placed the
> new pass in passes.c as the last -O1 tree-ssa pass.
> 
> In addition to general comments, I'd be interested in answers to some
> specific questions:
> 
> 1) Are there concerns with early exposure of target addressing nodes on
> other targets?  Should the pass be gated to only operate on a subset of
> targets?
> 2) Is the pass placement appropriate?
> 3) Have I missed any necessary fix-ups after modifying the GIMPLE code?
> 
> Thanks very much for your insights!
I haven't looked at the patch, but I will note my one major concern here
is the increased exposure of target characteristics into gimple.

One of the major problems we were trying to solve with gimple was the
inability of seasoned developers to predict how a change to the
optimizers would affect not just the target the developer was working
on, but all the other targets the GCC project cares about.

Gimple makes this problem a lot easier to deal with because most of the
target dependencies don't show up until we lower into RTL (yes, there
are exceptions, of course).  The net result is developers can reasonably
predict how their change will affect the resulting gimple across a wide
range of platforms (we still have trouble predicting how the RTL will be
affected obviously).


Your patch takes gimple in a different direction and my concern is that
5-10 years from now we'll have the same problem we had with RTL all over
again in gimple.

So I'd like to see several of the seasoned developers chime in with
their thoughts on this issue.

Jeff
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJNUBssAAoJEBRtltQi2kC7QXcH/1g54S8ZVBpeZe7/dUJ8Atbk
8jcDYWPDQuRxp9gpSJNo+ijPdA/MulQ904R8C0zUXrZl+noGabh9AlIvWB0z08V6
jIsn19RS3ChYVpTtvEDmcSOSP3MbeYFWqcU/Rcp3FLiKTUKcCXvYAmkJumGh6CM0
zFTR0ixDQEvjuY1u+bFvg7s6cpN9RhkE9eJbXIGMWi/xy0V9sL3jr5RbH80tiPqH
0aFK256Ppy8Wb3U/xBpWcaYKWXzmEY0A6agQXhX3Yfy6R27EsO/pWwFkIuhFaSUA
aYJQ1H3mKuOwn65Z+qQwtgb6aKBnwYhbZAi+iSRmPL6pPH4u+eOmn4Kq000HP9E=
=djbg
-----END PGP SIGNATURE-----
Richard Guenther - Feb. 7, 2011, 4:33 p.m.
On Mon, 7 Feb 2011, Jeff Law wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 02/04/11 15:02, William J. Schmidt wrote:
> > Problem report 46556 (http://gcc.gnu.org/PR46556) identifies a code-size
> > regression for powerpc targets.  The root of the problem is changes in
> > address canonicalization a couple of years back.  Discussion in the PR
> > suggests some form of exposure of target addressing modes prior to the
> > expand phase.  Steven Bosscher (CCed) suggested this be done in a late
> > GIMPLE pass, using logic similar to what IVopts does for loops.
> > 
> > Following is an early prototype of such a pass.  I've verified that it
> > restores the regressed code to its earlier, more compact form, and most
> > of the addressing sequences on powerpc appear to be better or the same
> > as before.  There are at least some complex expressions where early
> > exposure produces worse code in the form of extra multiplies; I have
> > some ideas for how to tune that down so the shift-add machinery for
> > multiplies isn't derailed.  There are still 16 test cases that regress
> > on rev 169834 for powerpc, and I haven't examined other targets yet.
> > Clearly this is not yet ready, but before investing time into further
> > refinement, I'd like to get commentary on the general approach, as well
> > as any specific concerns.  Please bear in mind that I am new to GCC, so
> > I'm still familiarizing myself with the infrastructure and am likely to
> > miss some things that are obvious to the initiated...
> > 
> > The general approach is to rely on existing machinery used by
> > tree-ssa-loop-ivopts.c, simplifying out the parts that have to do with
> > induction variable optimizations.  The new logic is pretty simple, but
> > the interactions with later phases may not be.  For now, I've placed the
> > new pass in passes.c as the last -O1 tree-ssa pass.
> > 
> > In addition to general comments, I'd be interested in answers to some
> > specific questions:
> > 
> > 1) Are there concerns with early exposure of target addressing nodes on
> > other targets?  Should the pass be gated to only operate on a subset of
> > targets?
> > 2) Is the pass placement appropriate?
> > 3) Have I missed any necessary fix-ups after modifying the GIMPLE code?
> > 
> > Thanks very much for your insights!
> I haven't looked at the patch, but I will note my one major concern here
> is the increased exposure of target characteristics into gimple.
> 
> One of the major problems we were trying to solve with gimple was the
> inability of seasoned developers to predict how a change to the
> optimizers would affect not just the target the developer was working
> on, but all the other targets the GCC project cares about.
> 
> Gimple makes this problem a lot easier to deal with because most of the
> target dependencies don't show up until we lower into RTL (yes, there
> are exceptions, of course).  The net result is developers can reasonably
> predict how their change will affect the resulting gimple across a wide
> range of platforms (we still have trouble predicting how the RTL will be
> affected obviously).
> 
> 
> Your patch takes gimple in a different direction and my concern is that
> 5-10 years from now we'll have the same problem we had with RTL all over
> again in gimple.
> 
> So I'd like to see several of the seasoned developers chime in with
> their thoughts on this issue.

I think we agreed on (on some of the last GCC Summits) to move work
from the RTL pipeline up to a lowered GIMPLE (with some target
specifics).  Eventually the goal of some people was to run register
allocation when we're still in GIMPLE SSA form.

The main concern I have with the lowering process is that we currently
leak the lowered form into generic SSA optimizers via IPA inlining,
thus defining a point in the pass pipeline where we "lower" GIMPLE
further is not possible right now.

I want to address that concern (for 4.7) by doing IPA inlining
of function bodies that are in state before our loop optimizers,
and having that as a good point where target specific information
leaks into the IL anyway (considering vectorization, IV optimization,
prefetching, ...).

I also plan to lower bitfield component references (and classical
bit-field-refs) to read-modify-write instruction sequences.  That
will introduce other target dependencies considering alignment
and word sizes.

Thus, I am not concerned about target dependent stuff too much
as long as statements do not become invalid but merely sub-optimal
for a target, at least up to a definite point in the pass pipeline.

Richard.
Steven Bosscher - Feb. 7, 2011, 5:02 p.m.
On Mon, Feb 7, 2011 at 5:17 PM, Jeff Law <law@redhat.com> wrote:
> I haven't looked at the patch, but I will note my one major concern here
> is the increased exposure of target characteristics into gimple.

Your concern is understandable, but I think there are some big
differences with low GIMPLE and RTL.

First of all, the GIMPLE format itself is still target independent, as
it should be. The operands of something target specific may differ
between targets, but the form of, say, TARGET_MEM_REF is always the
same. In RTL, AFAIU, most problems are caused by the unpredictability
of the form of an RTX.

Second, we have started going down this path long ago. IVopts is "old"
already, and there are other target specific things in GIMPLE for some
time now (FMA, vectorizer, and so on).

Finally, I seem to recall this was a desirable thing, even, as long as
the lowering from completely target independent to something more
target specific is gradual enough. Zadeck had a presentation about
this at one of the GCC summits, and it has been discussed a couple of
times since than.

That's not to say that your concerns are unfounded. We have to be
careful that things don't "leak" to places where they should not be.
For example, it seems to me that things like TARGET_MEM_REF should
never get exposed to "high GIMPLE" passes via IPA, like what Richi
described. That means we probably should set some boundaries in the
passes pipeline to define different states of lowered GIMPLE (much
like WHIRL) and enforce them with the GIMPLE checking pass.

My $0.02 (not corrected for inflation).

Ciao!
Steven
William J. Schmidt - Feb. 7, 2011, 5:19 p.m.
Hi Richi,

Thanks very much for the quick and thorough response!  Comments inline 
below.

Richard Guenther wrote:
> On Fri, 4 Feb 2011, William J. Schmidt wrote:
>
>   
>> Problem report 46556 (http://gcc.gnu.org/PR46556) identifies a code-size
>> regression for powerpc targets.  The root of the problem is changes in
>> address canonicalization a couple of years back.  Discussion in the PR
>> suggests some form of exposure of target addressing modes prior to the
>> expand phase.  Steven Bosscher (CCed) suggested this be done in a late
>> GIMPLE pass, using logic similar to what IVopts does for loops.
>>
>> Following is an early prototype of such a pass.  I've verified that it
>> restores the regressed code to its earlier, more compact form, and most
>> of the addressing sequences on powerpc appear to be better or the same
>> as before.  There are at least some complex expressions where early
>> exposure produces worse code in the form of extra multiplies; I have
>> some ideas for how to tune that down so the shift-add machinery for
>> multiplies isn't derailed.  There are still 16 test cases that regress
>> on rev 169834 for powerpc, and I haven't examined other targets yet.
>> Clearly this is not yet ready, but before investing time into further
>> refinement, I'd like to get commentary on the general approach, as well
>> as any specific concerns.  Please bear in mind that I am new to GCC, so
>> I'm still familiarizing myself with the infrastructure and am likely to
>> miss some things that are obvious to the initiated...
>>
>> The general approach is to rely on existing machinery used by
>> tree-ssa-loop-ivopts.c, simplifying out the parts that have to do with
>> induction variable optimizations.  The new logic is pretty simple, but
>> the interactions with later phases may not be.  For now, I've placed the
>> new pass in passes.c as the last -O1 tree-ssa pass.
>>
>> In addition to general comments, I'd be interested in answers to some
>> specific questions:
>>
>> 1) Are there concerns with early exposure of target addressing nodes on
>> other targets?  Should the pass be gated to only operate on a subset of
>> targets?
>> 2) Is the pass placement appropriate?
>> 3) Have I missed any necessary fix-ups after modifying the GIMPLE code?
>>
>> Thanks very much for your insights!
>>     
>
> Comments inline.
>
>   
>> Bill
>>
>>
>> [gcc]
>> 2011-02-04  William Schmidt  <wschmidt@linux.vnet.ibm.com>
>>
>>         * tree.h (copy_ref_info): New prototype of existing function.
>>         * tree-pass.h (pass_tree_addr_mode): New declaration.
>>         * tree-ssa-loop-ivopts.c (may_be_unaligned_p, copy_ref_info):
>>         Remove static keyword.
>>         * timevar.def (TV_TREE_ADDR_MODE): New timevar definition.
>>         * tree-ssa-address.c (addr_to_parts): Change to handle null
>>         iv_cand parameter.
>>         * tree-ssa-addr-mode.c:  New.
>>         * tree-flow.h (may_be_unaligned_p): New prototype of existing
>>         function.
>>         * Makefile.in: Add build step for tree-ssa-addr-mode.o.
>>         * passes.c: Add pass for pass_tree_addr_mode.
>>
>> Index: gcc/tree.h
>> ===================================================================
>> --- gcc/tree.h  (revision 169834)
>> +++ gcc/tree.h  (working copy)
>> @@ -5580,6 +5580,9 @@
>>  extern tree tree_mem_ref_addr (tree, tree);
>>  extern void copy_mem_ref_info (tree, tree);
>>  
>> +/* In tree-ssa-loop-ivopts.c */
>> +extern void copy_ref_info (tree, tree);
>> +
>>  /* In tree-vrp.c */
>>  extern bool ssa_name_nonnegative_p (const_tree);
>>  
>> Index: gcc/tree-pass.h
>> ===================================================================
>> --- gcc/tree-pass.h     (revision 169834)
>> +++ gcc/tree-pass.h     (working copy)
>> @@ -385,6 +385,7 @@
>>  extern struct gimple_opt_pass pass_parallelize_loops;
>>  extern struct gimple_opt_pass pass_loop_prefetch;
>>  extern struct gimple_opt_pass pass_iv_optimize;
>> +extern struct gimple_opt_pass pass_tree_addr_mode;
>>  extern struct gimple_opt_pass pass_tree_loop_done;
>>  extern struct gimple_opt_pass pass_ch;
>>  extern struct gimple_opt_pass pass_ccp;
>> Index: gcc/tree-ssa-loop-ivopts.c
>> ===================================================================
>> --- gcc/tree-ssa-loop-ivopts.c  (revision 169834)
>> +++ gcc/tree-ssa-loop-ivopts.c  (working copy)
>> @@ -1604,7 +1604,7 @@
>>  
>>  /* Returns true if memory reference REF with step STEP may be unaligned.  */
>>  
>> -static bool
>> +bool
>>  may_be_unaligned_p (tree ref, tree step)
>>  {
>>    tree base;
>> @@ -5916,7 +5916,7 @@
>>  
>>  /* Copies the reference information from OLD_REF to NEW_REF.  */
>>  
>> -static void
>> +void
>>  copy_ref_info (tree new_ref, tree old_ref)
>>  {
>>    tree new_ptr_base = NULL_TREE;
>> Index: gcc/timevar.def
>> ===================================================================
>> --- gcc/timevar.def     (revision 169834)
>> +++ gcc/timevar.def     (working copy)
>> @@ -236,6 +236,7 @@
>>  DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
>>  DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
>>  DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
>> +DEFTIMEVAR (TV_TREE_ADDR_MODE        , "expose addressing modes")
>>     
>
> I think we should name this pass lower-address or similar, it is
> really doing target specific lowering of memory accesses.
>
>   
Agreed.
>>  /* Everything else in rest_of_compilation not included above.  */
>>  DEFTIMEVAR (TV_EARLY_LOCAL          , "early local passes")
>> Index: gcc/tree-ssa-address.c
>> ===================================================================
>> --- gcc/tree-ssa-address.c      (revision 169834)
>> +++ gcc/tree-ssa-address.c      (working copy)
>> @@ -631,7 +631,7 @@
>>       is <= 2 -- in that case, no loop invariant code motion can be
>>       exposed.  */
>>  
>> -  if (!base_hint && (addr->n > 2))
>> +  if (!base_hint && (addr->n > 2) && iv_cand)
>>      move_variant_to_index (parts, addr, iv_cand);
>>  
>>    /* First move the most expensive feasible multiplication
>> Index: gcc/tree-ssa-addr-mode.c
>> ===================================================================
>> --- gcc/tree-ssa-addr-mode.c    (revision 0)
>> +++ gcc/tree-ssa-addr-mode.c    (revision 0)
>> @@ -0,0 +1,251 @@
>> +/* Expose target addressing modes in gimple representation.
>> +   Copyright (C) 2011
>> +   Free Software Foundation, Inc.
>> +
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redistribute it and/or modify it
>> +under the terms of the GNU General Public License as published by the
>> +Free Software Foundation; either version 3, or (at your option) any
>> +later version.
>> +
>> +GCC is distributed in the hope that it will be useful, but WITHOUT
>> +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>> +for more details.
>> +
>> +You should have received a copy of the GNU General Public License
>> +along with GCC; see the file COPYING3.  If not see
>> +<http://www.gnu.org/licenses/>.  */
>> +
>> +/* This pass performs a scan over the gimple representation to expose
>> +   target machine addressing in reference-class expressions.  This is
>> +   necessary for some targets for which the current address
>> +   canonicalization is suboptimal.  Induction variable optimizations
>> +   already expose target addressing for many (but not all) expressions
>> +   in loops, so this has most benefit in non-loop code.  Algorithms
>> +   used here are simplifications of those used in the induction
>> +   variable optimizations. */
>> +
>> +#include "config.h"
>> +#include "system.h"
>> +#include "coretypes.h"
>> +#include "tree.h"
>> +#include "basic-block.h"
>> +#include "tree-dump.h"
>> +#include "tree-pass.h"
>> +#include "timevar.h"
>> +#include "cfgloop.h"
>> +#include "gimple.h"
>> +#include "tree-flow.h"
>> +#include "tree-affine.h"
>> +
>> +extern void copy_ref_info (tree, tree);
>> +
>> +static bool
>> +tree_contains_mem_ref_p (tree base)
>> +{
>> +  unsigned int i;
>> +  enum tree_code code;
>> +
>> +  if (!base)
>> +    return false;
>> +
>> +  code = TREE_CODE (base);
>> +  if (code == TARGET_MEM_REF || code == MEM_REF)
>> +    return true;
>> +
>> +  for (i = 0; i < TREE_CODE_LENGTH (code); i++)
>> +    if (tree_contains_mem_ref_p (TREE_OPERAND (base, i)))
>> +      return true;
>> +
>> +  return false;
>> +}
>>     
>
> I think you don't need this function, but see below.
>
>   
This was added when I encountered a number of ICEs downstream.  I wasn't 
sure at the time whether to treat those as real bugs or whether what I 
was doing was breaking a contract with a downstream pass.  I will remove 
this and investigate those regressions more carefully.
>> +/* Expose target addressing modes in EXPR.  Logic extracted and
>> +   simplified from tree-ssa-loop-ivopts.c to handle non-ivar
>> +   opportunities. */
>> +
>> +static bool
>> +tree_ssa_addr_mode_tree (tree *expr, gimple_stmt_iterator *gsi, bool speed)
>> +{
>> +  tree base = *expr, mem_ref;
>> +  struct pointer_map_t *cache;
>> +  aff_tree aff;
>> +
>> +  if (REFERENCE_CLASS_P (*expr))
>>     
>
> I personally prefer indentation like
>
>      if (!REFERENCE_CLASS_P (*expr))
>        return false;
>
>   
So do I.  Will clean this up.
>> +    {
>> +      /* If the expression is a virtual method table reference, don't attempt
>> +        this.  Incorrect code generation occurs; an example test case is
>> +        testsuite/g++.dg/opt/pr47366.C. */
>> +      if (TREE_CODE (base) == COMPONENT_REF
>> +         && DECL_VIRTUAL_P (TREE_OPERAND (base, 1)))
>> +       return false;
>>     
>
> ;)  It probably confuses Martins devirtualization code.
>
>   
>> +      base = unshare_expr (base);
>>     
>
> Ick you don't need this I think.
>
>   
Agreed.  Left over from my initial theft of code from IVopts; should be 
more conservative.
>> +      /*
>> +      if (TREE_CODE (base) == TARGET_MEM_REF)
>> +       {
>> +         tree type = build_pointer_type (TREE_TYPE (base));
>> +         base = tree_mem_ref_addr (type, base);
>> +       }
>> +      */
>> +      /* If the addressing mode is already exposed, let it be.
>> +         #### I am not certain this doesn't miss out on some 
>> +         opportunities, so just in case I'm leaving the above
>> +         code commented out in place for now.  But I think all this
>> +         does is make things more explicit that CSE can catch
>> +         anyhow, and in some cases it produces expressions that
>> +         the may-aliasing infrastructure can't handle. */
>> +      if (tree_contains_mem_ref_p (base))
>>     
>
> TREE_CODE (base) == TARGET_MEM_REF || TREE_CODE (base) == MEM_REF
>
> sub-operands can never be complex.  I _think_ you want to not
> bail out for MEM_REFs, though.
>
>   
Per above, will investigate the regressions that resulted.
>> +       return false;
>> +      else
>>     
>
> no need for else {
>
>   
>> +       {
>> +         /* Check that the base expression is addressable. */
>> +         if (may_be_nonaddressable_p (base))
>> +           return false;
>>     
>
> Correct (the access can include a V_C_E or a bitfield-ref, in which
> case we cannot compute the address of the base of the outermost
> access -- the variable name "base" is probably misleading here).
>
> What we want to do during memory access lowering is decompose the
> access into a part that we can always take the address of,
> like for a bitfield access a.b.c into a.b and .c, eventually
> re-write a.b and re-attach .c.  I have plans to re-surrect a lowering
> pass I developed some years ago for 4.7 (it's on the mem-ref branch).
>
>   
>> +         /* On strict alignment platforms, check that the base expression
>> +            is sufficiently aligned.  There is no additional "step"
>> +             information required, so specify single-byte alignment
>> +             for that parameter.  */
>> +         if (STRICT_ALIGNMENT && may_be_unaligned_p (base, size_one_node))
>> +           return false;
>> +
>> +         base = build_fold_addr_expr (base);
>>     
>
> This is probably where you require unshare_expr ...
>
>   
>> +       }
>> +
>> +      /* Express the address of the original expression as an affine
>> +         combination of trees, looking through SSA defs. */
>> +      cache = pointer_map_create ();
>> +      tree_to_aff_combination_expand (base, TREE_TYPE (base), &aff, &cache);
>> +      pointer_map_destroy (cache);
>>     
>
> ... but here the result shouldn't require it anymore.  It's better
> to make a wrapper/worker of tree_to_aff_combination_expand that
> can handle non-addresses.  Probably generally useful (did you run
> into actual problems w/o unshare_expr?).
>
>   
I will do this and place it in tree-affine.[ch].  I did not run into 
actual problems without unshare_expr, and agree it is likely unnecessary.
>> +
>> +      /* It is possible for aff to have no elements, in which case do
>> +        nothing.  Example: lhs = MEM[(struct foo *) 0B].bar does not
>> +         produce an affine combination. */
>> +      if (!aff.n)
>> +       return false;
>> +
>> +      /* Replace the input expression with an equivalent memory reference
>> +        legal on the target machine. */
>> +      mem_ref = create_mem_ref (gsi, TREE_TYPE (*expr), &aff, 
>> +                                reference_alias_ptr_type (*expr),
>> +                               NULL_TREE, NULL_TREE, speed);
>>     
>
> Doing the above will un-CSE a lot of dependent computations.  I think
> you want to limit the SSA walking of tree_to_aff_combination_expand
> or invent some mechanism to re-use existing values from the IL.
>
> That said - the above needs lots of work ;)  Like it misses a
> cost function - we should try to re-use base or index registers
> across memory accesses, like IVOPTs does in loops.  It probably
> is enough to collect candidates on a per-BB basis for this.
>
>   
I completely agree.  As noted, I saw some poor results involving extra 
multiplies, and realized I have some work cut out here.  I was initially 
trying not to redesign the wheel where possible, but the existing 
machinery will need more refinement.  I will consult with you as I get 
deeper into that.
>> +      /* Ensure the memory reference was not built with a base expression
>> +        of zero, which confuses the may-aliasing in subsequent dead
>> +        store elimination.  Assume this isn't a useful replacement when
>> +        this occurs. */
>> +      if (integer_zerop (TREE_OPERAND (mem_ref, 0)))
>> +       return false;
>>     
>
> Huh, that would be a bug in create_mem_ref I suppose.  Or in alias 
> analysis.
>
>   
I saw this several times, IIRC usually when processing existing 
mem_refs.  I threw this catch-all in so I could make progress.  I will 
dig into this further and let you know what I'm seeing.
>> +      /* Finish the replacement. */
>> +      copy_ref_info (mem_ref, *expr);
>> +      *expr = mem_ref;
>> +      return true;
>> +    }
>> +  return false;
>> +}
>> +
>> +/* Expose target addressing modes in STMT, which must be an assignment. */
>> +
>> +static void
>> +tree_ssa_addr_mode_stmt (gimple stmt, gimple_stmt_iterator *gsi, bool speed)
>> +{
>> +  tree *lhs, *rhs1, *rhs2, *rhs3;
>> +  enum tree_code subcode = gimple_assign_rhs_code (stmt);
>> +  enum gimple_rhs_class rhs_class = get_gimple_rhs_class (subcode);
>> +  bool changed;
>> +
>> +  /* Bitfields are being ignored for now, as they are in induction
>> +     variable optimization. */
>> +  if (subcode == BIT_FIELD_REF)
>> +    return;
>>     
>
> You can have BIT_FIELD_REFs on the lhs as well.
>
> Note that COMPONENT_REFs with a DECL_BIT_FIELD FIELD_DECL are also
> bit-field accesses.
>
> It's probably easier to move the checks into tree_ssa_addr_mode_tree
> and operate on TREE_CODE of the reference.
>
>   
OK, thanks!
>> +  lhs = gimple_assign_lhs_ptr (stmt);
>> +  changed = tree_ssa_addr_mode_tree (lhs, gsi, speed);
>> +
>> +  rhs1 = gimple_assign_rhs1_ptr (stmt);
>> +  changed = tree_ssa_addr_mode_tree (rhs1, gsi, speed) || changed;
>>     
>
> No need for all the code below - there are never multiple memory
> accesses (you are interested in) in a single statement.
>
>   
OK.  I thought as much, but wasn't certain whether it was a rule I could 
rely on as future ops are added.  I'll remove the extra logic.
>> +  if (rhs_class == GIMPLE_BINARY_RHS || rhs_class == GIMPLE_TERNARY_RHS)
>> +    {
>> +      rhs2 = gimple_assign_rhs2_ptr (stmt);
>> +      changed = tree_ssa_addr_mode_tree (rhs2, gsi, speed) || changed;
>> +    }
>> +
>> +  if (rhs_class == GIMPLE_TERNARY_RHS)
>> +    {
>> +      rhs3 = gimple_assign_rhs3_ptr (stmt);
>> +      changed = tree_ssa_addr_mode_tree (rhs3, gsi, speed) || changed;
>> +    }
>> +
>> +  if (changed)
>> +    update_stmt (stmt);
>> +}
>> +
>> +/* Process the statements in block BB. */
>> +
>> +static void
>> +tree_ssa_addr_mode_block (basic_block bb)
>> +{
>> +  gimple_stmt_iterator gsi;
>> +
>> +  bool speed = optimize_bb_for_speed_p (bb);
>> +
>> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> +    {
>> +      gimple stmt = gsi_stmt (gsi);
>> +
>> +      /* #### Avoiding volatile memory refs like tree-ssa-loop-ivopts.c
>> +         does, but need to revisit whether that's necessary. */
>>     
>
> Do you want to iterate over only memory accesses or also over
> address computations?  I suppose you are only interested in
> scalar memory accesses (memory accesses with some non-BLKmode).
> In that case you get all relevant stms with
>
>          if (gimple_vuse (stmt)
>              && gimple_assign_single_p (stmt))
>
>   
OK, good idea.  Any thoughts on volatile?  It seems like lowering those 
should be harmless, unless the flagging as volatile is somehow lost by 
the lowering.
>> +      if (is_gimple_assign (stmt) && !gimple_has_volatile_ops (stmt))
>> +       {
>> +         tree_ssa_addr_mode_stmt (stmt, &gsi, speed);
>> +       }
>> +    }
>> +}
>>     
>
> I think you got the easy pieces done - now the hard part of
> creating target dependent addressing that actually helps is missing.
> And to not do too much un-CSE that will never be cleaned up afterwards.
>
> Richard.
>
>   
/nod.  Thanks very much!  I appreciate the help and will work on 
refining this.
>> +/* Main entry point for exposing target addressing modes not caught
>> +   by tree-ssa-loop-ivopts.c. */
>> +
>> +static unsigned int
>> +tree_ssa_addr_mode (void)
>> +{
>> +  basic_block bb;
>> +  bool chicken_switch = false;
>> +
>> +  /* #### Temporary debug aid; set to true to disable this phase in debugger. */
>> +  if (chicken_switch)
>> +    return 0;
>> +
>> +  FOR_EACH_BB (bb)
>> +    tree_ssa_addr_mode_block (bb);
>> +
>> +  return 0;
>> +}
>> +
>> +struct gimple_opt_pass pass_tree_addr_mode =
>> +{
>> + {
>> +  GIMPLE_PASS,
>> +  "addrmode",                          /* name */
>> +  NULL,                                        /* gate */
>> +  tree_ssa_addr_mode,                  /* execute */
>> +  NULL,                                        /* sub */
>> +  NULL,                                        /* next */
>> +  0,                                   /* static_pass_number */
>> +  TV_TREE_ADDR_MODE,                   /* tv_id */
>> +  PROP_cfg,                            /* properties_required */
>> +  0,                                   /* properties_provided */
>> +  0,                                   /* properties_destroyed */
>> +  0,                                   /* todo_flags_start */
>> +  /* #### Probably can remove TODO_rebuild_alias. */
>> +  TODO_dump_func | TODO_update_ssa |
>> +    TODO_rebuild_alias                         /* todo_flags_finish */
>> + }
>> +};
>> Index: gcc/tree-flow.h
>> ===================================================================
>> --- gcc/tree-flow.h     (revision 169834)
>> +++ gcc/tree-flow.h     (working copy)
>> @@ -808,6 +808,7 @@
>>                                       addr_space_t);
>>  unsigned multiply_by_cost (HOST_WIDE_INT, enum machine_mode, bool);
>>  bool may_be_nonaddressable_p (tree expr);
>> +bool may_be_unaligned_p (tree ref, tree step);
>>  
>>  /* In tree-ssa-threadupdate.c.  */
>>  extern bool thread_through_all_blocks (bool);
>> Index: gcc/Makefile.in
>> ===================================================================
>> --- gcc/Makefile.in     (revision 169834)
>> +++ gcc/Makefile.in     (working copy)
>> @@ -1389,6 +1389,7 @@
>>         tree-sra.o \
>>         tree-switch-conversion.o \
>>         tree-ssa-address.o \
>> +       tree-ssa-addr-mode.o \
>>         tree-ssa-alias.o \
>>         tree-ssa-ccp.o \
>>         tree-ssa-coalesce.o \
>> @@ -2556,6 +2557,9 @@
>>     $(TREE_PASS_H) $(FLAGS_H) $(TREE_INLINE_H) $(RECOG_H) insn-config.h \
>>     $(EXPR_H) gt-tree-ssa-address.h $(GGC_H) tree-affine.h $(TARGET_H) \
>>     tree-pretty-print.h
>> +tree-ssa-addr-mode.o : tree-ssa-addr-mode.c $(CONFIG_H) $(SYSTEM_H) \
>> +   coretypes.h $(TREE_H) $(BASIC_BLOCK_H) $(TREE_DUMP_H) $(TREE_PASS_H) \
>> +   $(TIMEVAR_H) $(CFGLOOP_H) $(GIMPLE_H) $(TREE_FLOW_H) $(TREE_AFFINE_H)
>>  tree-ssa-loop-niter.o : tree-ssa-loop-niter.c $(TREE_FLOW_H) $(CONFIG_H) \
>>     $(SYSTEM_H) $(TREE_H) $(TM_P_H) $(CFGLOOP_H) $(PARAMS_H) \
>>     $(TREE_INLINE_H) output.h $(DIAGNOSTIC_H) $(TM_H) coretypes.h $(TREE_DUMP_H) \
>> Index: gcc/passes.c
>> ===================================================================
>> --- gcc/passes.c        (revision 169834)
>> +++ gcc/passes.c        (working copy)
>> @@ -944,6 +944,10 @@
>>        NEXT_PASS (pass_rename_ssa_copies);
>>        NEXT_PASS (pass_uncprop);
>>        NEXT_PASS (pass_local_pure_const);
>> +      /* Note: pass_tree_addr_mode should be as close to pass_expand as
>> +        possible.  Particularly must be after DCE since DCE has
>> +         trouble seeing through some of the generated mem_refs. */
>> +      NEXT_PASS (pass_tree_addr_mode);
>>      }
>>    NEXT_PASS (pass_lower_complex_O0);
>>    NEXT_PASS (pass_cleanup_eh);
>>
>>
>>
>>     
>
>
William J. Schmidt - Feb. 7, 2011, 5:42 p.m.
Hi Steven,

Thanks for the quick and detailed response!  A few comments inline below.

Steven Bosscher wrote:
> Hello Bill,
>
>   
>> Following is an early prototype of such a pass.
>>     
>
> Looks quite interesting. It seems to me that this is not the right
> time for GCC 4.6, but it looks like a step in the right direction for
> GCC 4.7.
>
>
>   
Agreed -- this is definitely not for 4.6.
>> 1) Are there concerns with early exposure of target addressing nodes on
>> other targets?  Should the pass be gated to only operate on a subset of
>> targets?
>>     
>
> In general, it would be best if the pass Just Works (tm) for all
> targets, just like IVopts does.
>
>
>   
>> 2) Is the pass placement appropriate?
>>     
>
> It depends on how the combination of statements ends up producing dead
> code. This pass should be "as late as possible" in the pipeline but it
> should not leave a lot of dead code around.
>
>
>   
>>        * tree-ssa-loop-ivopts.c (may_be_unaligned_p, copy_ref_info):
>>        Remove static keyword.
>>     
>
> Maybe those functions belong in tree.c or tree-dfa.c now?
>
>   
Perhaps so.  I'll revisit function placement as I get a bit further along.
>   
>>        * tree-ssa-address.c (addr_to_parts): Change to handle null
>>        iv_cand parameter.
>>     
>
> You should add a comment about this, expaining why that case should be handled.
>
>   
OK.
>   
>> +extern void copy_ref_info (tree, tree);
>> +
>>     
>
> No need for this extern declaration.
>
>   
>> +static bool
>> +tree_contains_mem_ref_p (tree base)
>>     
>
> There must always be a comment before each function, explaining the
> purpose, inputs and outputs.
>
>   
Aye.  At this point I think this function is going away, per Richi's 
response.  I need to investigate further why processing MEM_REFs and 
TARGET_MEM_REFs produced ICEs downstream.
>   
>> +/* Expose target addressing modes in EXPR.  Logic extracted and
>> +   simplified from tree-ssa-loop-ivopts.c to handle non-ivar
>> +   opportunities. */
>> +
>> +static bool
>> +tree_ssa_addr_mode_tree (tree *expr, gimple_stmt_iterator *gsi, bool speed)
>> +{
>> +  tree base = *expr, mem_ref;
>> +  struct pointer_map_t *cache;
>> +  aff_tree aff;
>> +
>> +  if (REFERENCE_CLASS_P (*expr))
>> +    {
>> +      /* If the expression is a virtual method table reference, don't attempt
>> +        this.  Incorrect code generation occurs; an example test case is
>> +        testsuite/g++.dg/opt/pr47366.C. */
>> +      if (TREE_CODE (base) == COMPONENT_REF
>> +         && DECL_VIRTUAL_P (TREE_OPERAND (base, 1)))
>> +       return false;
>>     
>
> Can you explain how this leads to wrong code?
>
>   
I'll reproduce this and explain further.  As i recall create_mem_ref 
produced an ugly result with both base and index containing zero values.
>   
>> +
>> +      base = unshare_expr (base);
>>     
>
> You may want to postpone unsharing until you know you can make a
> useful transformation, otherwise you may create unnecessary garbage.
>
>   
Agreed.
>   
>> +
>> +      /*
>> +      if (TREE_CODE (base) == TARGET_MEM_REF)
>> +       {
>> +         tree type = build_pointer_type (TREE_TYPE (base));
>> +         base = tree_mem_ref_addr (type, base);
>> +       }
>> +      */
>> +      /* If the addressing mode is already exposed, let it be.
>> +         #### I am not certain this doesn't miss out on some
>> +         opportunities, so just in case I'm leaving the above
>> +         code commented out in place for now.  But I think all this
>> +         does is make things more explicit that CSE can catch
>> +         anyhow, and in some cases it produces expressions that
>> +         the may-aliasing infrastructure can't handle. */
>> +      if (tree_contains_mem_ref_p (base))
>> +       return false;
>>     
>
> I think you should not touch already existing TARGET_MEM_REFs, because
> we have to assume some other pass (IVopts) already tried hard to find
> a good addressing mode. If that TARGET_MEM_REF is not the best choice,
> then this should be fixed elsewhere.
>
> Even if there is a MEM_REF, I think you want to expose TARGET_MEM_REF.
> If you are going to expose memory addressing modes before expand for
> all cases, you may as well do so for all memory references so that
> expand only has to deal with TARGET_MEM_REFs.
>
>   
tree-ssa-address.c: create_mem_ref_raw() contains logic that will 
produce a MEM_REF in preference to a TARGET_MEM_REF, and this kicks in 
for most simple address forms, so there's not as much distinction 
between the two as one might like.  I.e., if we see a MEM_REF, it's very 
possible somebody already tried to create a TARGET_MEM_REF but got a 
MEM_REF instead.  In any case, per Richi's comments, create_mem_ref 
should be able to tolerate processing existing MEM_REFs, so I need to 
investigate further.  I plan to revert to processing MEM_REFs and get 
advice on how to deal with the results.
>   
>> +      else
>> +       {
>> +         /* Check that the base expression is addressable. */
>> +         if (may_be_nonaddressable_p (base))
>> +           return false;
>>     
>
> When can the address of a MEM be non-addressable??
> Note, two spaces before the "*/".
>
>   
See Richi's comment for a good explanation.  And thanks -- I totally 
missed the two-space convention, and will fix that throughout.
>   
>> +         /* On strict alignment platforms, check that the base expression
>> +            is sufficiently aligned.  There is no additional "step"
>> +             information required, so specify single-byte alignment
>> +             for that parameter.  */
>> +         if (STRICT_ALIGNMENT && may_be_unaligned_p (base, size_one_node))
>> +           return false;
>>     
>
> This case should somehow be expressed in a TARGET_MEM_REF also, if
> "expand" can handle it. So I would try to rewrite this kind of memory
> reference to a TARGET_MEM_REF too.
>
>   
I will experiment with removing this check.  I agree it may well be 
unnecessary; it was a simplification of what is going on in IVopts, and 
may not be appropriate here.
>   
>> +      /* Ensure the memory reference was not built with a base expression
>> +        of zero, which confuses the may-aliasing in subsequent dead
>> +        store elimination.  Assume this isn't a useful replacement when
>> +        this occurs. */
>> +      if (integer_zerop (TREE_OPERAND (mem_ref, 0)))
>> +       return false;
>>     
>
> Same here: Any replacement in this pass would be a useful replacement.
>
>   
If I can shoot the underlying bug that prompted this, I'll be able to 
get rid of this check.
>   
>> +  /* Bitfields are being ignored for now, as they are in induction
>> +     variable optimization. */
>> +  if (subcode == BIT_FIELD_REF)
>> +    return;
>>     
>
> Right. BIT_FIELD_REFs are a very special case that GCC has trouble
> enough with even without TARGET_MEM_REFs. But a nice TODO or FIXME
> here would be OK :-)
>
>   
I will add a TODO.  Richi has expressed intent to handle bit_field_refs 
better in the future, so it will be good to have the eyecatcher here.
>   
>> +  lhs = gimple_assign_lhs_ptr (stmt);
>> +  changed = tree_ssa_addr_mode_tree (lhs, gsi, speed);
>> +
>> +  rhs1 = gimple_assign_rhs1_ptr (stmt);
>> +  changed = tree_ssa_addr_mode_tree (rhs1, gsi, speed) || changed;
>> +
>> +  if (rhs_class == GIMPLE_BINARY_RHS || rhs_class == GIMPLE_TERNARY_RHS)
>> +    {
>> +      rhs2 = gimple_assign_rhs2_ptr (stmt);
>> +      changed = tree_ssa_addr_mode_tree (rhs2, gsi, speed) || changed;
>> +    }
>> +
>> +  if (rhs_class == GIMPLE_TERNARY_RHS)
>> +    {
>> +      rhs3 = gimple_assign_rhs3_ptr (stmt);
>> +      changed = tree_ssa_addr_mode_tree (rhs3, gsi, speed) || changed;
>> +    }
>>     
>
> Perhaps iterate over gimple_num_ops instead?
>
>
>   
>> +static unsigned int
>> +tree_ssa_addr_mode (void)
>> +{
>> +  basic_block bb;
>> +  bool chicken_switch = false;
>> +
>> +  /* #### Temporary debug aid; set to true to disable this phase in debugger. */
>> +  if (chicken_switch)
>> +    return 0;
>>     
>
> Are you familiar with debug counters (dbgcnt.*)?
>
>   
No, I wasn't.  Thanks for the pointer!
>   
>> +  FOR_EACH_BB (bb)
>> +    tree_ssa_addr_mode_block (bb);
>>     
>
> Perhaps ignore basic blocks that are part of a loop? Otherwise perhaps
> you undo some of the choices made by IVopts. I don't know how IVopts
> handles memory references that do not depend on an IV, though, so
> maybe you have to look at loop blocks also. On the other hand, if a
> memory reference in a loop does not depend on an IV, it's probably
> already hoisted out of the loop...  Another knob to play with if/when
> you're going to tune this pass :-)
>
>   
That was my initial thought as well.  However, I was concerned with 
several things:  the possibility of losing opportunities that don't 
involve induction variables; the possibility of loop optimizations being 
disabled by command switch; and the problem of being tied to a point in 
the pass order where the loop information is still valid.  Since this is 
a cheap, linear-time pass, it didn't seem onerous to just visit 
everything, so that's where I ended up.
>   
>> +      /* Note: pass_tree_addr_mode should be as close to pass_expand as
>> +        possible.  Particularly must be after DCE since DCE has
>> +         trouble seeing through some of the generated mem_refs. */
>> +      NEXT_PASS (pass_tree_addr_mode);
>>     
>
> ...generated TARGET_MEM_REFs.  */
> Note, two spaces before the "*/" :-)
>   
Again, the use of MEM_REFs vs. TARGET_MEM_REFs is on purpose, per the 
discussion above.
> Is it still true that DCE has problems with TARGET_MEM_REFs? And even
> if this is so: Is it a problem here? As far as I understand, the pass
> combines outputs of different non-memory statements that together
> compute an address. If outputs of some statements are combined, those
> statements may become dead code. But if there is no dead code before
> the new pass_tree_addr_mode, the only dead code afterwards should be
> writes to unused SSA names. Perhaps under those conditions, DCE will
> just work??
>
>   
Unfortunately my notes on what happened here are sketchy.  IIRC, I ended 
up with an ICE on some test cases coming out of DCE.  I'll take a note 
to pop this back above DCE to jog my memory.
> Hoping this will make it into GCC 4.7 in one form or another,
>
> Ciao!
> Steven
>   
Thanks again!  I appreciate your help very much.

Bill
Richard Guenther - Feb. 8, 2011, 9:59 a.m.
On Mon, 7 Feb 2011, William J. Schmidt wrote:

> Hi Richi,
> 
> Thanks very much for the quick and thorough response!  Comments inline below.
> 
> Richard Guenther wrote:
> > On Fri, 4 Feb 2011, William J. Schmidt wrote:
> > 
> >   
> > > Problem report 46556 (http://gcc.gnu.org/PR46556) identifies a code-size
> > > regression for powerpc targets.  The root of the problem is changes in
> > > address canonicalization a couple of years back.  Discussion in the PR
> > > suggests some form of exposure of target addressing modes prior to the
> > > expand phase.  Steven Bosscher (CCed) suggested this be done in a late
> > > GIMPLE pass, using logic similar to what IVopts does for loops.
> > > 
> > > Following is an early prototype of such a pass.  I've verified that it
> > > restores the regressed code to its earlier, more compact form, and most
> > > of the addressing sequences on powerpc appear to be better or the same
> > > as before.  There are at least some complex expressions where early
> > > exposure produces worse code in the form of extra multiplies; I have
> > > some ideas for how to tune that down so the shift-add machinery for
> > > multiplies isn't derailed.  There are still 16 test cases that regress
> > > on rev 169834 for powerpc, and I haven't examined other targets yet.
> > > Clearly this is not yet ready, but before investing time into further
> > > refinement, I'd like to get commentary on the general approach, as well
> > > as any specific concerns.  Please bear in mind that I am new to GCC, so
> > > I'm still familiarizing myself with the infrastructure and am likely to
> > > miss some things that are obvious to the initiated...
> > > 
> > > The general approach is to rely on existing machinery used by
> > > tree-ssa-loop-ivopts.c, simplifying out the parts that have to do with
> > > induction variable optimizations.  The new logic is pretty simple, but
> > > the interactions with later phases may not be.  For now, I've placed the
> > > new pass in passes.c as the last -O1 tree-ssa pass.
> > > 
> > > In addition to general comments, I'd be interested in answers to some
> > > specific questions:
> > > 
> > > 1) Are there concerns with early exposure of target addressing nodes on
> > > other targets?  Should the pass be gated to only operate on a subset of
> > > targets?
> > > 2) Is the pass placement appropriate?
> > > 3) Have I missed any necessary fix-ups after modifying the GIMPLE code?
> > > 
> > > Thanks very much for your insights!
> > >     
> > 
> > Comments inline.
> > 
> >   
> > > Bill
> > > 
> > > 
> > > [gcc]
> > > 2011-02-04  William Schmidt  <wschmidt@linux.vnet.ibm.com>
> > > 
> > >         * tree.h (copy_ref_info): New prototype of existing function.
> > >         * tree-pass.h (pass_tree_addr_mode): New declaration.
> > >         * tree-ssa-loop-ivopts.c (may_be_unaligned_p, copy_ref_info):
> > >         Remove static keyword.
> > >         * timevar.def (TV_TREE_ADDR_MODE): New timevar definition.
> > >         * tree-ssa-address.c (addr_to_parts): Change to handle null
> > >         iv_cand parameter.
> > >         * tree-ssa-addr-mode.c:  New.
> > >         * tree-flow.h (may_be_unaligned_p): New prototype of existing
> > >         function.
> > >         * Makefile.in: Add build step for tree-ssa-addr-mode.o.
> > >         * passes.c: Add pass for pass_tree_addr_mode.
> > > 
> > > Index: gcc/tree.h
> > > ===================================================================
> > > --- gcc/tree.h  (revision 169834)
> > > +++ gcc/tree.h  (working copy)
> > > @@ -5580,6 +5580,9 @@
> > >  extern tree tree_mem_ref_addr (tree, tree);
> > >  extern void copy_mem_ref_info (tree, tree);
> > >  +/* In tree-ssa-loop-ivopts.c */
> > > +extern void copy_ref_info (tree, tree);
> > > +
> > >  /* In tree-vrp.c */
> > >  extern bool ssa_name_nonnegative_p (const_tree);
> > >  Index: gcc/tree-pass.h
> > > ===================================================================
> > > --- gcc/tree-pass.h     (revision 169834)
> > > +++ gcc/tree-pass.h     (working copy)
> > > @@ -385,6 +385,7 @@
> > >  extern struct gimple_opt_pass pass_parallelize_loops;
> > >  extern struct gimple_opt_pass pass_loop_prefetch;
> > >  extern struct gimple_opt_pass pass_iv_optimize;
> > > +extern struct gimple_opt_pass pass_tree_addr_mode;
> > >  extern struct gimple_opt_pass pass_tree_loop_done;
> > >  extern struct gimple_opt_pass pass_ch;
> > >  extern struct gimple_opt_pass pass_ccp;
> > > Index: gcc/tree-ssa-loop-ivopts.c
> > > ===================================================================
> > > --- gcc/tree-ssa-loop-ivopts.c  (revision 169834)
> > > +++ gcc/tree-ssa-loop-ivopts.c  (working copy)
> > > @@ -1604,7 +1604,7 @@
> > >   /* Returns true if memory reference REF with step STEP may be unaligned.
> > > */
> > >  -static bool
> > > +bool
> > >  may_be_unaligned_p (tree ref, tree step)
> > >  {
> > >    tree base;
> > > @@ -5916,7 +5916,7 @@
> > >   /* Copies the reference information from OLD_REF to NEW_REF.  */
> > >  -static void
> > > +void
> > >  copy_ref_info (tree new_ref, tree old_ref)
> > >  {
> > >    tree new_ptr_base = NULL_TREE;
> > > Index: gcc/timevar.def
> > > ===================================================================
> > > --- gcc/timevar.def     (revision 169834)
> > > +++ gcc/timevar.def     (working copy)
> > > @@ -236,6 +236,7 @@
> > >  DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
> > >  DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
> > >  DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
> > > +DEFTIMEVAR (TV_TREE_ADDR_MODE        , "expose addressing modes")
> > >     
> > 
> > I think we should name this pass lower-address or similar, it is
> > really doing target specific lowering of memory accesses.
> > 
> >   
> Agreed.
> > >  /* Everything else in rest_of_compilation not included above.  */
> > >  DEFTIMEVAR (TV_EARLY_LOCAL          , "early local passes")
> > > Index: gcc/tree-ssa-address.c
> > > ===================================================================
> > > --- gcc/tree-ssa-address.c      (revision 169834)
> > > +++ gcc/tree-ssa-address.c      (working copy)
> > > @@ -631,7 +631,7 @@
> > >       is <= 2 -- in that case, no loop invariant code motion can be
> > >       exposed.  */
> > >  -  if (!base_hint && (addr->n > 2))
> > > +  if (!base_hint && (addr->n > 2) && iv_cand)
> > >      move_variant_to_index (parts, addr, iv_cand);
> > >     /* First move the most expensive feasible multiplication
> > > Index: gcc/tree-ssa-addr-mode.c
> > > ===================================================================
> > > --- gcc/tree-ssa-addr-mode.c    (revision 0)
> > > +++ gcc/tree-ssa-addr-mode.c    (revision 0)
> > > @@ -0,0 +1,251 @@
> > > +/* Expose target addressing modes in gimple representation.
> > > +   Copyright (C) 2011
> > > +   Free Software Foundation, Inc.
> > > +
> > > +This file is part of GCC.
> > > +
> > > +GCC is free software; you can redistribute it and/or modify it
> > > +under the terms of the GNU General Public License as published by the
> > > +Free Software Foundation; either version 3, or (at your option) any
> > > +later version.
> > > +
> > > +GCC is distributed in the hope that it will be useful, but WITHOUT
> > > +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> > > +for more details.
> > > +
> > > +You should have received a copy of the GNU General Public License
> > > +along with GCC; see the file COPYING3.  If not see
> > > +<http://www.gnu.org/licenses/>.  */
> > > +
> > > +/* This pass performs a scan over the gimple representation to expose
> > > +   target machine addressing in reference-class expressions.  This is
> > > +   necessary for some targets for which the current address
> > > +   canonicalization is suboptimal.  Induction variable optimizations
> > > +   already expose target addressing for many (but not all) expressions
> > > +   in loops, so this has most benefit in non-loop code.  Algorithms
> > > +   used here are simplifications of those used in the induction
> > > +   variable optimizations. */
> > > +
> > > +#include "config.h"
> > > +#include "system.h"
> > > +#include "coretypes.h"
> > > +#include "tree.h"
> > > +#include "basic-block.h"
> > > +#include "tree-dump.h"
> > > +#include "tree-pass.h"
> > > +#include "timevar.h"
> > > +#include "cfgloop.h"
> > > +#include "gimple.h"
> > > +#include "tree-flow.h"
> > > +#include "tree-affine.h"
> > > +
> > > +extern void copy_ref_info (tree, tree);
> > > +
> > > +static bool
> > > +tree_contains_mem_ref_p (tree base)
> > > +{
> > > +  unsigned int i;
> > > +  enum tree_code code;
> > > +
> > > +  if (!base)
> > > +    return false;
> > > +
> > > +  code = TREE_CODE (base);
> > > +  if (code == TARGET_MEM_REF || code == MEM_REF)
> > > +    return true;
> > > +
> > > +  for (i = 0; i < TREE_CODE_LENGTH (code); i++)
> > > +    if (tree_contains_mem_ref_p (TREE_OPERAND (base, i)))
> > > +      return true;
> > > +
> > > +  return false;
> > > +}
> > >     
> > 
> > I think you don't need this function, but see below.
> > 
> >   
> This was added when I encountered a number of ICEs downstream.  I wasn't sure
> at the time whether to treat those as real bugs or whether what I was doing
> was breaking a contract with a downstream pass.  I will remove this and
> investigate those regressions more carefully.
> > > +/* Expose target addressing modes in EXPR.  Logic extracted and
> > > +   simplified from tree-ssa-loop-ivopts.c to handle non-ivar
> > > +   opportunities. */
> > > +
> > > +static bool
> > > +tree_ssa_addr_mode_tree (tree *expr, gimple_stmt_iterator *gsi, bool
> > > speed)
> > > +{
> > > +  tree base = *expr, mem_ref;
> > > +  struct pointer_map_t *cache;
> > > +  aff_tree aff;
> > > +
> > > +  if (REFERENCE_CLASS_P (*expr))
> > >     
> > 
> > I personally prefer indentation like
> > 
> >      if (!REFERENCE_CLASS_P (*expr))
> >        return false;
> > 
> >   
> So do I.  Will clean this up.
> > > +    {
> > > +      /* If the expression is a virtual method table reference, don't
> > > attempt
> > > +        this.  Incorrect code generation occurs; an example test case is
> > > +        testsuite/g++.dg/opt/pr47366.C. */
> > > +      if (TREE_CODE (base) == COMPONENT_REF
> > > +         && DECL_VIRTUAL_P (TREE_OPERAND (base, 1)))
> > > +       return false;
> > >     
> > 
> > ;)  It probably confuses Martins devirtualization code.
> > 
> >   
> > > +      base = unshare_expr (base);
> > >     
> > 
> > Ick you don't need this I think.
> > 
> >   
> Agreed.  Left over from my initial theft of code from IVopts; should be more
> conservative.
> > > +      /*
> > > +      if (TREE_CODE (base) == TARGET_MEM_REF)
> > > +       {
> > > +         tree type = build_pointer_type (TREE_TYPE (base));
> > > +         base = tree_mem_ref_addr (type, base);
> > > +       }
> > > +      */
> > > +      /* If the addressing mode is already exposed, let it be.
> > > +         #### I am not certain this doesn't miss out on some +
> > > opportunities, so just in case I'm leaving the above
> > > +         code commented out in place for now.  But I think all this
> > > +         does is make things more explicit that CSE can catch
> > > +         anyhow, and in some cases it produces expressions that
> > > +         the may-aliasing infrastructure can't handle. */
> > > +      if (tree_contains_mem_ref_p (base))
> > >     
> > 
> > TREE_CODE (base) == TARGET_MEM_REF || TREE_CODE (base) == MEM_REF
> > 
> > sub-operands can never be complex.  I _think_ you want to not
> > bail out for MEM_REFs, though.
> > 
> >   
> Per above, will investigate the regressions that resulted.
> > > +       return false;
> > > +      else
> > >     
> > 
> > no need for else {
> > 
> >   
> > > +       {
> > > +         /* Check that the base expression is addressable. */
> > > +         if (may_be_nonaddressable_p (base))
> > > +           return false;
> > >     
> > 
> > Correct (the access can include a V_C_E or a bitfield-ref, in which
> > case we cannot compute the address of the base of the outermost
> > access -- the variable name "base" is probably misleading here).
> > 
> > What we want to do during memory access lowering is decompose the
> > access into a part that we can always take the address of,
> > like for a bitfield access a.b.c into a.b and .c, eventually
> > re-write a.b and re-attach .c.  I have plans to re-surrect a lowering
> > pass I developed some years ago for 4.7 (it's on the mem-ref branch).
> > 
> >   
> > > +         /* On strict alignment platforms, check that the base expression
> > > +            is sufficiently aligned.  There is no additional "step"
> > > +             information required, so specify single-byte alignment
> > > +             for that parameter.  */
> > > +         if (STRICT_ALIGNMENT && may_be_unaligned_p (base,
> > > size_one_node))
> > > +           return false;
> > > +
> > > +         base = build_fold_addr_expr (base);
> > >     
> > 
> > This is probably where you require unshare_expr ...
> > 
> >   
> > > +       }
> > > +
> > > +      /* Express the address of the original expression as an affine
> > > +         combination of trees, looking through SSA defs. */
> > > +      cache = pointer_map_create ();
> > > +      tree_to_aff_combination_expand (base, TREE_TYPE (base), &aff,
> > > &cache);
> > > +      pointer_map_destroy (cache);
> > >     
> > 
> > ... but here the result shouldn't require it anymore.  It's better
> > to make a wrapper/worker of tree_to_aff_combination_expand that
> > can handle non-addresses.  Probably generally useful (did you run
> > into actual problems w/o unshare_expr?).
> > 
> >   
> I will do this and place it in tree-affine.[ch].  I did not run into actual
> problems without unshare_expr, and agree it is likely unnecessary.
> > > +
> > > +      /* It is possible for aff to have no elements, in which case do
> > > +        nothing.  Example: lhs = MEM[(struct foo *) 0B].bar does not
> > > +         produce an affine combination. */
> > > +      if (!aff.n)
> > > +       return false;
> > > +
> > > +      /* Replace the input expression with an equivalent memory reference
> > > +        legal on the target machine. */
> > > +      mem_ref = create_mem_ref (gsi, TREE_TYPE (*expr), &aff, +
> > > reference_alias_ptr_type (*expr),
> > > +                               NULL_TREE, NULL_TREE, speed);
> > >     
> > 
> > Doing the above will un-CSE a lot of dependent computations.  I think
> > you want to limit the SSA walking of tree_to_aff_combination_expand
> > or invent some mechanism to re-use existing values from the IL.
> > 
> > That said - the above needs lots of work ;)  Like it misses a
> > cost function - we should try to re-use base or index registers
> > across memory accesses, like IVOPTs does in loops.  It probably
> > is enough to collect candidates on a per-BB basis for this.
> > 
> >   
> I completely agree.  As noted, I saw some poor results involving extra
> multiplies, and realized I have some work cut out here.  I was initially
> trying not to redesign the wheel where possible, but the existing machinery
> will need more refinement.  I will consult with you as I get deeper into that.
> > > +      /* Ensure the memory reference was not built with a base expression
> > > +        of zero, which confuses the may-aliasing in subsequent dead
> > > +        store elimination.  Assume this isn't a useful replacement when
> > > +        this occurs. */
> > > +      if (integer_zerop (TREE_OPERAND (mem_ref, 0)))
> > > +       return false;
> > >     
> > 
> > Huh, that would be a bug in create_mem_ref I suppose.  Or in alias analysis.
> > 
> >   
> I saw this several times, IIRC usually when processing existing mem_refs.  I
> threw this catch-all in so I could make progress.  I will dig into this
> further and let you know what I'm seeing.
> > > +      /* Finish the replacement. */
> > > +      copy_ref_info (mem_ref, *expr);
> > > +      *expr = mem_ref;
> > > +      return true;
> > > +    }
> > > +  return false;
> > > +}
> > > +
> > > +/* Expose target addressing modes in STMT, which must be an assignment.
> > > */
> > > +
> > > +static void
> > > +tree_ssa_addr_mode_stmt (gimple stmt, gimple_stmt_iterator *gsi, bool
> > > speed)
> > > +{
> > > +  tree *lhs, *rhs1, *rhs2, *rhs3;
> > > +  enum tree_code subcode = gimple_assign_rhs_code (stmt);
> > > +  enum gimple_rhs_class rhs_class = get_gimple_rhs_class (subcode);
> > > +  bool changed;
> > > +
> > > +  /* Bitfields are being ignored for now, as they are in induction
> > > +     variable optimization. */
> > > +  if (subcode == BIT_FIELD_REF)
> > > +    return;
> > >     
> > 
> > You can have BIT_FIELD_REFs on the lhs as well.
> > 
> > Note that COMPONENT_REFs with a DECL_BIT_FIELD FIELD_DECL are also
> > bit-field accesses.
> > 
> > It's probably easier to move the checks into tree_ssa_addr_mode_tree
> > and operate on TREE_CODE of the reference.
> > 
> >   
> OK, thanks!
> > > +  lhs = gimple_assign_lhs_ptr (stmt);
> > > +  changed = tree_ssa_addr_mode_tree (lhs, gsi, speed);
> > > +
> > > +  rhs1 = gimple_assign_rhs1_ptr (stmt);
> > > +  changed = tree_ssa_addr_mode_tree (rhs1, gsi, speed) || changed;
> > >     
> > 
> > No need for all the code below - there are never multiple memory
> > accesses (you are interested in) in a single statement.
> > 
> >   
> OK.  I thought as much, but wasn't certain whether it was a rule I could rely
> on as future ops are added.  I'll remove the extra logic.
> > > +  if (rhs_class == GIMPLE_BINARY_RHS || rhs_class == GIMPLE_TERNARY_RHS)
> > > +    {
> > > +      rhs2 = gimple_assign_rhs2_ptr (stmt);
> > > +      changed = tree_ssa_addr_mode_tree (rhs2, gsi, speed) || changed;
> > > +    }
> > > +
> > > +  if (rhs_class == GIMPLE_TERNARY_RHS)
> > > +    {
> > > +      rhs3 = gimple_assign_rhs3_ptr (stmt);
> > > +      changed = tree_ssa_addr_mode_tree (rhs3, gsi, speed) || changed;
> > > +    }
> > > +
> > > +  if (changed)
> > > +    update_stmt (stmt);
> > > +}
> > > +
> > > +/* Process the statements in block BB. */
> > > +
> > > +static void
> > > +tree_ssa_addr_mode_block (basic_block bb)
> > > +{
> > > +  gimple_stmt_iterator gsi;
> > > +
> > > +  bool speed = optimize_bb_for_speed_p (bb);
> > > +
> > > +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > +    {
> > > +      gimple stmt = gsi_stmt (gsi);
> > > +
> > > +      /* #### Avoiding volatile memory refs like tree-ssa-loop-ivopts.c
> > > +         does, but need to revisit whether that's necessary. */
> > >     
> > 
> > Do you want to iterate over only memory accesses or also over
> > address computations?  I suppose you are only interested in
> > scalar memory accesses (memory accesses with some non-BLKmode).
> > In that case you get all relevant stms with
> > 
> >          if (gimple_vuse (stmt)
> >              && gimple_assign_single_p (stmt))
> > 
> >   
> OK, good idea.  Any thoughts on volatile?  It seems like lowering those should
> be harmless, unless the flagging as volatile is somehow lost by the lowering.

The volatileness should be preserved during the lowering, but I see no
reason to not rewrite volatile accesses.

Richard.
Richard Guenther - Feb. 8, 2011, 10:04 a.m.
On Mon, 7 Feb 2011, William J. Schmidt wrote:

> Hi Steven,
> 
> Thanks for the quick and detailed response!  A few comments inline below.
> 
> Steven Bosscher wrote:
> > Hello Bill,
> > 
> >   
> > > Following is an early prototype of such a pass.
> > >     
> > 
> > Looks quite interesting. It seems to me that this is not the right
> > time for GCC 4.6, but it looks like a step in the right direction for
> > GCC 4.7.
> > 
> > 
> >   
> Agreed -- this is definitely not for 4.6.
> > > 1) Are there concerns with early exposure of target addressing nodes on
> > > other targets?  Should the pass be gated to only operate on a subset of
> > > targets?
> > >     
> > 
> > In general, it would be best if the pass Just Works (tm) for all
> > targets, just like IVopts does.
> > 
> > 
> >   
> > > 2) Is the pass placement appropriate?
> > >     
> > 
> > It depends on how the combination of statements ends up producing dead
> > code. This pass should be "as late as possible" in the pipeline but it
> > should not leave a lot of dead code around.
> > 
> > 
> >   
> > >        * tree-ssa-loop-ivopts.c (may_be_unaligned_p, copy_ref_info):
> > >        Remove static keyword.
> > >     
> > 
> > Maybe those functions belong in tree.c or tree-dfa.c now?
> > 
> >   
> Perhaps so.  I'll revisit function placement as I get a bit further along.
> >   
> > >        * tree-ssa-address.c (addr_to_parts): Change to handle null
> > >        iv_cand parameter.
> > >     
> > 
> > You should add a comment about this, expaining why that case should be
> > handled.
> > 
> >   
> OK.
> >   
> > > +extern void copy_ref_info (tree, tree);
> > > +
> > >     
> > 
> > No need for this extern declaration.
> > 
> >   
> > > +static bool
> > > +tree_contains_mem_ref_p (tree base)
> > >     
> > 
> > There must always be a comment before each function, explaining the
> > purpose, inputs and outputs.
> > 
> >   
> Aye.  At this point I think this function is going away, per Richi's response.
> I need to investigate further why processing MEM_REFs and TARGET_MEM_REFs
> produced ICEs downstream.
> >   
> > > +/* Expose target addressing modes in EXPR.  Logic extracted and
> > > +   simplified from tree-ssa-loop-ivopts.c to handle non-ivar
> > > +   opportunities. */
> > > +
> > > +static bool
> > > +tree_ssa_addr_mode_tree (tree *expr, gimple_stmt_iterator *gsi, bool
> > > speed)
> > > +{
> > > +  tree base = *expr, mem_ref;
> > > +  struct pointer_map_t *cache;
> > > +  aff_tree aff;
> > > +
> > > +  if (REFERENCE_CLASS_P (*expr))
> > > +    {
> > > +      /* If the expression is a virtual method table reference, don't
> > > attempt
> > > +        this.  Incorrect code generation occurs; an example test case is
> > > +        testsuite/g++.dg/opt/pr47366.C. */
> > > +      if (TREE_CODE (base) == COMPONENT_REF
> > > +         && DECL_VIRTUAL_P (TREE_OPERAND (base, 1)))
> > > +       return false;
> > >     
> > 
> > Can you explain how this leads to wrong code?
> > 
> >   
> I'll reproduce this and explain further.  As i recall create_mem_ref produced
> an ugly result with both base and index containing zero values.
> >   
> > > +
> > > +      base = unshare_expr (base);
> > >     
> > 
> > You may want to postpone unsharing until you know you can make a
> > useful transformation, otherwise you may create unnecessary garbage.
> > 
> >   
> Agreed.
> >   
> > > +
> > > +      /*
> > > +      if (TREE_CODE (base) == TARGET_MEM_REF)
> > > +       {
> > > +         tree type = build_pointer_type (TREE_TYPE (base));
> > > +         base = tree_mem_ref_addr (type, base);
> > > +       }
> > > +      */
> > > +      /* If the addressing mode is already exposed, let it be.
> > > +         #### I am not certain this doesn't miss out on some
> > > +         opportunities, so just in case I'm leaving the above
> > > +         code commented out in place for now.  But I think all this
> > > +         does is make things more explicit that CSE can catch
> > > +         anyhow, and in some cases it produces expressions that
> > > +         the may-aliasing infrastructure can't handle. */
> > > +      if (tree_contains_mem_ref_p (base))
> > > +       return false;
> > >     
> > 
> > I think you should not touch already existing TARGET_MEM_REFs, because
> > we have to assume some other pass (IVopts) already tried hard to find
> > a good addressing mode. If that TARGET_MEM_REF is not the best choice,
> > then this should be fixed elsewhere.
> > 
> > Even if there is a MEM_REF, I think you want to expose TARGET_MEM_REF.
> > If you are going to expose memory addressing modes before expand for
> > all cases, you may as well do so for all memory references so that
> > expand only has to deal with TARGET_MEM_REFs.
> > 
> >   
> tree-ssa-address.c: create_mem_ref_raw() contains logic that will produce a
> MEM_REF in preference to a TARGET_MEM_REF, and this kicks in for most simple
> address forms, so there's not as much distinction between the two as one might
> like.  I.e., if we see a MEM_REF, it's very possible somebody already tried to
> create a TARGET_MEM_REF but got a MEM_REF instead.  In any case, per Richi's
> comments, create_mem_ref should be able to tolerate processing existing
> MEM_REFs, so I need to investigate further.  I plan to revert to processing
> MEM_REFs and get advice on how to deal with the results.

TARGET_MEM_REF is indeed just a more "beefier" form of MEM_REF as it
can handle non-constant offsets.  Operands zero and one are "common"
though, so a TARGET_MEM_REF with all operands but zero and one NULL_TREE
is equivalent to a MEM_REF.

Richard.

Patch

Index: gcc/tree.h
===================================================================
--- gcc/tree.h  (revision 169834)
+++ gcc/tree.h  (working copy)
@@ -5580,6 +5580,9 @@ 
 extern tree tree_mem_ref_addr (tree, tree);
 extern void copy_mem_ref_info (tree, tree);
 
+/* In tree-ssa-loop-ivopts.c */
+extern void copy_ref_info (tree, tree);
+
 /* In tree-vrp.c */
 extern bool ssa_name_nonnegative_p (const_tree);
 
Index: gcc/tree-pass.h
===================================================================
--- gcc/tree-pass.h     (revision 169834)
+++ gcc/tree-pass.h     (working copy)
@@ -385,6 +385,7 @@ 
 extern struct gimple_opt_pass pass_parallelize_loops;
 extern struct gimple_opt_pass pass_loop_prefetch;
 extern struct gimple_opt_pass pass_iv_optimize;
+extern struct gimple_opt_pass pass_tree_addr_mode;
 extern struct gimple_opt_pass pass_tree_loop_done;
 extern struct gimple_opt_pass pass_ch;
 extern struct gimple_opt_pass pass_ccp;
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c  (revision 169834)
+++ gcc/tree-ssa-loop-ivopts.c  (working copy)
@@ -1604,7 +1604,7 @@ 
 
 /* Returns true if memory reference REF with step STEP may be unaligned.  */
 
-static bool
+bool
 may_be_unaligned_p (tree ref, tree step)
 {
   tree base;
@@ -5916,7 +5916,7 @@ 
 
 /* Copies the reference information from OLD_REF to NEW_REF.  */
 
-static void
+void
 copy_ref_info (tree new_ref, tree old_ref)
 {
   tree new_ptr_base = NULL_TREE;
Index: gcc/timevar.def
===================================================================
--- gcc/timevar.def     (revision 169834)
+++ gcc/timevar.def     (working copy)
@@ -236,6 +236,7 @@ 
 DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
 DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
 DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
+DEFTIMEVAR (TV_TREE_ADDR_MODE        , "expose addressing modes")
 
 /* Everything else in rest_of_compilation not included above.  */
 DEFTIMEVAR (TV_EARLY_LOCAL          , "early local passes")
Index: gcc/tree-ssa-address.c
===================================================================
--- gcc/tree-ssa-address.c      (revision 169834)
+++ gcc/tree-ssa-address.c      (working copy)
@@ -631,7 +631,7 @@ 
      is <= 2 -- in that case, no loop invariant code motion can be
      exposed.  */
 
-  if (!base_hint && (addr->n > 2))
+  if (!base_hint && (addr->n > 2) && iv_cand)
     move_variant_to_index (parts, addr, iv_cand);
 
   /* First move the most expensive feasible multiplication
Index: gcc/tree-ssa-addr-mode.c
===================================================================
--- gcc/tree-ssa-addr-mode.c    (revision 0)
+++ gcc/tree-ssa-addr-mode.c    (revision 0)
@@ -0,0 +1,251 @@ 
+/* Expose target addressing modes in gimple representation.
+   Copyright (C) 2011
+   Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* This pass performs a scan over the gimple representation to expose
+   target machine addressing in reference-class expressions.  This is
+   necessary for some targets for which the current address
+   canonicalization is suboptimal.  Induction variable optimizations
+   already expose target addressing for many (but not all) expressions
+   in loops, so this has most benefit in non-loop code.  Algorithms
+   used here are simplifications of those used in the induction
+   variable optimizations. */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tree.h"
+#include "basic-block.h"
+#include "tree-dump.h"
+#include "tree-pass.h"
+#include "timevar.h"
+#include "cfgloop.h"
+#include "gimple.h"
+#include "tree-flow.h"
+#include "tree-affine.h"
+
+extern void copy_ref_info (tree, tree);
+
+static bool
+tree_contains_mem_ref_p (tree base)
+{
+  unsigned int i;
+  enum tree_code code;
+
+  if (!base)
+    return false;
+
+  code = TREE_CODE (base);
+  if (code == TARGET_MEM_REF || code == MEM_REF)
+    return true;
+
+  for (i = 0; i < TREE_CODE_LENGTH (code); i++)
+    if (tree_contains_mem_ref_p (TREE_OPERAND (base, i)))
+      return true;
+
+  return false;
+}
+
+/* Expose target addressing modes in EXPR.  Logic extracted and
+   simplified from tree-ssa-loop-ivopts.c to handle non-ivar
+   opportunities. */
+
+static bool
+tree_ssa_addr_mode_tree (tree *expr, gimple_stmt_iterator *gsi, bool speed)
+{
+  tree base = *expr, mem_ref;
+  struct pointer_map_t *cache;
+  aff_tree aff;
+
+  if (REFERENCE_CLASS_P (*expr))
+    {
+      /* If the expression is a virtual method table reference, don't attempt
+        this.  Incorrect code generation occurs; an example test case is
+        testsuite/g++.dg/opt/pr47366.C. */
+      if (TREE_CODE (base) == COMPONENT_REF
+         && DECL_VIRTUAL_P (TREE_OPERAND (base, 1)))
+       return false;
+
+      base = unshare_expr (base);
+
+      /*
+      if (TREE_CODE (base) == TARGET_MEM_REF)
+       {
+         tree type = build_pointer_type (TREE_TYPE (base));
+         base = tree_mem_ref_addr (type, base);
+       }
+      */
+      /* If the addressing mode is already exposed, let it be.
+         #### I am not certain this doesn't miss out on some 
+         opportunities, so just in case I'm leaving the above
+         code commented out in place for now.  But I think all this
+         does is make things more explicit that CSE can catch
+         anyhow, and in some cases it produces expressions that
+         the may-aliasing infrastructure can't handle. */
+      if (tree_contains_mem_ref_p (base))
+       return false;
+      else 
+       {
+         /* Check that the base expression is addressable. */
+         if (may_be_nonaddressable_p (base))
+           return false;
+
+         /* On strict alignment platforms, check that the base expression
+            is sufficiently aligned.  There is no additional "step"
+             information required, so specify single-byte alignment
+             for that parameter.  */
+         if (STRICT_ALIGNMENT && may_be_unaligned_p (base, size_one_node))
+           return false;
+
+         base = build_fold_addr_expr (base);
+       }
+
+      /* Express the address of the original expression as an affine
+         combination of trees, looking through SSA defs. */
+      cache = pointer_map_create ();
+      tree_to_aff_combination_expand (base, TREE_TYPE (base), &aff, &cache);
+      pointer_map_destroy (cache);
+
+      /* It is possible for aff to have no elements, in which case do
+        nothing.  Example: lhs = MEM[(struct foo *) 0B].bar does not
+         produce an affine combination. */
+      if (!aff.n)
+       return false;
+
+      /* Replace the input expression with an equivalent memory reference
+        legal on the target machine. */
+      mem_ref = create_mem_ref (gsi, TREE_TYPE (*expr), &aff, 
+                                reference_alias_ptr_type (*expr),
+                               NULL_TREE, NULL_TREE, speed);
+
+      /* Ensure the memory reference was not built with a base expression
+        of zero, which confuses the may-aliasing in subsequent dead
+        store elimination.  Assume this isn't a useful replacement when
+        this occurs. */
+      if (integer_zerop (TREE_OPERAND (mem_ref, 0)))
+       return false;
+
+      /* Finish the replacement. */
+      copy_ref_info (mem_ref, *expr);
+      *expr = mem_ref;
+
+      return true;
+    }
+  return false;
+}
+
+/* Expose target addressing modes in STMT, which must be an assignment. */
+
+static void
+tree_ssa_addr_mode_stmt (gimple stmt, gimple_stmt_iterator *gsi, bool speed)
+{
+  tree *lhs, *rhs1, *rhs2, *rhs3;
+  enum tree_code subcode = gimple_assign_rhs_code (stmt);
+  enum gimple_rhs_class rhs_class = get_gimple_rhs_class (subcode);
+  bool changed;
+
+  /* Bitfields are being ignored for now, as they are in induction
+     variable optimization. */
+  if (subcode == BIT_FIELD_REF)
+    return;
+
+  lhs = gimple_assign_lhs_ptr (stmt);
+  changed = tree_ssa_addr_mode_tree (lhs, gsi, speed);
+
+  rhs1 = gimple_assign_rhs1_ptr (stmt);
+  changed = tree_ssa_addr_mode_tree (rhs1, gsi, speed) || changed;
+
+  if (rhs_class == GIMPLE_BINARY_RHS || rhs_class == GIMPLE_TERNARY_RHS)
+    {
+      rhs2 = gimple_assign_rhs2_ptr (stmt);
+      changed = tree_ssa_addr_mode_tree (rhs2, gsi, speed) || changed;
+    }
+
+  if (rhs_class == GIMPLE_TERNARY_RHS)
+    {
+      rhs3 = gimple_assign_rhs3_ptr (stmt);
+      changed = tree_ssa_addr_mode_tree (rhs3, gsi, speed) || changed;
+    }
+
+  if (changed)
+    update_stmt (stmt);
+}
+
+/* Process the statements in block BB. */
+
+static void
+tree_ssa_addr_mode_block (basic_block bb)
+{
+  gimple_stmt_iterator gsi;
+
+  bool speed = optimize_bb_for_speed_p (bb);
+
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple stmt = gsi_stmt (gsi);
+
+      /* #### Avoiding volatile memory refs like tree-ssa-loop-ivopts.c
+         does, but need to revisit whether that's necessary. */
+
+      if (is_gimple_assign (stmt) && !gimple_has_volatile_ops (stmt))
+       {
+         tree_ssa_addr_mode_stmt (stmt, &gsi, speed);
+       }
+    }
+}
+
+/* Main entry point for exposing target addressing modes not caught
+   by tree-ssa-loop-ivopts.c. */
+
+static unsigned int
+tree_ssa_addr_mode (void)
+{
+  basic_block bb;
+  bool chicken_switch = false;
+
+  /* #### Temporary debug aid; set to true to disable this phase in debugger. */
+  if (chicken_switch)
+    return 0;
+
+  FOR_EACH_BB (bb)
+    tree_ssa_addr_mode_block (bb);
+
+  return 0;
+}
+
+struct gimple_opt_pass pass_tree_addr_mode =
+{
+ {
+  GIMPLE_PASS,
+  "addrmode",                          /* name */
+  NULL,                                        /* gate */
+  tree_ssa_addr_mode,                  /* execute */
+  NULL,                                        /* sub */
+  NULL,                                        /* next */
+  0,                                   /* static_pass_number */
+  TV_TREE_ADDR_MODE,                   /* tv_id */
+  PROP_cfg,                            /* properties_required */
+  0,                                   /* properties_provided */
+  0,                                   /* properties_destroyed */
+  0,                                   /* todo_flags_start */
+  /* #### Probably can remove TODO_rebuild_alias. */
+  TODO_dump_func | TODO_update_ssa |
+    TODO_rebuild_alias                         /* todo_flags_finish */
+ }
+};
Index: gcc/tree-flow.h
===================================================================
--- gcc/tree-flow.h     (revision 169834)
+++ gcc/tree-flow.h     (working copy)
@@ -808,6 +808,7 @@ 
                                      addr_space_t);
 unsigned multiply_by_cost (HOST_WIDE_INT, enum machine_mode, bool);
 bool may_be_nonaddressable_p (tree expr);
+bool may_be_unaligned_p (tree ref, tree step);
 
 /* In tree-ssa-threadupdate.c.  */
 extern bool thread_through_all_blocks (bool);
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in     (revision 169834)
+++ gcc/Makefile.in     (working copy)
@@ -1389,6 +1389,7 @@ 
        tree-sra.o \
        tree-switch-conversion.o \
        tree-ssa-address.o \
+       tree-ssa-addr-mode.o \
        tree-ssa-alias.o \
        tree-ssa-ccp.o \
        tree-ssa-coalesce.o \
@@ -2556,6 +2557,9 @@ 
    $(TREE_PASS_H) $(FLAGS_H) $(TREE_INLINE_H) $(RECOG_H) insn-config.h \
    $(EXPR_H) gt-tree-ssa-address.h $(GGC_H) tree-affine.h $(TARGET_H) \
    tree-pretty-print.h
+tree-ssa-addr-mode.o : tree-ssa-addr-mode.c $(CONFIG_H) $(SYSTEM_H) \
+   coretypes.h $(TREE_H) $(BASIC_BLOCK_H) $(TREE_DUMP_H) $(TREE_PASS_H) \
+   $(TIMEVAR_H) $(CFGLOOP_H) $(GIMPLE_H) $(TREE_FLOW_H) $(TREE_AFFINE_H)
 tree-ssa-loop-niter.o : tree-ssa-loop-niter.c $(TREE_FLOW_H) $(CONFIG_H) \
    $(SYSTEM_H) $(TREE_H) $(TM_P_H) $(CFGLOOP_H) $(PARAMS_H) \
    $(TREE_INLINE_H) output.h $(DIAGNOSTIC_H) $(TM_H) coretypes.h $(TREE_DUMP_H) \
Index: gcc/passes.c
===================================================================
--- gcc/passes.c        (revision 169834)
+++ gcc/passes.c        (working copy)
@@ -944,6 +944,10 @@ 
       NEXT_PASS (pass_rename_ssa_copies);
       NEXT_PASS (pass_uncprop);
       NEXT_PASS (pass_local_pure_const);
+      /* Note: pass_tree_addr_mode should be as close to pass_expand as
+        possible.  Particularly must be after DCE since DCE has
+         trouble seeing through some of the generated mem_refs. */
+      NEXT_PASS (pass_tree_addr_mode);
     }
   NEXT_PASS (pass_lower_complex_O0);
   NEXT_PASS (pass_cleanup_eh);