RFC: Use Levenshtein spelling suggestions in Fortran FE
diff mbox

Message ID 1448974501-30981-4-git-send-email-rep.dot.nop@gmail.com
State New
Headers show

Commit Message

Bernhard Reutner-Fischer Dec. 1, 2015, 12:55 p.m. UTC
gcc/fortran/ChangeLog

2015-11-29  Bernhard Reutner-Fischer  <aldot@gcc.gnu.org>

	* gfortran.h (gfc_lookup_function_fuzzy): New declaration.
	* resolve.c: Include spellcheck.h.
	(lookup_function_fuzzy_find_candidates): New static function.
	(lookup_uop_fuzzy_find_candidates): Likewise.
	(lookup_uop_fuzzy): Likewise.
	(resolve_operator) <INTRINSIC_USER>: Call lookup_uop_fuzzy.
	(gfc_lookup_function_fuzzy): New definition.
	(resolve_unknown_f): Call gfc_lookup_function_fuzzy.
	* interface.c (check_interface0): Likewise.
	* symbol.c: Include spellcheck.h.
	(lookup_symbol_fuzzy_find_candidates): New static function.
	(lookup_symbol_fuzzy): Likewise.
	(gfc_set_default_type): Call lookup_symbol_fuzzy.
	(lookup_component_fuzzy_find_candidates): New static function.
	(lookup_component_fuzzy): Likewise.
	(gfc_find_component): Call lookup_component_fuzzy.

gcc/testsuite/ChangeLog

2015-11-29  Bernhard Reutner-Fischer  <aldot@gcc.gnu.org>

	* gfortran.dg/spellcheck-operator.f90: New testcase.
	* gfortran.dg/spellcheck-procedure.f90: New testcase.
	* gfortran.dg/spellcheck-structure.f90: New testcase.

---

David Malcolm nice Levenshtein distance spelling check helpers
were used in some parts of other frontends. This proposed patch adds
some spelling corrections to the fortran frontend.

Suggestions are printed if we can find a suitable name, currently
perusing a very simple cutoff factor:
/* If more than half of the letters were misspelled, the suggestion is
   likely to be meaningless.  */
cutoff = MAX (strlen (typo), strlen (best_guess)) / 2;
which effectively skips names with less than 4 characters.
For e.g. structures, one could try to be much smarter in an attempt to
also provide suggestions for single-letter members/components.

This patch covers (at least partly):
- user-defined operators
- structures (types and their components)
- functions
- symbols (variables)

I do not immediately see how to handle subroutines. Ideas?

If anybody has a testcase where a spelling-suggestion would make sense
then please pass it along so we maybe can add support for GCC-7.

Signed-off-by: Bernhard Reutner-Fischer <rep.dot.nop@gmail.com>
---
 gcc/fortran/gfortran.h                             |   1 +
 gcc/fortran/interface.c                            |  16 ++-
 gcc/fortran/resolve.c                              | 135 ++++++++++++++++++++-
 gcc/fortran/symbol.c                               | 129 +++++++++++++++++++-
 gcc/testsuite/gfortran.dg/spellcheck-operator.f90  |  30 +++++
 gcc/testsuite/gfortran.dg/spellcheck-procedure.f90 |  41 +++++++
 gcc/testsuite/gfortran.dg/spellcheck-structure.f90 |  35 ++++++
 7 files changed, 376 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/spellcheck-operator.f90
 create mode 100644 gcc/testsuite/gfortran.dg/spellcheck-procedure.f90
 create mode 100644 gcc/testsuite/gfortran.dg/spellcheck-structure.f90

Comments

Steve Kargl Dec. 1, 2015, 3:01 p.m. UTC | #1
On Tue, Dec 01, 2015 at 01:55:01PM +0100, Bernhard Reutner-Fischer wrote:
> 
> David Malcolm nice Levenshtein distance spelling check helpers
> were used in some parts of other frontends. This proposed patch adds
> some spelling corrections to the fortran frontend.
> 
> Suggestions are printed if we can find a suitable name, currently
> perusing a very simple cutoff factor:
> /* If more than half of the letters were misspelled, the suggestion is
>    likely to be meaningless.  */
> cutoff = MAX (strlen (typo), strlen (best_guess)) / 2;
> which effectively skips names with less than 4 characters.
> For e.g. structures, one could try to be much smarter in an attempt to
> also provide suggestions for single-letter members/components.
> 
> This patch covers (at least partly):
> - user-defined operators
> - structures (types and their components)
> - functions
> - symbols (variables)
> 
> I do not immediately see how to handle subroutines. Ideas?
> 
> If anybody has a testcase where a spelling-suggestion would make sense
> then please pass it along so we maybe can add support for GCC-7.
> 

What problem are you trying to solve here?  The patch looks like
unneeded complexity with the result of injecting C++ idioms into
the Fortran FE.
Bernhard Reutner-Fischer Dec. 1, 2015, 4:12 p.m. UTC | #2
On 1 December 2015 at 16:01, Steve Kargl
<sgk@troutmask.apl.washington.edu> wrote:
> On Tue, Dec 01, 2015 at 01:55:01PM +0100, Bernhard Reutner-Fischer wrote:
>>
>> David Malcolm nice Levenshtein distance spelling check helpers
>> were used in some parts of other frontends. This proposed patch adds
>> some spelling corrections to the fortran frontend.

> What problem are you trying to solve here?  The patch looks like

The idea is to improve the programmer experience when writing code.
See the testcases enclosed in the patch. I consider this a feature :)

> unneeded complexity with the result of injecting C++ idioms into
> the Fortran FE.

What C++ idioms are you referring to? The autovec?
AFAIU the light use of C++ in GCC is deemed OK. I see usage of
std::swap and std::map in the FE, not to mention the wide-int uses
(wi::). Thus we don't have to realloc/strcat but can use vectors to
the same effect, just as other frontends, including the C frontend,
do.
I take it you remember that we had to change all "try" to something
C++ friendly. If the Fortran FE meant to opt-out of being compiled
with a C++ compiler in the first place, why were all the C++ clashes
rewritten, back then? :)

thanks,
Steve Kargl Dec. 1, 2015, 4:41 p.m. UTC | #3
On Tue, Dec 01, 2015 at 05:12:57PM +0100, Bernhard Reutner-Fischer wrote:
> On 1 December 2015 at 16:01, Steve Kargl
> <sgk@troutmask.apl.washington.edu> wrote:
> > On Tue, Dec 01, 2015 at 01:55:01PM +0100, Bernhard Reutner-Fischer wrote:
> >>
> >> David Malcolm nice Levenshtein distance spelling check helpers
> >> were used in some parts of other frontends. This proposed patch adds
> >> some spelling corrections to the fortran frontend.
> 
> > What problem are you trying to solve here?  The patch looks like
> 
> The idea is to improve the programmer experience when writing code.
> See the testcases enclosed in the patch. I consider this a feature :)

Opinions differ.  I consider it unnecessary bloat.

> > unneeded complexity with the result of injecting C++ idioms into
> > the Fortran FE.
> 
> What C++ idioms are you referring to? The autovec?
> AFAIU the light use of C++ in GCC is deemed OK. I see usage of
> std::swap and std::map in the FE, not to mention the wide-int uses
> (wi::). Thus we don't have to realloc/strcat but can use vectors to
> the same effect, just as other frontends, including the C frontend,
> do.
> I take it you remember that we had to change all "try" to something
> C++ friendly. If the Fortran FE meant to opt-out of being compiled
> with a C++ compiler in the first place, why were all the C++ clashes
> rewritten, back then? :)

Yes, I know there are other C++ (mis)features within the
Fortran FE especially in the trans-*.c files.  Those are
accepted (by some) as necessary evils to interface with 
the ME.  Your patch injects C++ into otherwise perfectly
fine C code, which makes it more difficult for those with
no or very limited C++ knowledge to maintain the gfortran.

There are currently 806 open bug reports for gfortran.
AFAIK, your patch does not address any of those bug reports.
The continued push to inject C++ into the Fortran FE will
have the (un)intentional consequence of forcing at least one
active gfortran contributor to stop.

--  
Steve
David Malcolm Dec. 1, 2015, 5:28 p.m. UTC | #4
On Tue, 2015-12-01 at 13:55 +0100, Bernhard Reutner-Fischer wrote:
> gcc/fortran/ChangeLog
> 
> 2015-11-29  Bernhard Reutner-Fischer  <aldot@gcc.gnu.org>
> 
> 	* gfortran.h (gfc_lookup_function_fuzzy): New declaration.
> 	* resolve.c: Include spellcheck.h.
> 	(lookup_function_fuzzy_find_candidates): New static function.
> 	(lookup_uop_fuzzy_find_candidates): Likewise.
> 	(lookup_uop_fuzzy): Likewise.
> 	(resolve_operator) <INTRINSIC_USER>: Call lookup_uop_fuzzy.
> 	(gfc_lookup_function_fuzzy): New definition.
> 	(resolve_unknown_f): Call gfc_lookup_function_fuzzy.
> 	* interface.c (check_interface0): Likewise.
> 	* symbol.c: Include spellcheck.h.
> 	(lookup_symbol_fuzzy_find_candidates): New static function.
> 	(lookup_symbol_fuzzy): Likewise.
> 	(gfc_set_default_type): Call lookup_symbol_fuzzy.
> 	(lookup_component_fuzzy_find_candidates): New static function.
> 	(lookup_component_fuzzy): Likewise.
> 	(gfc_find_component): Call lookup_component_fuzzy.
> 
> gcc/testsuite/ChangeLog
> 
> 2015-11-29  Bernhard Reutner-Fischer  <aldot@gcc.gnu.org>
> 
> 	* gfortran.dg/spellcheck-operator.f90: New testcase.
> 	* gfortran.dg/spellcheck-procedure.f90: New testcase.
> 	* gfortran.dg/spellcheck-structure.f90: New testcase.
> 
> ---
> 
> David Malcolm nice Levenshtein distance spelling check helpers
> were used in some parts of other frontends. This proposed patch adds
> some spelling corrections to the fortran frontend.
> 
> Suggestions are printed if we can find a suitable name, currently
> perusing a very simple cutoff factor:
> /* If more than half of the letters were misspelled, the suggestion is
>    likely to be meaningless.  */
> cutoff = MAX (strlen (typo), strlen (best_guess)) / 2;
> which effectively skips names with less than 4 characters.
> For e.g. structures, one could try to be much smarter in an attempt to
> also provide suggestions for single-letter members/components.
> 
> This patch covers (at least partly):
> - user-defined operators
> - structures (types and their components)
> - functions
> - symbols (variables)
> 
> I do not immediately see how to handle subroutines. Ideas?
> 
> If anybody has a testcase where a spelling-suggestion would make sense
> then please pass it along so we maybe can add support for GCC-7.
> 
> Signed-off-by: Bernhard Reutner-Fischer <rep.dot.nop@gmail.com>
> ---
>  gcc/fortran/gfortran.h                             |   1 +
>  gcc/fortran/interface.c                            |  16 ++-
>  gcc/fortran/resolve.c                              | 135 ++++++++++++++++++++-
>  gcc/fortran/symbol.c                               | 129 +++++++++++++++++++-
>  gcc/testsuite/gfortran.dg/spellcheck-operator.f90  |  30 +++++
>  gcc/testsuite/gfortran.dg/spellcheck-procedure.f90 |  41 +++++++
>  gcc/testsuite/gfortran.dg/spellcheck-structure.f90 |  35 ++++++
>  7 files changed, 376 insertions(+), 11 deletions(-)
>  create mode 100644 gcc/testsuite/gfortran.dg/spellcheck-operator.f90
>  create mode 100644 gcc/testsuite/gfortran.dg/spellcheck-procedure.f90
>  create mode 100644 gcc/testsuite/gfortran.dg/spellcheck-structure.f90
> 
> diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
> index 5487c93..cbfd592 100644
> --- a/gcc/fortran/gfortran.h
> +++ b/gcc/fortran/gfortran.h
> @@ -3060,6 +3060,7 @@ bool gfc_type_is_extensible (gfc_symbol *);
>  bool gfc_resolve_intrinsic (gfc_symbol *, locus *);
>  bool gfc_explicit_interface_required (gfc_symbol *, char *, int);
>  extern int gfc_do_concurrent_flag;
> +const char* gfc_lookup_function_fuzzy (const char *, gfc_symtree *);
>  
> 
>  /* array.c */
> diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
> index 30cc522..19f800f 100644
> --- a/gcc/fortran/interface.c
> +++ b/gcc/fortran/interface.c
> @@ -1590,10 +1590,18 @@ check_interface0 (gfc_interface *p, const char *interface_name)
>  	  if (p->sym->attr.external)
>  	    gfc_error ("Procedure %qs in %s at %L has no explicit interface",
>  		       p->sym->name, interface_name, &p->sym->declared_at);
> -	  else
> -	    gfc_error ("Procedure %qs in %s at %L is neither function nor "
> -		       "subroutine", p->sym->name, interface_name,
> -		      &p->sym->declared_at);
> +	  else {
> +	    const char *guessed
> +	      = gfc_lookup_function_fuzzy (p->sym->name, p->sym->ns->sym_root);
> +	    if (guessed)
> +	      gfc_error ("Procedure %qs in %s at %L is neither function nor "
> +			 "subroutine; did you mean %qs?", p->sym->name,
> +			interface_name, &p->sym->declared_at, guessed);
> +	    else
> +	      gfc_error ("Procedure %qs in %s at %L is neither function nor "
> +			 "subroutine", p->sym->name, interface_name,
> +			&p->sym->declared_at);
> +	  }
>  	  return 1;
>  	}
>  
> diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
> index 685e3f5..6e1f63c 100644
> --- a/gcc/fortran/resolve.c
> +++ b/gcc/fortran/resolve.c
> @@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "data.h"
>  #include "target-memory.h" /* for gfc_simplify_transfer */
>  #include "constructor.h"
> +#include "spellcheck.h"
>  
>  /* Types used in equivalence statements.  */
>  
> @@ -2682,6 +2683,61 @@ resolve_specific_f (gfc_expr *expr)
>    return true;
>  }
>  
> +/* Recursively append candidate SYM to CANDIDATES.  */
> +
> +static void
> +lookup_function_fuzzy_find_candidates (gfc_symtree *sym,
> +                                       vec<const char *> *candidates)
> +{
> +  gfc_symtree *p;
> +  for (p = sym->right; p; p = p->right)
> +    {
> +      lookup_function_fuzzy_find_candidates (p, candidates);
> +      if (p->n.sym->ts.type != BT_UNKNOWN)
> +	candidates->safe_push (p->name);
> +    }
> +  for (p = sym->left; p; p = p->left)
> +    {
> +      lookup_function_fuzzy_find_candidates (p, candidates);
> +      if (p->n.sym->ts.type != BT_UNKNOWN)
> +	candidates->safe_push (p->name);
> +    }
> +}
> +
> +
> +/* Lookup function FN fuzzily, taking names in FUN into account.  */
> +
> +const char*
> +gfc_lookup_function_fuzzy (const char *fn, gfc_symtree *fun)
> +{
> +  auto_vec <const char *> candidates;
> +  lookup_function_fuzzy_find_candidates (fun, &candidates);
> +
> +  /* Determine closest match.  */
> +  int i;
> +  const char *name, *best = NULL;
> +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
> +
> +  FOR_EACH_VEC_ELT (candidates, i, name)
> +    {
> +      edit_distance_t dist = levenshtein_distance (fn, name);
> +      if (dist < best_distance)
> +	{
> +	  best_distance = dist;
> +	  best = name;
> +	}
> +    }
> +  /* If more than half of the letters were misspelled, the suggestion is
> +     likely to be meaningless.  */
> +  if (best)
> +    {
> +      unsigned int cutoff = MAX (strlen (fn), strlen (best)) / 2;
> +      if (best_distance > cutoff)
> +	return NULL;
> +    }
> +  return best;
> +}


Caveat: I'm not very familiar with the Fortran FE, so take the following
with a pinch of salt.

If I'm reading things right, here, and in various other places, you're
building a vec of const char *, and then seeing which one of those
candidates is the best match for another const char *.

You could simplify things by adding a helper function to spellcheck.h,
akin to this one:

extern tree
find_closest_identifier (tree target, const auto_vec<tree> *candidates);

This would reduce the amount of duplication in the patch (and slightly
reduce the amount of C++).

[are there IDENTIFIER nodes in the Fortran FE, or is it all const char
*? this would avoid some strlen calls]
 
>  /* Resolve a procedure call not known to be generic nor specific.  */
>  
> @@ -2732,8 +2788,15 @@ set_type:
>  
>        if (ts->type == BT_UNKNOWN)
>  	{
> -	  gfc_error ("Function %qs at %L has no IMPLICIT type",
> -		     sym->name, &expr->where);
> +	  const char *guessed
> +	    = gfc_lookup_function_fuzzy (sym->name, sym->ns->sym_root);
> +	  if (guessed)
> +	    gfc_error ("Function %qs at %L has no IMPLICIT type"
> +		       "; did you mean %qs?",
> +		       sym->name, &expr->where, guessed);
> +	  else
> +	    gfc_error ("Function %qs at %L has no IMPLICIT type",
> +		       sym->name, &expr->where);
>  	  return false;
>  	}
>        else
> @@ -3504,6 +3567,63 @@ compare_shapes (gfc_expr *op1, gfc_expr *op2)
>    return t;
>  }
>  
> +/* Recursively append candidate UOP to CANDIDATES.  */
> +
> +static void
> +lookup_uop_fuzzy_find_candidates (gfc_symtree *uop,
> +				  vec<const char *> *candidates)
> +{
> +  gfc_symtree *p;
> +  /* Not sure how to properly filter here.  Use all for a start.
> +     n.uop.op is NULL for empty interface operators (is that legal?) disregard
> +     these as i suppose they don't make terribly sense.  */
> +  for (p = uop->right; p; p = p->right)
> +    {
> +      lookup_function_fuzzy_find_candidates (p, candidates);
> +      if (p->n.uop->op != NULL)
> +	candidates->safe_push (p->name);
> +    }
> +  for (p = uop->left; p; p = p->left)
> +    {
> +      lookup_function_fuzzy_find_candidates (p, candidates);
> +      if (p->n.uop->op != NULL)
> +	candidates->safe_push (p->name);
> +    }
> +}
> +
> +/* Lookup user-operator OP fuzzily, taking names in UOP into account.  */
> +
> +static const char*
> +lookup_uop_fuzzy (const char *op, gfc_symtree *uop)
> +{
> +  auto_vec <const char *> candidates;
> +  lookup_uop_fuzzy_find_candidates (uop, &candidates);
> +
> +  /* Determine closest match.  */
> +  int i;
> +  const char *name, *best = NULL;
> +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
> +
> +  FOR_EACH_VEC_ELT (candidates, i, name)
> +    {
> +      edit_distance_t dist = levenshtein_distance (op, name);
> +      if (dist < best_distance)
> +	{
> +	  best_distance = dist;
> +	  best = name;
> +	}
> +    }
> +  /* If more than half of the letters were misspelled, the suggestion is
> +     likely to be meaningless.  */
> +  if (best)
> +    {
> +      unsigned int cutoff = MAX (strlen (op), strlen (best)) / 2;
> +      if (best_distance > cutoff)
> +	return NULL;
> +    }
> +  return best;
> +}

Here again, I think.


>  /* Resolve an operator expression node.  This can involve replacing the
>     operation with a user defined function call.  */
> @@ -3703,7 +3823,16 @@ resolve_operator (gfc_expr *e)
>  
>      case INTRINSIC_USER:
>        if (e->value.op.uop->op == NULL)
> -	sprintf (msg, _("Unknown operator '%s' at %%L"), e->value.op.uop->name);
> +	{
> +	  const char *name = e->value.op.uop->name;
> +	  const char *guessed;
> +	  guessed = lookup_uop_fuzzy (name, e->value.op.uop->ns->uop_root);
> +	  if (guessed)
> +	    sprintf (msg, _("Unknown operator '%s' at %%L; did you mean '%s'?"),
> +		name, guessed);
> +	  else
> +	    sprintf (msg, _("Unknown operator '%s' at %%L"), name);
> +	}
>        else if (op2 == NULL)
>  	sprintf (msg, _("Operand of user operator '%s' at %%L is %s"),
>  		 e->value.op.uop->name, gfc_typename (&op1->ts));
> diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
> index ff9aff9..212f7d8 100644
> --- a/gcc/fortran/symbol.c
> +++ b/gcc/fortran/symbol.c
> @@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "parse.h"
>  #include "match.h"
>  #include "constructor.h"
> +#include "spellcheck.h"
>  
> 
>  /* Strings for all symbol attributes.  We use these for dumping the
> @@ -235,6 +236,62 @@ gfc_get_default_type (const char *name, gfc_namespace *ns)
>  }
>  
> 
> +/* Recursively append candidate SYM to CANDIDATES.  */
> +
> +static void
> +lookup_symbol_fuzzy_find_candidates (gfc_symtree *sym,
> +				        vec<const char *> *candidates)
> +{
> +  gfc_symtree *p;
> +  for (p = sym->right; p; p = p->right)
> +    {
> +      lookup_symbol_fuzzy_find_candidates (p, candidates);
> +      if (p->n.sym->ts.type != BT_UNKNOWN)
> +	candidates->safe_push (p->name);
> +    }
> +  for (p = sym->left; p; p = p->left)
> +    {
> +      lookup_symbol_fuzzy_find_candidates (p, candidates);
> +      if (p->n.sym->ts.type != BT_UNKNOWN)
> +	candidates->safe_push (p->name);
> +    }
> +}
> +
> +
> +/* Lookup symbol SYM fuzzily, taking names in SYMBOL into account.  */
> +
> +static const char*
> +lookup_symbol_fuzzy (const char *sym, gfc_symbol *symbol)
> +{
> +  auto_vec <const char *> candidates;
> +  lookup_symbol_fuzzy_find_candidates (symbol->ns->sym_root, &candidates);
> +
> +  /* Determine closest match.  */
> +  int i;
> +  const char *name, *best = NULL;
> +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
> +
> +  FOR_EACH_VEC_ELT (candidates, i, name)
> +    {
> +      edit_distance_t dist = levenshtein_distance (sym, name);
> +      if (dist < best_distance)
> +	{
> +	  best_distance = dist;
> +	  best = name;
> +	}
> +    }
> +  /* If more than half of the letters were misspelled, the suggestion is
> +     likely to be meaningless.  */
> +  if (best)
> +    {
> +      unsigned int cutoff = MAX (strlen (sym), strlen (best)) / 2;
> +      if (best_distance > cutoff)
> +	return NULL;
> +    }
> +  return best;
> +}

Here again, I think.

> +
>  /* Given a pointer to a symbol, set its type according to the first
>     letter of its name.  Fails if the letter in question has no default
>     type.  */
> @@ -253,8 +310,15 @@ gfc_set_default_type (gfc_symbol *sym, int error_flag, gfc_namespace *ns)
>      {
>        if (error_flag && !sym->attr.untyped)
>  	{
> -	  gfc_error ("Symbol %qs at %L has no IMPLICIT type",
> -		     sym->name, &sym->declared_at);
> +	  const char *guessed
> +	    = lookup_symbol_fuzzy (sym->name, sym);
> +	  if (guessed)
> +	    gfc_error ("Symbol %qs at %L has no IMPLICIT type"
> +		       "; did you mean %qs?",
> +		       sym->name, &sym->declared_at, guessed);
> +	  else
> +	    gfc_error ("Symbol %qs at %L has no IMPLICIT type",
> +		       sym->name, &sym->declared_at);
>  	  sym->attr.untyped = 1; /* Ensure we only give an error once.  */
>  	}
>  
> @@ -2188,6 +2252,55 @@ bad:
>  }
>  
> 
> +/* Recursively append candidate COMPONENT structures to CANDIDATES.  */
> +
> +static void
> +lookup_component_fuzzy_find_candidates (gfc_component *component,
> +				        vec<const char *> *candidates)
> +{
> +  for (gfc_component *p = component; p; p = p->next)
> +    {
> +      if (00 && p->ts.type == BT_DERIVED)
> +	/* ??? There's no (suitable) DERIVED_TYPE which would come in
> +	   handy throughout the frontend; Use CLASS_DATA here for brevity.  */
> +	lookup_component_fuzzy_find_candidates (CLASS_DATA (p), candidates);
> +      candidates->safe_push (p->name);
> +    }
> +}
> +
> +/* Lookup component MEMBER fuzzily, taking names in COMPONENT into account.  */
> +
> +static const char*
> +lookup_component_fuzzy (const char *member, gfc_component *component)
> +{
> +  auto_vec <const char *> candidates;
> +  lookup_component_fuzzy_find_candidates (component, &candidates);
> +
> +  /* Determine closest match.  */
> +  int i;
> +  const char *name, *best = NULL;
> +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
> +
> +  FOR_EACH_VEC_ELT (candidates, i, name)
> +    {
> +      edit_distance_t dist = levenshtein_distance (member, name);
> +      if (dist < best_distance)
> +	{
> +	  best_distance = dist;
> +	  best = name;
> +	}
> +    }
> +  /* If more than half of the letters were misspelled, the suggestion is
> +     likely to be meaningless.  */
> +  if (best)
> +    {
> +      unsigned int cutoff = MAX (strlen (member), strlen (best)) / 2;
> +      if (best_distance > cutoff)
> +	return NULL;
> +    }
> +  return best;
> +}

...and here again.

>  /* Given a derived type node and a component name, try to locate the
>     component structure.  Returns the NULL pointer if the component is
>     not found or the components are private.  If noaccess is set, no access
> @@ -2238,8 +2351,16 @@ gfc_find_component (gfc_symbol *sym, const char *name,
>      }
>  
>    if (p == NULL && !silent)
> -    gfc_error ("%qs at %C is not a member of the %qs structure",
> -	       name, sym->name);
> +    {
> +      const char *guessed = lookup_component_fuzzy (name, sym->components);
> +      if (guessed)
> +	gfc_error ("%qs at %C is not a member of the %qs structure"
> +		   "; did you mean %qs?",
> +		   name, sym->name, guessed);
> +      else
> +	gfc_error ("%qs at %C is not a member of the %qs structure",
> +		   name, sym->name);
> +    }
>  
>    return p;
>  }
> diff --git a/gcc/testsuite/gfortran.dg/spellcheck-operator.f90 b/gcc/testsuite/gfortran.dg/spellcheck-operator.f90
> new file mode 100644
> index 0000000..810a770
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/spellcheck-operator.f90
> @@ -0,0 +1,30 @@
> +! { dg-do compile }
> +! test levenshtein based spelling suggestions
> +
> +module mymod1
> +  implicit none
> +  contains
> +    function something_good (iarg1)
> +      integer :: something_good
> +      integer, intent(in) :: iarg1
> +      something_good = iarg1 + 42
> +    end function something_good
> +end module mymod1
> +
> +program spellchekc
> +  use mymod1
> +  implicit none
> +
> +  interface operator (.mywrong.)
> +    module procedure something_wring ! { dg-error "Procedure .something_wring. in operator interface .mywrong. at .1. is neither function nor subroutine; did you mean .something_good.\\?|User operator procedure .something_wring. at .1. must be a FUNCTION" }
> +  end interface
> +
> +  interface operator (.mygood.)
> +    module procedure something_good
> +  end interface
> +
> +  integer :: i, j, added
> +  i = 0
> +  j = 0
> +  added = .mygoof. j ! { dg-error "Unknown operator .mygoof. at .1.; did you mean .mygood.\\?" }
> +end program spellchekc
> diff --git a/gcc/testsuite/gfortran.dg/spellcheck-procedure.f90 b/gcc/testsuite/gfortran.dg/spellcheck-procedure.f90
> new file mode 100644
> index 0000000..7923081
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/spellcheck-procedure.f90
> @@ -0,0 +1,41 @@
> +! { dg-do compile }
> +! test levenshtein based spelling suggestions
> +
> +module mymod1
> +  implicit none
> +  contains
> +    function something_good (iarg1)
> +      integer :: something_good
> +      integer, intent(in) :: iarg1
> +      something_good = iarg1 + 42
> +    end function something_good
> +end module mymod1
> +
> +subroutine bark_unless_zero(iarg)
> +  implicit none
> +  integer, intent(in) :: iarg
> +  if (iarg /= 0) call abort
> +end subroutine bark_unless_zero
> +
> +function myadd(iarg1, iarg2)
> +  implicit none
> +  integer :: myadd
> +  integer, intent(in) :: iarg1, iarg2
> +  myadd = iarg1 + iarg2
> +end function myadd
> +
> +program spellchekc
> +  use mymod1
> +  implicit none
> +
> +  integer :: i, j, myadd
> +  i = 0
> +  j = 0
> +! I suppose this cannot be made to work, no\\?
> +!  call barf_unless_zero(i) ! { -dg-error "; did you mean .bark_unless_zero.\\?" }
> +  j = something_goof(j) ! { dg-error "no IMPLICIT type; did you mean .something_good.\\?" }
> +  j = myaddd(i, j) ! { dg-error "no IMPLICIT type; did you mean .myadd.\\?" }
> +  j = mya(i, j) ! { dg-error "no IMPLICIT type; did you mean .myadd.\\?" }
> +  if (j /= 42) call abort
> +
> +end program spellchekc
> diff --git a/gcc/testsuite/gfortran.dg/spellcheck-structure.f90 b/gcc/testsuite/gfortran.dg/spellcheck-structure.f90
> new file mode 100644
> index 0000000..929e05f
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/spellcheck-structure.f90
> @@ -0,0 +1,35 @@
> +! { dg-do compile }
> +! test levenshtein based spelling suggestions
> +implicit none
> +
> +!!!!!!!!!!!!!! structure tests !!!!!!!!!!!!!!
> +type type1
> +   real :: radius
> +   integer :: i
> +end type type1
> +
> +type type2
> +  integer :: myint
> +  type(type1) :: mytype
> +end type type2
> +
> +type type3
> +  type(type2) :: type_2
> +end type type3
> +type type4
> +  type(type3) :: type_3
> +end type type4
> +
> +type(type1) :: t1
> +t1%radiuz = .0 ! { dg-error ".radiuz. at .1. is not a member of the .type1. structure; did you mean .radius.\\?" }
> +t1%x = .0 ! { dg-error ".x. at .1. is not a member of the .type1. structure" }
> +type(type2) :: t2
> +t2%mytape%radius = .0 ! { dg-error ".mytape. at .1. is not a member of the .type2. structure; did you mean .mytype.\\?" }
> +t2%mytype%radious = .0 ! { dg-error ".radious. at .1. is not a member of the .type1. structure; did you mean .radius.\\?" }
> +type(type4) :: t4
> +t4%type_3%type_2%mytype%radium = 88.0 ! { dg-error ".radium. at .1. is not a member of the .type1. structure; did you mean .radius.\\?" }
> +
> +!!!!!!!!!!!!!! symbol tests !!!!!!!!!!!!!!
> +integer :: iarg1
> +iarg2 = 1 ! { dg-error "Symbol .iarg2. at .1. has no IMPLICIT type; did you mean .iarg1.\\?" }
> +end
Bernhard Reutner-Fischer Dec. 1, 2015, 5:34 p.m. UTC | #5
On 1 December 2015 at 17:41, Steve Kargl
<sgk@troutmask.apl.washington.edu> wrote:
> On Tue, Dec 01, 2015 at 05:12:57PM +0100, Bernhard Reutner-Fischer wrote:
>> On 1 December 2015 at 16:01, Steve Kargl
>> <sgk@troutmask.apl.washington.edu> wrote:
>> > On Tue, Dec 01, 2015 at 01:55:01PM +0100, Bernhard Reutner-Fischer wrote:
>> >>
>> >> David Malcolm nice Levenshtein distance spelling check helpers
>> >> were used in some parts of other frontends. This proposed patch adds
>> >> some spelling corrections to the fortran frontend.
>>
>> > What problem are you trying to solve here?  The patch looks like
>>
>> The idea is to improve the programmer experience when writing code.
>> See the testcases enclosed in the patch. I consider this a feature :)
>
> Opinions differ.  I consider it unnecessary bloat.

Fair enough.
I fully agree that it's bloat.

The compiler is so tremendously bloated by now anyway that i consider
these couple of kilobyte to have a nice bloat/user friendliness
factor, overall ;)
I can imagine that people code their fortran programs in an IDE (the
bloated variant of an editor, mine is ~20518 bytes of text, no data,
no bss) and IDEs will sooner or later support fixit-hints. Even the
console/terminal users might enjoy to safe them a cycle of opening a
different file, looking up the type/module/etc name and then going
back to the source-file to correct their typo. *I* would welcome that
sometimes for sure :)

>> > unneeded complexity with the result of injecting C++ idioms into
>> > the Fortran FE.
>>
>> What C++ idioms are you referring to? The autovec?
>> AFAIU the light use of C++ in GCC is deemed OK. I see usage of
>> std::swap and std::map in the FE, not to mention the wide-int uses
>> (wi::). Thus we don't have to realloc/strcat but can use vectors to
>> the same effect, just as other frontends, including the C frontend,
>> do.
>> I take it you remember that we had to change all "try" to something
>> C++ friendly. If the Fortran FE meant to opt-out of being compiled
>> with a C++ compiler in the first place, why were all the C++ clashes
>> rewritten, back then? :)
>
> Yes, I know there are other C++ (mis)features within the
> Fortran FE especially in the trans-*.c files.  Those are
> accepted (by some) as necessary evils to interface with
> the ME.  Your patch injects C++ into otherwise perfectly
> fine C code, which makes it more difficult for those with
> no or very limited C++ knowledge to maintain the gfortran.

So you're in favour of using realloc and strcat, ok. I can use that.
Let me see if ipa-icf can replace all the identical tails of the
lookup_*_fuzzy into a common helper.
Shouldn't rely on LTO anyway nor ipa-icf i suppose.

>
> There are currently 806 open bug reports for gfortran.
> AFAIK, your patch does not address any of those bug reports.

I admit i didn't look..

> The continued push to inject C++ into the Fortran FE will
> have the (un)intentional consequence of forcing at least one
> active gfortran contributor to stop.

That was not my intention for sure.

cheers,
Bernhard Reutner-Fischer Dec. 1, 2015, 5:51 p.m. UTC | #6
On 1 December 2015 at 18:28, David Malcolm <dmalcolm@redhat.com> wrote:
> On Tue, 2015-12-01 at 13:55 +0100, Bernhard Reutner-Fischer wrote:


>> +/* Lookup function FN fuzzily, taking names in FUN into account.  */
>> +
>> +const char*
>> +gfc_lookup_function_fuzzy (const char *fn, gfc_symtree *fun)
>> +{
>> +  auto_vec <const char *> candidates;
>> +  lookup_function_fuzzy_find_candidates (fun, &candidates);
>> +
>> +  /* Determine closest match.  */
>> +  int i;
>> +  const char *name, *best = NULL;
>> +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
>> +
>> +  FOR_EACH_VEC_ELT (candidates, i, name)
>> +    {
>> +      edit_distance_t dist = levenshtein_distance (fn, name);
>> +      if (dist < best_distance)
>> +     {
>> +       best_distance = dist;
>> +       best = name;
>> +     }
>> +    }
>> +  /* If more than half of the letters were misspelled, the suggestion is
>> +     likely to be meaningless.  */
>> +  if (best)
>> +    {
>> +      unsigned int cutoff = MAX (strlen (fn), strlen (best)) / 2;
>> +      if (best_distance > cutoff)
>> +     return NULL;
>> +    }
>> +  return best;
>> +}
>
>
> Caveat: I'm not very familiar with the Fortran FE, so take the following
> with a pinch of salt.
>
> If I'm reading things right, here, and in various other places, you're
> building a vec of const char *, and then seeing which one of those
> candidates is the best match for another const char *.
>
> You could simplify things by adding a helper function to spellcheck.h,
> akin to this one:
>
> extern tree
> find_closest_identifier (tree target, const auto_vec<tree> *candidates);

I was hoping for ipa-icf to fix that up on my behalf. I'll try to see
if it does. Short of that: yes, should do that.

>
> This would reduce the amount of duplication in the patch (and slightly
> reduce the amount of C++).

As said, we could as well use a list of candidates with NULL as record marker.
Implementation cosmetics. Steve seems to not be thrilled by the
overall idea in the first place, so unless there is clear support by
somebody else i won't pursue this any further, it's not that i'm bored
or ran out of stuff i should do.. ;)
>
> [are there IDENTIFIER nodes in the Fortran FE, or is it all const char
> *? this would avoid some strlen calls]

Right, but in the Fortran FE these are const char*.

thanks for your comments!
David Malcolm Dec. 1, 2015, 5:58 p.m. UTC | #7
On Tue, 2015-12-01 at 18:51 +0100, Bernhard Reutner-Fischer wrote:
> On 1 December 2015 at 18:28, David Malcolm <dmalcolm@redhat.com> wrote:
> > On Tue, 2015-12-01 at 13:55 +0100, Bernhard Reutner-Fischer wrote:
> 
> 
> >> +/* Lookup function FN fuzzily, taking names in FUN into account.  */
> >> +
> >> +const char*
> >> +gfc_lookup_function_fuzzy (const char *fn, gfc_symtree *fun)
> >> +{
> >> +  auto_vec <const char *> candidates;
> >> +  lookup_function_fuzzy_find_candidates (fun, &candidates);
> >> +
> >> +  /* Determine closest match.  */
> >> +  int i;
> >> +  const char *name, *best = NULL;
> >> +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
> >> +
> >> +  FOR_EACH_VEC_ELT (candidates, i, name)
> >> +    {
> >> +      edit_distance_t dist = levenshtein_distance (fn, name);
> >> +      if (dist < best_distance)
> >> +     {
> >> +       best_distance = dist;
> >> +       best = name;
> >> +     }
> >> +    }
> >> +  /* If more than half of the letters were misspelled, the suggestion is
> >> +     likely to be meaningless.  */
> >> +  if (best)
> >> +    {
> >> +      unsigned int cutoff = MAX (strlen (fn), strlen (best)) / 2;
> >> +      if (best_distance > cutoff)
> >> +     return NULL;
> >> +    }
> >> +  return best;
> >> +}
> >
> >
> > Caveat: I'm not very familiar with the Fortran FE, so take the following
> > with a pinch of salt.
> >
> > If I'm reading things right, here, and in various other places, you're
> > building a vec of const char *, and then seeing which one of those
> > candidates is the best match for another const char *.
> >
> > You could simplify things by adding a helper function to spellcheck.h,
> > akin to this one:
> >
> > extern tree
> > find_closest_identifier (tree target, const auto_vec<tree> *candidates);
> 
> I was hoping for ipa-icf to fix that up on my behalf. I'll try to see
> if it does. Short of that: yes, should do that.

I was more thinking about code readability; don't rely on ipa-icf - fix
it in the source.

> > This would reduce the amount of duplication in the patch (and slightly
> > reduce the amount of C++).
> 
> As said, we could as well use a list of candidates with NULL as record marker.
> Implementation cosmetics. Steve seems to not be thrilled by the
> overall idea in the first place, so unless there is clear support by
> somebody else i won't pursue this any further, it's not that i'm bored
> or ran out of stuff i should do.. ;)

(FWIW I liked the idea, but I'm not a Fortran person so my opinion
counts much less that Steve's)

> > [are there IDENTIFIER nodes in the Fortran FE, or is it all const char
> > *? this would avoid some strlen calls]
> 
> Right, but in the Fortran FE these are const char*.
> 
> thanks for your comments!
Steve Kargl Dec. 1, 2015, 7:49 p.m. UTC | #8
On Tue, Dec 01, 2015 at 06:34:57PM +0100, Bernhard Reutner-Fischer wrote:
> On 1 December 2015 at 17:41, Steve Kargl
> >
> > Yes, I know there are other C++ (mis)features within the
> > Fortran FE especially in the trans-*.c files.  Those are
> > accepted (by some) as necessary evils to interface with
> > the ME.  Your patch injects C++ into otherwise perfectly
> > fine C code, which makes it more difficult for those with
> > no or very limited C++ knowledge to maintain the gfortran.
> 
> So you're in favour of using realloc and strcat, ok. I can use that.
> Let me see if ipa-icf can replace all the identical tails of the
> lookup_*_fuzzy into a common helper.
> Shouldn't rely on LTO anyway nor ipa-icf i suppose.

Yes, I would prefer it, but certainly won't demand it.
There are other Fortran contributors/maintainers.  They
may prefer you approach, so give them time to speak up.
Steve Kargl Dec. 1, 2015, 8 p.m. UTC | #9
On Tue, Dec 01, 2015 at 12:58:28PM -0500, David Malcolm wrote:
> On Tue, 2015-12-01 at 18:51 +0100, Bernhard Reutner-Fischer wrote:
> > As said, we could as well use a list of candidates with NULL as record marker.
> > Implementation cosmetics. Steve seems to not be thrilled by the
> > overall idea in the first place, so unless there is clear support by
> > somebody else i won't pursue this any further, it's not that i'm bored
> > or ran out of stuff i should do.. ;)
> 
> (FWIW I liked the idea, but I'm not a Fortran person so my opinion
> counts much less that Steve's)
> 

Your opinion is as valid as mine.

My only concern is code maintenance.  Injection of C++ (or any
other language) into C code seems to add possible complications
when something needs to be fix or changed to accommodate a new
Fortran freature.
Janne Blomqvist Dec. 3, 2015, 9:29 a.m. UTC | #10
On Tue, Dec 1, 2015 at 7:51 PM, Bernhard Reutner-Fischer
<rep.dot.nop@gmail.com> wrote:
> As said, we could as well use a list of candidates with NULL as record marker.
> Implementation cosmetics. Steve seems to not be thrilled by the
> overall idea in the first place, so unless there is clear support by
> somebody else i won't pursue this any further, it's not that i'm bored
> or ran out of stuff i should do.. ;)

FWIW, I think the idea of this patch is quite nice, and I'd like to
see it in the compiler.

I'm personally Ok with "C++-isms", but nowadays my contributions are
so minor that my opinion shouldn't carry that much weight on this
matter.
Mikael Morin Dec. 3, 2015, 1:53 p.m. UTC | #11
Le 03/12/2015 10:29, Janne Blomqvist a écrit :
> On Tue, Dec 1, 2015 at 7:51 PM, Bernhard Reutner-Fischer
> <rep.dot.nop@gmail.com> wrote:
>> As said, we could as well use a list of candidates with NULL as record marker.
>> Implementation cosmetics. Steve seems to not be thrilled by the
>> overall idea in the first place, so unless there is clear support by
>> somebody else i won't pursue this any further, it's not that i'm bored
>> or ran out of stuff i should do.. ;)
>
> FWIW, I think the idea of this patch is quite nice, and I'd like to
> see it in the compiler.
>
I like this feature as well.

> I'm personally Ok with "C++-isms", but nowadays my contributions are
> so minor that my opinion shouldn't carry that much weight on this
> matter.
>
Same here.
David Malcolm suggested to move the candidate selection code to the 
common middle-end infrastructure, which would move half of the so-called 
"bloat" there.  Steve, would that work for you?

It seems to me that the remaining C++-isms are rather acceptable.
I do agree that the vec implementation details seem overly complex for 
something whose job is just the memory management of a growing (or 
shrinking) vector.  However, the API is consistent and self-explanatory, 
and the usage of it that is made here (just a few "safe_push") is not 
more complex than what would be done with a C-only API.

Mikael
Steve Kargl Dec. 4, 2015, 12:08 a.m. UTC | #12
On Thu, Dec 03, 2015 at 02:53:06PM +0100, Mikael Morin wrote:
> Le 03/12/2015 10:29, Janne Blomqvist a écrit :
> > On Tue, Dec 1, 2015 at 7:51 PM, Bernhard Reutner-Fischer
> > <rep.dot.nop@gmail.com> wrote:
> >> As said, we could as well use a list of candidates with NULL as record marker.
> >> Implementation cosmetics. Steve seems to not be thrilled by the
> >> overall idea in the first place, so unless there is clear support by
> >> somebody else i won't pursue this any further, it's not that i'm bored
> >> or ran out of stuff i should do.. ;)
> >
> > FWIW, I think the idea of this patch is quite nice, and I'd like to
> > see it in the compiler.
> >
> I like this feature as well.
> 
> > I'm personally Ok with "C++-isms", but nowadays my contributions are
> > so minor that my opinion shouldn't carry that much weight on this
> > matter.
> >
> Same here.
> David Malcolm suggested to move the candidate selection code to the 
> common middle-end infrastructure, which would move half of the so-called 
> "bloat" there.  Steve, would that work for you?

Fine with me.

When debugging, if I run into C++isms, I'll stop and move to
a new bug.  We certainly have enough open bugs to choose from.
Mikael Morin Dec. 5, 2015, 7:53 p.m. UTC | #13
Hello,

to get things moving again, a few comments on top of David Malcolm's:

Le 01/12/2015 13:55, Bernhard Reutner-Fischer a écrit :
>
> David Malcolm nice Levenshtein distance spelling check helpers
> were used in some parts of other frontends. This proposed patch adds
> some spelling corrections to the fortran frontend.
>
> Suggestions are printed if we can find a suitable name, currently
> perusing a very simple cutoff factor:
> /* If more than half of the letters were misspelled, the suggestion is
>     likely to be meaningless.  */
> cutoff = MAX (strlen (typo), strlen (best_guess)) / 2;
> which effectively skips names with less than 4 characters.
> For e.g. structures, one could try to be much smarter in an attempt to
> also provide suggestions for single-letter members/components.
>
> This patch covers (at least partly):
> - user-defined operators
> - structures (types and their components)
> - functions
> - symbols (variables)
>
> I do not immediately see how to handle subroutines. Ideas?
>
Not sure what you are looking for; I can get an error generated in 
gfc_procedure_use if using IMPLICIT NONE (EXTERNAL)

> If anybody has a testcase where a spelling-suggestion would make sense
> then please pass it along so we maybe can add support for GCC-7.
>


> diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
> index 685e3f5..6e1f63c 100644
> --- a/gcc/fortran/resolve.c
> +++ b/gcc/fortran/resolve.c
> @@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.  If not see
>   #include "data.h"
>   #include "target-memory.h" /* for gfc_simplify_transfer */
>   #include "constructor.h"
> +#include "spellcheck.h"
>
>   /* Types used in equivalence statements.  */
>
> @@ -2682,6 +2683,61 @@ resolve_specific_f (gfc_expr *expr)
>     return true;
>   }
>
> +/* Recursively append candidate SYM to CANDIDATES.  */
> +
> +static void
> +lookup_function_fuzzy_find_candidates (gfc_symtree *sym,
> +                                       vec<const char *> *candidates)
> +{
> +  gfc_symtree *p;
> +  for (p = sym->right; p; p = p->right)
> +    {
> +      lookup_function_fuzzy_find_candidates (p, candidates);
> +      if (p->n.sym->ts.type != BT_UNKNOWN)
> +	candidates->safe_push (p->name);
> +    }
> +  for (p = sym->left; p; p = p->left)
> +    {
> +      lookup_function_fuzzy_find_candidates (p, candidates);
> +      if (p->n.sym->ts.type != BT_UNKNOWN)
> +	candidates->safe_push (p->name);
> +    }
> +}

It seems you are considering some candidates more than once here.
The first time through the recursive call you will consider say 
sym->right->right, and with the loop, you'll consider it again after 
returning from the recursive call.
The usual way to traverse the whole tree is to handle the current 
pointer and recurse on left and right pointers.  So without loop.
There is gfc_traverse_ns that you might find handy to do that (no 
obligation).

Same goes for the user operators below.

> +
> +
> +/* Lookup function FN fuzzily, taking names in FUN into account.  */
> +
> +const char*
> +gfc_lookup_function_fuzzy (const char *fn, gfc_symtree *fun)
> +{
> +  auto_vec <const char *> candidates;
> +  lookup_function_fuzzy_find_candidates (fun, &candidates);

You have to start the lookup with the current namespace's sym_root (not 
with fun), otherwise you'll miss some candidates.
You may also want to query parent namespaces for host-associated symbols.

> +
> +  /* Determine closest match.  */
> +  int i;
> +  const char *name, *best = NULL;
> +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
> +

[...]

> diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
> index ff9aff9..212f7d8 100644
> --- a/gcc/fortran/symbol.c
> +++ b/gcc/fortran/symbol.c
> @@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
>   #include "parse.h"
>   #include "match.h"
>   #include "constructor.h"
> +#include "spellcheck.h"
>
>
>   /* Strings for all symbol attributes.  We use these for dumping the
> @@ -235,6 +236,62 @@ gfc_get_default_type (const char *name, gfc_namespace *ns)
>   }
>
>
> +/* Recursively append candidate SYM to CANDIDATES.  */
> +
> +static void
> +lookup_symbol_fuzzy_find_candidates (gfc_symtree *sym,
> +				        vec<const char *> *candidates)
> +{
> +  gfc_symtree *p;
> +  for (p = sym->right; p; p = p->right)
> +    {
> +      lookup_symbol_fuzzy_find_candidates (p, candidates);
> +      if (p->n.sym->ts.type != BT_UNKNOWN)
> +	candidates->safe_push (p->name);
> +    }
> +  for (p = sym->left; p; p = p->left)
> +    {
> +      lookup_symbol_fuzzy_find_candidates (p, candidates);
> +      if (p->n.sym->ts.type != BT_UNKNOWN)
> +	candidates->safe_push (p->name);
> +    }
> +}
This looks like the same as lookup_function_fuzzy_find_candidates, isn't it?
Maybe have a general symbol traversal function with a selection callback 
argument to test whether the symbol is what you want, depending on the 
context (is it a function? a subroutine? etc).

> +
> +
> +/* Lookup symbol SYM fuzzily, taking names in SYMBOL into account.  */
> +
> +static const char*
> +lookup_symbol_fuzzy (const char *sym, gfc_symbol *symbol)
> +{
> +  auto_vec <const char *> candidates;
> +  lookup_symbol_fuzzy_find_candidates (symbol->ns->sym_root, &candidates);
> +
> +  /* Determine closest match.  */
> +  int i;
> +  const char *name, *best = NULL;
> +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
> +
> +  FOR_EACH_VEC_ELT (candidates, i, name)
> +    {
> +      edit_distance_t dist = levenshtein_distance (sym, name);
> +      if (dist < best_distance)
> +	{
> +	  best_distance = dist;
> +	  best = name;
> +	}
> +    }
> +  /* If more than half of the letters were misspelled, the suggestion is
> +     likely to be meaningless.  */
> +  if (best)
> +    {
> +      unsigned int cutoff = MAX (strlen (sym), strlen (best)) / 2;
> +      if (best_distance > cutoff)
> +	return NULL;
> +    }
> +  return best;
> +}
> +
> +
>   /* Given a pointer to a symbol, set its type according to the first
>      letter of its name.  Fails if the letter in question has no default
>      type.  */
> @@ -253,8 +310,15 @@ gfc_set_default_type (gfc_symbol *sym, int error_flag, gfc_namespace *ns)
>       {
>         if (error_flag && !sym->attr.untyped)
>   	{
> -	  gfc_error ("Symbol %qs at %L has no IMPLICIT type",
> -		     sym->name, &sym->declared_at);
> +	  const char *guessed
> +	    = lookup_symbol_fuzzy (sym->name, sym);
> +	  if (guessed)
> +	    gfc_error ("Symbol %qs at %L has no IMPLICIT type"
> +		       "; did you mean %qs?",
> +		       sym->name, &sym->declared_at, guessed);
> +	  else
> +	    gfc_error ("Symbol %qs at %L has no IMPLICIT type",
> +		       sym->name, &sym->declared_at);
>   	  sym->attr.untyped = 1; /* Ensure we only give an error once.  */
>   	}
>
> @@ -2188,6 +2252,55 @@ bad:
>   }
>
>
> +/* Recursively append candidate COMPONENT structures to CANDIDATES.  */
> +
> +static void
> +lookup_component_fuzzy_find_candidates (gfc_component *component,
> +				        vec<const char *> *candidates)
> +{
> +  for (gfc_component *p = component; p; p = p->next)
> +    {
> +      if (00 && p->ts.type == BT_DERIVED)
> +	/* ??? There's no (suitable) DERIVED_TYPE which would come in
> +	   handy throughout the frontend; Use CLASS_DATA here for brevity.  */
> +	lookup_component_fuzzy_find_candidates (CLASS_DATA (p), candidates);
I don't understand what you are looking for here.
Are you trying to handle type extension?  Then I guess you would have to 
pass the derived type symbol instead of its components, and use 
gfc_get_derived_super_type to retrieve the parent type.

Mikael

Patch
diff mbox

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 5487c93..cbfd592 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3060,6 +3060,7 @@  bool gfc_type_is_extensible (gfc_symbol *);
 bool gfc_resolve_intrinsic (gfc_symbol *, locus *);
 bool gfc_explicit_interface_required (gfc_symbol *, char *, int);
 extern int gfc_do_concurrent_flag;
+const char* gfc_lookup_function_fuzzy (const char *, gfc_symtree *);
 
 
 /* array.c */
diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
index 30cc522..19f800f 100644
--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -1590,10 +1590,18 @@  check_interface0 (gfc_interface *p, const char *interface_name)
 	  if (p->sym->attr.external)
 	    gfc_error ("Procedure %qs in %s at %L has no explicit interface",
 		       p->sym->name, interface_name, &p->sym->declared_at);
-	  else
-	    gfc_error ("Procedure %qs in %s at %L is neither function nor "
-		       "subroutine", p->sym->name, interface_name,
-		      &p->sym->declared_at);
+	  else {
+	    const char *guessed
+	      = gfc_lookup_function_fuzzy (p->sym->name, p->sym->ns->sym_root);
+	    if (guessed)
+	      gfc_error ("Procedure %qs in %s at %L is neither function nor "
+			 "subroutine; did you mean %qs?", p->sym->name,
+			interface_name, &p->sym->declared_at, guessed);
+	    else
+	      gfc_error ("Procedure %qs in %s at %L is neither function nor "
+			 "subroutine", p->sym->name, interface_name,
+			&p->sym->declared_at);
+	  }
 	  return 1;
 	}
 
diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 685e3f5..6e1f63c 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -29,6 +29,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "data.h"
 #include "target-memory.h" /* for gfc_simplify_transfer */
 #include "constructor.h"
+#include "spellcheck.h"
 
 /* Types used in equivalence statements.  */
 
@@ -2682,6 +2683,61 @@  resolve_specific_f (gfc_expr *expr)
   return true;
 }
 
+/* Recursively append candidate SYM to CANDIDATES.  */
+
+static void
+lookup_function_fuzzy_find_candidates (gfc_symtree *sym,
+                                       vec<const char *> *candidates)
+{
+  gfc_symtree *p;
+  for (p = sym->right; p; p = p->right)
+    {
+      lookup_function_fuzzy_find_candidates (p, candidates);
+      if (p->n.sym->ts.type != BT_UNKNOWN)
+	candidates->safe_push (p->name);
+    }
+  for (p = sym->left; p; p = p->left)
+    {
+      lookup_function_fuzzy_find_candidates (p, candidates);
+      if (p->n.sym->ts.type != BT_UNKNOWN)
+	candidates->safe_push (p->name);
+    }
+}
+
+
+/* Lookup function FN fuzzily, taking names in FUN into account.  */
+
+const char*
+gfc_lookup_function_fuzzy (const char *fn, gfc_symtree *fun)
+{
+  auto_vec <const char *> candidates;
+  lookup_function_fuzzy_find_candidates (fun, &candidates);
+
+  /* Determine closest match.  */
+  int i;
+  const char *name, *best = NULL;
+  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
+
+  FOR_EACH_VEC_ELT (candidates, i, name)
+    {
+      edit_distance_t dist = levenshtein_distance (fn, name);
+      if (dist < best_distance)
+	{
+	  best_distance = dist;
+	  best = name;
+	}
+    }
+  /* If more than half of the letters were misspelled, the suggestion is
+     likely to be meaningless.  */
+  if (best)
+    {
+      unsigned int cutoff = MAX (strlen (fn), strlen (best)) / 2;
+      if (best_distance > cutoff)
+	return NULL;
+    }
+  return best;
+}
+
 
 /* Resolve a procedure call not known to be generic nor specific.  */
 
@@ -2732,8 +2788,15 @@  set_type:
 
       if (ts->type == BT_UNKNOWN)
 	{
-	  gfc_error ("Function %qs at %L has no IMPLICIT type",
-		     sym->name, &expr->where);
+	  const char *guessed
+	    = gfc_lookup_function_fuzzy (sym->name, sym->ns->sym_root);
+	  if (guessed)
+	    gfc_error ("Function %qs at %L has no IMPLICIT type"
+		       "; did you mean %qs?",
+		       sym->name, &expr->where, guessed);
+	  else
+	    gfc_error ("Function %qs at %L has no IMPLICIT type",
+		       sym->name, &expr->where);
 	  return false;
 	}
       else
@@ -3504,6 +3567,63 @@  compare_shapes (gfc_expr *op1, gfc_expr *op2)
   return t;
 }
 
+/* Recursively append candidate UOP to CANDIDATES.  */
+
+static void
+lookup_uop_fuzzy_find_candidates (gfc_symtree *uop,
+				  vec<const char *> *candidates)
+{
+  gfc_symtree *p;
+  /* Not sure how to properly filter here.  Use all for a start.
+     n.uop.op is NULL for empty interface operators (is that legal?) disregard
+     these as i suppose they don't make terribly sense.  */
+  for (p = uop->right; p; p = p->right)
+    {
+      lookup_function_fuzzy_find_candidates (p, candidates);
+      if (p->n.uop->op != NULL)
+	candidates->safe_push (p->name);
+    }
+  for (p = uop->left; p; p = p->left)
+    {
+      lookup_function_fuzzy_find_candidates (p, candidates);
+      if (p->n.uop->op != NULL)
+	candidates->safe_push (p->name);
+    }
+}
+
+/* Lookup user-operator OP fuzzily, taking names in UOP into account.  */
+
+static const char*
+lookup_uop_fuzzy (const char *op, gfc_symtree *uop)
+{
+  auto_vec <const char *> candidates;
+  lookup_uop_fuzzy_find_candidates (uop, &candidates);
+
+  /* Determine closest match.  */
+  int i;
+  const char *name, *best = NULL;
+  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
+
+  FOR_EACH_VEC_ELT (candidates, i, name)
+    {
+      edit_distance_t dist = levenshtein_distance (op, name);
+      if (dist < best_distance)
+	{
+	  best_distance = dist;
+	  best = name;
+	}
+    }
+  /* If more than half of the letters were misspelled, the suggestion is
+     likely to be meaningless.  */
+  if (best)
+    {
+      unsigned int cutoff = MAX (strlen (op), strlen (best)) / 2;
+      if (best_distance > cutoff)
+	return NULL;
+    }
+  return best;
+}
+
 
 /* Resolve an operator expression node.  This can involve replacing the
    operation with a user defined function call.  */
@@ -3703,7 +3823,16 @@  resolve_operator (gfc_expr *e)
 
     case INTRINSIC_USER:
       if (e->value.op.uop->op == NULL)
-	sprintf (msg, _("Unknown operator '%s' at %%L"), e->value.op.uop->name);
+	{
+	  const char *name = e->value.op.uop->name;
+	  const char *guessed;
+	  guessed = lookup_uop_fuzzy (name, e->value.op.uop->ns->uop_root);
+	  if (guessed)
+	    sprintf (msg, _("Unknown operator '%s' at %%L; did you mean '%s'?"),
+		name, guessed);
+	  else
+	    sprintf (msg, _("Unknown operator '%s' at %%L"), name);
+	}
       else if (op2 == NULL)
 	sprintf (msg, _("Operand of user operator '%s' at %%L is %s"),
 		 e->value.op.uop->name, gfc_typename (&op1->ts));
diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
index ff9aff9..212f7d8 100644
--- a/gcc/fortran/symbol.c
+++ b/gcc/fortran/symbol.c
@@ -27,6 +27,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "parse.h"
 #include "match.h"
 #include "constructor.h"
+#include "spellcheck.h"
 
 
 /* Strings for all symbol attributes.  We use these for dumping the
@@ -235,6 +236,62 @@  gfc_get_default_type (const char *name, gfc_namespace *ns)
 }
 
 
+/* Recursively append candidate SYM to CANDIDATES.  */
+
+static void
+lookup_symbol_fuzzy_find_candidates (gfc_symtree *sym,
+				        vec<const char *> *candidates)
+{
+  gfc_symtree *p;
+  for (p = sym->right; p; p = p->right)
+    {
+      lookup_symbol_fuzzy_find_candidates (p, candidates);
+      if (p->n.sym->ts.type != BT_UNKNOWN)
+	candidates->safe_push (p->name);
+    }
+  for (p = sym->left; p; p = p->left)
+    {
+      lookup_symbol_fuzzy_find_candidates (p, candidates);
+      if (p->n.sym->ts.type != BT_UNKNOWN)
+	candidates->safe_push (p->name);
+    }
+}
+
+
+/* Lookup symbol SYM fuzzily, taking names in SYMBOL into account.  */
+
+static const char*
+lookup_symbol_fuzzy (const char *sym, gfc_symbol *symbol)
+{
+  auto_vec <const char *> candidates;
+  lookup_symbol_fuzzy_find_candidates (symbol->ns->sym_root, &candidates);
+
+  /* Determine closest match.  */
+  int i;
+  const char *name, *best = NULL;
+  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
+
+  FOR_EACH_VEC_ELT (candidates, i, name)
+    {
+      edit_distance_t dist = levenshtein_distance (sym, name);
+      if (dist < best_distance)
+	{
+	  best_distance = dist;
+	  best = name;
+	}
+    }
+  /* If more than half of the letters were misspelled, the suggestion is
+     likely to be meaningless.  */
+  if (best)
+    {
+      unsigned int cutoff = MAX (strlen (sym), strlen (best)) / 2;
+      if (best_distance > cutoff)
+	return NULL;
+    }
+  return best;
+}
+
+
 /* Given a pointer to a symbol, set its type according to the first
    letter of its name.  Fails if the letter in question has no default
    type.  */
@@ -253,8 +310,15 @@  gfc_set_default_type (gfc_symbol *sym, int error_flag, gfc_namespace *ns)
     {
       if (error_flag && !sym->attr.untyped)
 	{
-	  gfc_error ("Symbol %qs at %L has no IMPLICIT type",
-		     sym->name, &sym->declared_at);
+	  const char *guessed
+	    = lookup_symbol_fuzzy (sym->name, sym);
+	  if (guessed)
+	    gfc_error ("Symbol %qs at %L has no IMPLICIT type"
+		       "; did you mean %qs?",
+		       sym->name, &sym->declared_at, guessed);
+	  else
+	    gfc_error ("Symbol %qs at %L has no IMPLICIT type",
+		       sym->name, &sym->declared_at);
 	  sym->attr.untyped = 1; /* Ensure we only give an error once.  */
 	}
 
@@ -2188,6 +2252,55 @@  bad:
 }
 
 
+/* Recursively append candidate COMPONENT structures to CANDIDATES.  */
+
+static void
+lookup_component_fuzzy_find_candidates (gfc_component *component,
+				        vec<const char *> *candidates)
+{
+  for (gfc_component *p = component; p; p = p->next)
+    {
+      if (00 && p->ts.type == BT_DERIVED)
+	/* ??? There's no (suitable) DERIVED_TYPE which would come in
+	   handy throughout the frontend; Use CLASS_DATA here for brevity.  */
+	lookup_component_fuzzy_find_candidates (CLASS_DATA (p), candidates);
+      candidates->safe_push (p->name);
+    }
+}
+
+/* Lookup component MEMBER fuzzily, taking names in COMPONENT into account.  */
+
+static const char*
+lookup_component_fuzzy (const char *member, gfc_component *component)
+{
+  auto_vec <const char *> candidates;
+  lookup_component_fuzzy_find_candidates (component, &candidates);
+
+  /* Determine closest match.  */
+  int i;
+  const char *name, *best = NULL;
+  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
+
+  FOR_EACH_VEC_ELT (candidates, i, name)
+    {
+      edit_distance_t dist = levenshtein_distance (member, name);
+      if (dist < best_distance)
+	{
+	  best_distance = dist;
+	  best = name;
+	}
+    }
+  /* If more than half of the letters were misspelled, the suggestion is
+     likely to be meaningless.  */
+  if (best)
+    {
+      unsigned int cutoff = MAX (strlen (member), strlen (best)) / 2;
+      if (best_distance > cutoff)
+	return NULL;
+    }
+  return best;
+}
+
 /* Given a derived type node and a component name, try to locate the
    component structure.  Returns the NULL pointer if the component is
    not found or the components are private.  If noaccess is set, no access
@@ -2238,8 +2351,16 @@  gfc_find_component (gfc_symbol *sym, const char *name,
     }
 
   if (p == NULL && !silent)
-    gfc_error ("%qs at %C is not a member of the %qs structure",
-	       name, sym->name);
+    {
+      const char *guessed = lookup_component_fuzzy (name, sym->components);
+      if (guessed)
+	gfc_error ("%qs at %C is not a member of the %qs structure"
+		   "; did you mean %qs?",
+		   name, sym->name, guessed);
+      else
+	gfc_error ("%qs at %C is not a member of the %qs structure",
+		   name, sym->name);
+    }
 
   return p;
 }
diff --git a/gcc/testsuite/gfortran.dg/spellcheck-operator.f90 b/gcc/testsuite/gfortran.dg/spellcheck-operator.f90
new file mode 100644
index 0000000..810a770
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/spellcheck-operator.f90
@@ -0,0 +1,30 @@ 
+! { dg-do compile }
+! test levenshtein based spelling suggestions
+
+module mymod1
+  implicit none
+  contains
+    function something_good (iarg1)
+      integer :: something_good
+      integer, intent(in) :: iarg1
+      something_good = iarg1 + 42
+    end function something_good
+end module mymod1
+
+program spellchekc
+  use mymod1
+  implicit none
+
+  interface operator (.mywrong.)
+    module procedure something_wring ! { dg-error "Procedure .something_wring. in operator interface .mywrong. at .1. is neither function nor subroutine; did you mean .something_good.\\?|User operator procedure .something_wring. at .1. must be a FUNCTION" }
+  end interface
+
+  interface operator (.mygood.)
+    module procedure something_good
+  end interface
+
+  integer :: i, j, added
+  i = 0
+  j = 0
+  added = .mygoof. j ! { dg-error "Unknown operator .mygoof. at .1.; did you mean .mygood.\\?" }
+end program spellchekc
diff --git a/gcc/testsuite/gfortran.dg/spellcheck-procedure.f90 b/gcc/testsuite/gfortran.dg/spellcheck-procedure.f90
new file mode 100644
index 0000000..7923081
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/spellcheck-procedure.f90
@@ -0,0 +1,41 @@ 
+! { dg-do compile }
+! test levenshtein based spelling suggestions
+
+module mymod1
+  implicit none
+  contains
+    function something_good (iarg1)
+      integer :: something_good
+      integer, intent(in) :: iarg1
+      something_good = iarg1 + 42
+    end function something_good
+end module mymod1
+
+subroutine bark_unless_zero(iarg)
+  implicit none
+  integer, intent(in) :: iarg
+  if (iarg /= 0) call abort
+end subroutine bark_unless_zero
+
+function myadd(iarg1, iarg2)
+  implicit none
+  integer :: myadd
+  integer, intent(in) :: iarg1, iarg2
+  myadd = iarg1 + iarg2
+end function myadd
+
+program spellchekc
+  use mymod1
+  implicit none
+
+  integer :: i, j, myadd
+  i = 0
+  j = 0
+! I suppose this cannot be made to work, no\\?
+!  call barf_unless_zero(i) ! { -dg-error "; did you mean .bark_unless_zero.\\?" }
+  j = something_goof(j) ! { dg-error "no IMPLICIT type; did you mean .something_good.\\?" }
+  j = myaddd(i, j) ! { dg-error "no IMPLICIT type; did you mean .myadd.\\?" }
+  j = mya(i, j) ! { dg-error "no IMPLICIT type; did you mean .myadd.\\?" }
+  if (j /= 42) call abort
+
+end program spellchekc
diff --git a/gcc/testsuite/gfortran.dg/spellcheck-structure.f90 b/gcc/testsuite/gfortran.dg/spellcheck-structure.f90
new file mode 100644
index 0000000..929e05f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/spellcheck-structure.f90
@@ -0,0 +1,35 @@ 
+! { dg-do compile }
+! test levenshtein based spelling suggestions
+implicit none
+
+!!!!!!!!!!!!!! structure tests !!!!!!!!!!!!!!
+type type1
+   real :: radius
+   integer :: i
+end type type1
+
+type type2
+  integer :: myint
+  type(type1) :: mytype
+end type type2
+
+type type3
+  type(type2) :: type_2
+end type type3
+type type4
+  type(type3) :: type_3
+end type type4
+
+type(type1) :: t1
+t1%radiuz = .0 ! { dg-error ".radiuz. at .1. is not a member of the .type1. structure; did you mean .radius.\\?" }
+t1%x = .0 ! { dg-error ".x. at .1. is not a member of the .type1. structure" }
+type(type2) :: t2
+t2%mytape%radius = .0 ! { dg-error ".mytape. at .1. is not a member of the .type2. structure; did you mean .mytype.\\?" }
+t2%mytype%radious = .0 ! { dg-error ".radious. at .1. is not a member of the .type1. structure; did you mean .radius.\\?" }
+type(type4) :: t4
+t4%type_3%type_2%mytype%radium = 88.0 ! { dg-error ".radium. at .1. is not a member of the .type1. structure; did you mean .radius.\\?" }
+
+!!!!!!!!!!!!!! symbol tests !!!!!!!!!!!!!!
+integer :: iarg1
+iarg2 = 1 ! { dg-error "Symbol .iarg2. at .1. has no IMPLICIT type; did you mean .iarg1.\\?" }
+end