Message ID | 1468516947.13104.50.camel@redhat.com |
---|---|
State | New |
Headers | show |
On Thu, Jul 14, 2016 at 01:22:27PM -0400, David Malcolm wrote: > I wrote a patch for this, similar to yours, but with a slightly > different approach for handling builtins. > > I think it's reasonable to offer suggestions for misspellings of e.g. > "snprintf" even if the pertinent header hasn't been included, since it > will help guide a novice developer to the missing header. For example, > if the user has: > > void > test_7 (int i, int j) > { > int buffer[100]; > snprint (buffer, 100, "%i of %i", i, j); > } > > ...then they get: > > test.c: In function ‘test_7’: > test.c:5:3: warning: implicit declaration of function ‘snprint’; did > you mean ‘snprintf’? [-Wimplicit-function-declaration] > snprint (buffer, 100, "%i of %i", i, j); > ^~~~~~~ > snprintf > > ...and on correcting it, they get the "you forgot <stdio.h>" warning: I find it weird. What is special on the builtins? Lots of standard headers contain lots of functions that aren't backed by builtins, some of them used more often than the builtins; the reason for builtins is often just that gcc wants to be able to optimize them somehow. E.g. note fopen is not a builtin, so you still will not suggest it if stdio.h is not included. Not all the builtins out there are even enabled in all compilation modes when the containing header is included, it can depend on the language version, feature test macros, etc. So, by considering invisible builtins you could e.g. suggest C99 builtins in strict C89 compilation mode where the header would not define it, etc. In any case, this is your baby, so it is up to the FE maintainers and you to get agreement on, I won't fight on this. > What I find surprising is the suggestion of "carg"; I've seen this a > few times as a suggestion. I confess to having to look up carg; IMHO > "carg" is much less likely to be a useful suggestion than, say, > "printf". For the short identifiers, the question is if the maximum Damerau-Levenshtein distance (do you handle transpositions now?) shouldn't be lowered (say for at most 4 chars identifiers only consider levenshtein distance 1 and not more, e.g. for 3 character identifiers allowing distance of 2 or 3 means you pretty much suggest any other 3 char identifier), regardless of if it is a builtin or not. > @@ -3992,6 +3995,26 @@ lookup_name_fuzzy (tree name, enum lookup_name_fuzzy_kind kind) > if (kind == FUZZY_LOOKUP_TYPENAME) > if (TREE_CODE (binding->decl) != TYPE_DECL) > continue; > + if (kind == FUZZY_LOOKUP_FUNCTION_NAME) > + if (TREE_CODE (binding->decl) != FUNCTION_DECL) > + continue; Note this is handled in the second patch I've posted, and this one doesn't handle function pointer variables/parameters. Jakub
On Thu, 2016-07-14 at 19:47 +0200, Jakub Jelinek wrote: > On Thu, Jul 14, 2016 at 01:22:27PM -0400, David Malcolm wrote: > > I wrote a patch for this, similar to yours, but with a slightly > > different approach for handling builtins. > > > > I think it's reasonable to offer suggestions for misspellings of > > e.g. > > "snprintf" even if the pertinent header hasn't been included, since > > it > > will help guide a novice developer to the missing header. For > > example, > > if the user has: > > > > void > > test_7 (int i, int j) > > { > > int buffer[100]; > > snprint (buffer, 100, "%i of %i", i, j); > > } > > > > ...then they get: > > > > test.c: In function ‘test_7’: > > test.c:5:3: warning: implicit declaration of function ‘snprint’; > > did > > you mean ‘snprintf’? [-Wimplicit-function-declaration] > > snprint (buffer, 100, "%i of %i", i, j); > > ^~~~~~~ > > snprintf > > > > ...and on correcting it, they get the "you forgot <stdio.h>" > > warning: > > I find it weird. What is special on the builtins? Lots of standard > headers > contain lots of functions that aren't backed by builtins, some of > them used > more often than the builtins; the reason for builtins is often just > that gcc > wants to be able to optimize them somehow. > > E.g. note fopen is not a builtin, so you still will not suggest it if > stdio.h is not included. > > Not all the builtins out there are even enabled in all compilation > modes > when the containing header is included, it can depend on the language > version, feature test macros, etc. So, by considering invisible > builtins > you could e.g. suggest C99 builtins in strict C89 compilation mode > where the > header would not define it, etc. > > In any case, this is your baby, so it is up to the FE maintainers and > you to > get agreement on, I won't fight on this. Thanks. I agree with the points you raise; in that light I'm fine with your patch. > > What I find surprising is the suggestion of "carg"; I've seen this > > a > > few times as a suggestion. I confess to having to look up carg; > > IMHO > > "carg" is much less likely to be a useful suggestion than, say, > > "printf". > > For the short identifiers, the question is if the maximum Damerau > -Levenshtein > distance (do you handle transpositions now?) We don't currently handle transpositions, but it would be good to do so (PR other/69968). > shouldn't be lowered (say for at > most 4 chars identifiers only consider levenshtein distance 1 and not > more, > e.g. for 3 character identifiers allowing distance of 2 or 3 means > you pretty > much suggest any other 3 char identifier), regardless of if it is a > builtin or not. The logic in get_best_meaningful_candidate is currently: unsigned int cutoff = MAX (m_goal_len, m_best_candidate_len) / 2; if (m_best_distance > cutoff) return NULL; For a pair of 3 character identifiers, cutoff will be 3 / 2 == 1, so it will suggest within a distance of 1 and reject if the distance is 2 or 3. Or is there a bug? That cutoff could definitely be improved. Perhaps the MAX should be changed to a MIN? Or just base it off the goal length e.g. unsigned int cutoff = m_goal_len / 2; thus implicitly rejecting suggestions when m_goal_len == 1. That would eliminate the "bar" to "char" suggestion in the PR, since m_goal_len == 3 would have a cutoff of 1. > > > @@ -3992,6 +3995,26 @@ lookup_name_fuzzy (tree name, enum > > lookup_name_fuzzy_kind kind) > > if (kind == FUZZY_LOOKUP_TYPENAME) > > if (TREE_CODE (binding->decl) != TYPE_DECL) > > continue; > > + if (kind == FUZZY_LOOKUP_FUNCTION_NAME) > > + if (TREE_CODE (binding->decl) != FUNCTION_DECL) > > + continue; > > Note this is handled in the second patch I've posted, and this one > doesn't > handle function pointer variables/parameters. > > Jakub
On Thu, Jul 14, 2016 at 02:22:36PM -0400, David Malcolm wrote: > I agree with the points you raise; in that light I'm fine with your > patch. Ok, Marek has approved it earlier, so I went ahead and committed. > > shouldn't be lowered (say for at > > most 4 chars identifiers only consider levenshtein distance 1 and not > > more, > > e.g. for 3 character identifiers allowing distance of 2 or 3 means > > you pretty > > much suggest any other 3 char identifier), regardless of if it is a > > builtin or not. > > The logic in get_best_meaningful_candidate is currently: > > unsigned int cutoff = MAX (m_goal_len, m_best_candidate_len) / 2; > if (m_best_distance > cutoff) > return NULL; > > For a pair of 3 character identifiers, cutoff will be 3 / 2 == 1, so it > will suggest within a distance of 1 and reject if the distance is 2 or > 3. Or is there a bug? > > That cutoff could definitely be improved. Perhaps the MAX should be > changed to a MIN? Or just base it off the goal length e.g. > unsigned int cutoff = m_goal_len / 2; > thus implicitly rejecting > suggestions when m_goal_len == 1. > That would eliminate the "bar" to > "char" suggestion in the PR, since m_goal_len == 3 would have a cutoff > of 1. I'm afraid I don't have enough experience, so we'll need to look at lost of suggestions and find the right tuning. Jakub
From 7f3cd32fd4338db7b8b8f09b16429975c5907cb6 Mon Sep 17 00:00:00 2001 From: David Malcolm <dmalcolm@redhat.com> Date: Wed, 13 Jul 2016 13:40:22 -0400 Subject: [PATCH] FIXME: PR c/71858 - Don't always offer names of builtins as misspellings --- gcc/c-family/c-common.h | 3 +++ gcc/c/c-decl.c | 25 ++++++++++++++++++++++++- gcc/testsuite/gcc.dg/spellcheck-identifiers.c | 15 +++++++++++++++ 3 files changed, 42 insertions(+), 1 deletion(-) diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h index 44d98d1..32e68d0 100644 --- a/gcc/c-family/c-common.h +++ b/gcc/c-family/c-common.h @@ -994,6 +994,9 @@ enum lookup_name_fuzzy_kind { /* Names of types. */ FUZZY_LOOKUP_TYPENAME, + /* Names of functions. */ + FUZZY_LOOKUP_FUNCTION_NAME, + /* Any name. */ FUZZY_LOOKUP_NAME }; diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c index dc14485..de03ad4 100644 --- a/gcc/c/c-decl.c +++ b/gcc/c/c-decl.c @@ -3090,7 +3090,7 @@ implicit_decl_warning (location_t loc, tree id, tree olddecl) bool warned; const char *hint = NULL; if (!olddecl) - hint = lookup_name_fuzzy (id, FUZZY_LOOKUP_NAME); + hint = lookup_name_fuzzy (id, FUZZY_LOOKUP_FUNCTION_NAME); if (flag_isoc99) if (hint) @@ -3984,6 +3984,9 @@ lookup_name_fuzzy (tree name, enum lookup_name_fuzzy_kind kind) { if (!binding->id) continue; + + gcc_assert (TREE_CODE (binding->id) == IDENTIFIER_NODE); + /* Don't use bindings from implicitly declared functions, as they were likely misspellings themselves. */ if (TREE_CODE (binding->decl) == FUNCTION_DECL) @@ -3992,6 +3995,26 @@ lookup_name_fuzzy (tree name, enum lookup_name_fuzzy_kind kind) if (kind == FUZZY_LOOKUP_TYPENAME) if (TREE_CODE (binding->decl) != TYPE_DECL) continue; + if (kind == FUZZY_LOOKUP_FUNCTION_NAME) + if (TREE_CODE (binding->decl) != FUNCTION_DECL) + continue; + + /* There are numerous builtin functions with short names, such as + "carg" which are prone to being suggested, and are generally + unhelpful as suggestions. (PR c/71858) + Avoid suggesting them by imposing a minimum length on builtin + functions as suggestions. + A minimum length of 5 is rather arbitrary, but it allows for + "fopen", but prevents "carg". A more sophisticated approach + might be to analyze how often each builtin is used in real + code, and filter out the less-frequently-used ones. */ + const int builtin_length_threshold = 5; + if (TREE_CODE (binding->decl) == FUNCTION_DECL) + if (DECL_IS_BUILTIN (binding->decl)) + if (IDENTIFIER_LENGTH (binding->id) < builtin_length_threshold) + continue; + + /* Passed all tests; consider this as a suggestion. */ bm.consider (binding->id); } diff --git a/gcc/testsuite/gcc.dg/spellcheck-identifiers.c b/gcc/testsuite/gcc.dg/spellcheck-identifiers.c index 22a12d0..2111499 100644 --- a/gcc/testsuite/gcc.dg/spellcheck-identifiers.c +++ b/gcc/testsuite/gcc.dg/spellcheck-identifiers.c @@ -134,3 +134,18 @@ test_7 (int i, int j) snprintf { dg-end-multiline-output "" } */ } + +/* PR c/71858 - Don't always offer names of builtins as misspellings; + in particular, don't offer "carg" as a suggestion for "bar". */ + +int +test_8 (void) +{ + return bar (); + /* { dg-bogus "did you mean" "" { target *-*-* } 144 } */ + /* { dg-warning "implicit declaration" "" { target *-*-* } 144 } */ + /* { dg-begin-multiline-output "" } + return bar (); + ^~~ + { dg-end-multiline-output "" } */ +} -- 1.8.5.3