From patchwork Fri May 29 16:54:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Tom Tromey X-Patchwork-Id: 1300850 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=adacore.com Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49YVzf0VlPz9sSF for ; Sat, 30 May 2020 02:54:54 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id C167C388A832; Fri, 29 May 2020 16:54:51 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from rock.gnat.com (rock.gnat.com [205.232.38.15]) by sourceware.org (Postfix) with ESMTP id 26C08383E821 for ; Fri, 29 May 2020 16:54:49 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 26C08383E821 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=adacore.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=tromey@adacore.com Received: from localhost (localhost.localdomain [127.0.0.1]) by filtered-rock.gnat.com (Postfix) with ESMTP id B1C9F1165C3; Fri, 29 May 2020 12:54:48 -0400 (EDT) X-Virus-Scanned: Debian amavisd-new at gnat.com Received: from rock.gnat.com ([127.0.0.1]) by localhost (rock.gnat.com [127.0.0.1]) (amavisd-new, port 10024) with LMTP id y-vZ5+L4eMAd; Fri, 29 May 2020 12:54:48 -0400 (EDT) Received: from murgatroyd.Home (174-16-104-48.hlrn.qwest.net [174.16.104.48]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by rock.gnat.com (Postfix) with ESMTPSA id 6F0FA11650D; Fri, 29 May 2020 12:54:48 -0400 (EDT) From: Tom Tromey To: gcc-patches@gcc.gnu.org Subject: [PATCH] Prefer simple case changes in spelling suggestions Date: Fri, 29 May 2020 10:54:43 -0600 Message-Id: <20200529165443.24395-1-tromey@adacore.com> X-Mailer: git-send-email 2.21.3 MIME-Version: 1.0 X-Spam-Status: No, score=-15.0 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, RCVD_IN_BARRACUDACENTRAL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Tom Tromey Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" I got this error message when editing gcc and recompiling: ../../gcc/gcc/ada/gcc-interface/decl.c:7714:39: error: ‘DWARF_GNAT_ENCODINGS_all’ was not declared in this scope; did you mean ‘DWARF_GNAT_ENCODINGS_GDB’? 7714 | = debug_info && gnat_encodings == DWARF_GNAT_ENCODINGS_all; | ^~~~~~~~~~~~~~~~~~~~~~~~ | DWARF_GNAT_ENCODINGS_GDB This suggestion could be improved -- what happened here is that I failed to upper-case the word, and DWARF_GNAT_ENCODINGS_ALL was the correct spelling. This patch changes gcc's spell checker to prefer simple case changes when possible. I tested this using the self-tests. A new self-test is also included. gcc/ChangeLog: * spellcheck.c (CASE_COST): New define. (BASE_COST): New define. (get_edit_distance): Recognize case changes. (get_edit_distance_cutoff): Update. (test_edit_distances): Update. (get_old_cutoff): Update. (test_find_closest_string): Add case sensitivity test. --- gcc/spellcheck.c | 114 ++++++++++++++++++++++++++++++----------------- 1 file changed, 74 insertions(+), 40 deletions(-) diff --git a/gcc/spellcheck.c b/gcc/spellcheck.c index 7891260a258..9002617453f 100644 --- a/gcc/spellcheck.c +++ b/gcc/spellcheck.c @@ -25,6 +25,12 @@ along with GCC; see the file COPYING3. If not see #include "spellcheck.h" #include "selftest.h" +/* Cost of a case transformation. */ +#define CASE_COST 1 + +/* Cost of another kind of edit. */ +#define BASE_COST 2 + /* Get the edit distance between the two strings: the minimal number of edits that are needed to change one string into another, where edits can be one-character insertions, removals, or substitutions, @@ -47,9 +53,9 @@ get_edit_distance (const char *s, int len_s, } if (len_s == 0) - return len_t; + return BASE_COST * len_t; if (len_t == 0) - return len_s; + return BASE_COST * len_s; /* We effectively build a matrix where each (i, j) contains the distance between the prefix strings s[0:j] and t[0:i]. @@ -67,7 +73,7 @@ get_edit_distance (const char *s, int len_s, /* The first row is for the case of an empty target string, which we can reach by deleting every character in the source string. */ for (int i = 0; i < len_s + 1; i++) - v_one_ago[i] = i; + v_one_ago[i] = i * BASE_COST; /* Build successive rows. */ for (int i = 0; i < len_t; i++) @@ -83,21 +89,28 @@ get_edit_distance (const char *s, int len_s, /* The initial column is for the case of an empty source string; we can reach prefixes of the target string of length i by inserting i characters. */ - v_next[0] = i + 1; + v_next[0] = (i + 1) * BASE_COST; /* Build the rest of the row by considering neighbors to the north, west and northwest. */ for (int j = 0; j < len_s; j++) { - edit_distance_t cost = (s[j] == t[i] ? 0 : 1); - edit_distance_t deletion = v_next[j] + 1; - edit_distance_t insertion = v_one_ago[j + 1] + 1; + edit_distance_t cost; + + if (s[j] == t[i]) + cost = 0; + else if (TOLOWER (s[j]) == TOLOWER (t[i])) + cost = CASE_COST; + else + cost = BASE_COST; + edit_distance_t deletion = v_next[j] + BASE_COST; + edit_distance_t insertion = v_one_ago[j + 1] + BASE_COST; edit_distance_t substitution = v_one_ago[j] + cost; edit_distance_t cheapest = MIN (deletion, insertion); cheapest = MIN (cheapest, substitution); if (i > 0 && j > 0 && s[j] == t[i - 1] && s[j - 1] == t[i]) { - edit_distance_t transposition = v_two_ago[j - 1] + 1; + edit_distance_t transposition = v_two_ago[j - 1] + BASE_COST; cheapest = MIN (cheapest, transposition); } v_next[j + 1] = cheapest; @@ -185,11 +198,11 @@ get_edit_distance_cutoff (size_t goal_len, size_t candidate_len) /* If the lengths are close, then round down. */ if (max_length - min_length <= 1) /* ...but allow an edit distance of at least 1. */ - return MAX (max_length / 3, 1); + return BASE_COST * MAX (max_length / 3, 1); /* Otherwise, round up (thus giving a little extra leeway to some cases involving insertions/deletions). */ - return (max_length + 2) / 3; + return BASE_COST * (max_length + 2) / 3; } #if CHECKING_P @@ -228,47 +241,50 @@ test_get_edit_distance_both_ways (const char *a, const char *b, static void test_edit_distances () { - test_get_edit_distance_both_ways ("", "nonempty", strlen ("nonempty")); - test_get_edit_distance_both_ways ("saturday", "sunday", 3); - test_get_edit_distance_both_ways ("foo", "m_foo", 2); - test_get_edit_distance_both_ways ("hello_world", "HelloWorld", 3); + test_get_edit_distance_both_ways ("", "nonempty", + BASE_COST * strlen ("nonempty")); + test_get_edit_distance_both_ways ("saturday", "sunday", + BASE_COST * 3); + test_get_edit_distance_both_ways ("foo", "m_foo", BASE_COST * 2); + test_get_edit_distance_both_ways ("hello_world", "HelloWorld", 4); test_get_edit_distance_both_ways - ("the quick brown fox jumps over the lazy dog", "dog", 40); + ("the quick brown fox jumps over the lazy dog", "dog", BASE_COST * 40); test_get_edit_distance_both_ways ("the quick brown fox jumps over the lazy dog", "the quick brown dog jumps over the lazy fox", - 4); + BASE_COST * 4); test_get_edit_distance_both_ways ("Lorem ipsum dolor sit amet, consectetur adipiscing elit,", "All your base are belong to us", - 44); + BASE_COST * 44); test_get_edit_distance_both_ways ("foo", "FOO", 3); - test_get_edit_distance_both_ways ("fee", "deed", 2); - test_get_edit_distance_both_ways ("coorzd1", "coordx1", 2); - test_get_edit_distance_both_ways ("assert", "sqrt", 3); - test_get_edit_distance_both_ways ("PATH_MAX", "INT8_MAX", 3); - test_get_edit_distance_both_ways ("time", "nice", 2); - test_get_edit_distance_both_ways ("bar", "carg", 2); + test_get_edit_distance_both_ways ("fee", "deed", BASE_COST * 2); + test_get_edit_distance_both_ways ("coorzd1", "coordx1", BASE_COST * 2); + test_get_edit_distance_both_ways ("assert", "sqrt", BASE_COST * 3); + test_get_edit_distance_both_ways ("PATH_MAX", "INT8_MAX", BASE_COST * 3); + test_get_edit_distance_both_ways ("time", "nice", BASE_COST * 2); + test_get_edit_distance_both_ways ("bar", "carg", BASE_COST * 2); test_get_edit_distance_both_ways ("gtk_widget_show_all", - "GtkWidgetShowAll", 7); - test_get_edit_distance_both_ways ("m_bar", "bar", 2); - test_get_edit_distance_both_ways ("MACRO", "MACRAME", 3); - test_get_edit_distance_both_ways ("ab", "ac", 1); - test_get_edit_distance_both_ways ("ab", "a", 1); - test_get_edit_distance_both_ways ("a", "b", 1); - test_get_edit_distance_both_ways ("nanl", "name", 2); - test_get_edit_distance_both_ways ("char", "bar", 2); - test_get_edit_distance_both_ways ("-optimize", "fsanitize", 5); - test_get_edit_distance_both_ways ("__DATE__", "__i386__", 4); + "GtkWidgetShowAll", 10); + test_get_edit_distance_both_ways ("m_bar", "bar", BASE_COST * 2); + test_get_edit_distance_both_ways ("MACRO", "MACRAME", BASE_COST * 3); + test_get_edit_distance_both_ways ("ab", "ac", BASE_COST * 1); + test_get_edit_distance_both_ways ("ab", "a", BASE_COST * 1); + test_get_edit_distance_both_ways ("a", "b", BASE_COST * 1); + test_get_edit_distance_both_ways ("nanl", "name", BASE_COST * 2); + test_get_edit_distance_both_ways ("char", "bar", BASE_COST * 2); + test_get_edit_distance_both_ways ("-optimize", "fsanitize", BASE_COST * 5); + test_get_edit_distance_both_ways ("__DATE__", "__i386__", BASE_COST * 4); /* Examples where transposition helps. */ - test_get_edit_distance_both_ways ("ab", "ba", 1); - test_get_edit_distance_both_ways ("ba", "abc", 2); - test_get_edit_distance_both_ways ("coorzd1", "coordz1", 1); + test_get_edit_distance_both_ways ("ab", "ba", BASE_COST * 1); + test_get_edit_distance_both_ways ("ba", "abc", BASE_COST * 2); + test_get_edit_distance_both_ways ("coorzd1", "coordz1", BASE_COST * 1); test_get_edit_distance_both_ways ("abcdefghijklmnopqrstuvwxyz", - "bacdefghijklmnopqrstuvwxzy", 2); - test_get_edit_distance_both_ways ("saturday", "sundya", 4); - test_get_edit_distance_both_ways ("signed", "singed", 1); + "bacdefghijklmnopqrstuvwxzy", + BASE_COST * 2); + test_get_edit_distance_both_ways ("saturday", "sundya", BASE_COST * 4); + test_get_edit_distance_both_ways ("signed", "singed", BASE_COST * 1); } /* Subroutine of test_get_edit_distance_cutoff, for emulating the @@ -277,7 +293,7 @@ test_edit_distances () static edit_distance_t get_old_cutoff (size_t goal_len, size_t candidate_len) { - return MAX (goal_len, candidate_len) / 2; + return BASE_COST * MAX (goal_len, candidate_len) / 2; } /* Verify that the cutoff for "meaningfulness" of suggestions is at least as @@ -428,6 +444,24 @@ test_find_closest_string () candidates.safe_push("coordy1"); candidates.safe_push("coordz1"); ASSERT_STREQ ("coordz1", find_closest_string ("coorzd1", &candidates)); + + candidates.truncate (0); + candidates.safe_push ("DWARF_GNAT_ENCODINGS_GDB"); + candidates.safe_push ("DWARF_GNAT_ENCODINGS_ALL"); + candidates.safe_push ("DWARF_GNAT_ENCODINGS_MINIMAL"); + ASSERT_STREQ ("DWARF_GNAT_ENCODINGS_ALL", + find_closest_string ("DWARF_GNAT_ENCODINGS_all", + &candidates)); + + /* The same as the previous test, but with a different order of + candidates. */ + candidates.truncate (0); + candidates.safe_push ("DWARF_GNAT_ENCODINGS_ALL"); + candidates.safe_push ("DWARF_GNAT_ENCODINGS_GDB"); + candidates.safe_push ("DWARF_GNAT_ENCODINGS_MINIMAL"); + ASSERT_STREQ ("DWARF_GNAT_ENCODINGS_ALL", + find_closest_string ("DWARF_GNAT_ENCODINGS_all", + &candidates)); } /* Test data for test_metric_conditions. */