diff mbox

[ovs-dev,RFC] checkpatch.py: look for misspellings in commit messages

Message ID f7ty3w4eu1v.fsf@redhat.com
State RFC
Headers show

Commit Message

Aaron Conole March 16, 2017, 10:27 p.m. UTC
Teach checkpatch to find misspellings based on the enchant framework.
When a potential error is found, print some additional suggestions for
fixups.

Some additional keywords are kept in an extras list and added to the
dictionary specific to the Open vSwitch project.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Cc: Russell Bryant <rbryant@redhat.com>
Cc: Joe Stringer <joe@ovn.org>
Cc: Eric Garver <e@erig.me>
---
NOTE: This is RFC for a few reasons.  I'm not sure the use - and there
 is probably some cleanup that needs to happen around tags, etc.
 Additionally, I hope to get some feedback on a good enough set of extra
 keywords.

 utilities/checkpatch.py | 42 +++
 1 file changed, 42 insertions(+)

---
2.9.3

Comments

Joe Stringer March 16, 2017, 11 p.m. UTC | #1
On 16 March 2017 at 15:27, Aaron Conole <aconole@redhat.com> wrote:
> Teach checkpatch to find misspellings based on the enchant framework.
> When a potential error is found, print some additional suggestions for
> fixups.
>
> Some additional keywords are kept in an extras list and added to the
> dictionary specific to the Open vSwitch project.
>
> Signed-off-by: Aaron Conole <aconole@redhat.com>
> Cc: Russell Bryant <rbryant@redhat.com>
> Cc: Joe Stringer <joe@ovn.org>
> Cc: Eric Garver <e@erig.me>
> ---
> NOTE: This is RFC for a few reasons.  I'm not sure the use - and there
>  is probably some cleanup that needs to happen around tags, etc.
>  Additionally, I hope to get some feedback on a good enough set of extra
>  keywords.

Overall I'm a bit ambivalent about it. Not sure how important it really is.

>  utilities/checkpatch.py | 42 +++
>  1 file changed, 42 insertions(+)
>
> diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
> index 26eb5c3..4e0d319 100755
> --- a/utilities/checkpatch.py
> +++ b/utilities/checkpatch.py
> @@ -24,6 +24,23 @@ __warnings = 0
>  print_file_name = None
>  checking_file = False
>
> +spell_check_patch = True
> +spell_check_dict = None
> +
> +try:
> +    import enchant
> +
> +    extra_keywords = ['ovs', 'vswitch', 'vswitchd', 'ovs-vswitchd', 'netdev',
> +                      'selinux', 'ovs-ctl', 'dpctl', 'ofctl', 'openvswitch',
> +                      'dpdk', 'hugepage', 'hugepages']

I can think of a few extra words - OpenFlow, revalidator, xlate,
mf_field, mf_fields, geneve, TLV, tun_metadata...

Makes me a bit concerned that there are just way too many jargon words
that this checker may complain about.

We could try to cover a reasonable set, and then rather than printing
a warning, print it as an info comment to just request the submitter
to validate they haven't misspelled anything.
Aaron Conole March 23, 2017, 2:38 p.m. UTC | #2
Joe Stringer <joe@ovn.org> writes:

> On 16 March 2017 at 15:27, Aaron Conole <aconole@redhat.com> wrote:
>> Teach checkpatch to find misspellings based on the enchant framework.
>> When a potential error is found, print some additional suggestions for
>> fixups.
>>
>> Some additional keywords are kept in an extras list and added to the
>> dictionary specific to the Open vSwitch project.
>>
>> Signed-off-by: Aaron Conole <aconole@redhat.com>
>> Cc: Russell Bryant <rbryant@redhat.com>
>> Cc: Joe Stringer <joe@ovn.org>
>> Cc: Eric Garver <e@erig.me>
>> ---
>> NOTE: This is RFC for a few reasons.  I'm not sure the use - and there
>>  is probably some cleanup that needs to happen around tags, etc.
>>  Additionally, I hope to get some feedback on a good enough set of extra
>>  keywords.
>
> Overall I'm a bit ambivalent about it. Not sure how important it really is.
>
>>  utilities/checkpatch.py | 42 +++
>>  1 file changed, 42 insertions(+)
>>
>> diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
>> index 26eb5c3..4e0d319 100755
>> --- a/utilities/checkpatch.py
>> +++ b/utilities/checkpatch.py
>> @@ -24,6 +24,23 @@ __warnings = 0
>>  print_file_name = None
>>  checking_file = False
>>
>> +spell_check_patch = True
>> +spell_check_dict = None
>> +
>> +try:
>> +    import enchant
>> +
>> +    extra_keywords = ['ovs', 'vswitch', 'vswitchd', 'ovs-vswitchd', 'netdev',
>> +                      'selinux', 'ovs-ctl', 'dpctl', 'ofctl', 'openvswitch',
>> +                      'dpdk', 'hugepage', 'hugepages']
>
> I can think of a few extra words - OpenFlow, revalidator, xlate,
> mf_field, mf_fields, geneve, TLV, tun_metadata...
>
> Makes me a bit concerned that there are just way too many jargon words
> that this checker may complain about.
>
> We could try to cover a reasonable set, and then rather than printing
> a warning, print it as an info comment to just request the submitter
> to validate they haven't misspelled anything.

I think I'll sideline it..  It seemed like something useful when I was
going over a different patch, but given your concerns about too much
jargon (with which I tend agree) then let's drop it.  I'd much rather
have the tool not grow annoying hairs.
diff mbox

Patch

diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
index 26eb5c3..4e0d319 100755
--- a/utilities/checkpatch.py
+++ b/utilities/checkpatch.py
@@ -24,6 +24,23 @@  __warnings = 0
 print_file_name = None
 checking_file = False
 
+spell_check_patch = True
+spell_check_dict = None
+
+try:
+    import enchant
+
+    extra_keywords = ['ovs', 'vswitch', 'vswitchd', 'ovs-vswitchd', 'netdev',
+                      'selinux', 'ovs-ctl', 'dpctl', 'ofctl', 'openvswitch',
+                      'dpdk', 'hugepage', 'hugepages']
+
+    spell_check_dict = enchant.Dict("en_US")
+    for kw in extra_keywords:
+        spell_check_dict.add(kw)
+
+except:
+    print("WARN: Enchant library not available.  Spell checking disabled.\n")
+    spell_check_patch = False
 
 def print_file():
     global print_file_name
@@ -157,6 +174,22 @@  def if_and_for_end_with_bracket_check(line):
     return True
 
 
+def spell_check(line):
+    printed_line = False
+    if spell_check_patch and spell_check_dict is not None:
+        words = line.split(' ')
+        for word in words:
+            strword = re.sub(r'\W+', '', word)
+            if not spell_check_dict.check(strword):
+                if not printed_line:
+                    printed_line = True
+                    print("Spell check in line:")
+                    print("%s\n" % line)
+                suggestions = ','.join(spell_check_dict.suggest(strword))
+                print_warning("Bad spelling '%s', did you mean '%s'?" %
+                              (strword, suggestions))
+
+
 def ovs_checkpatch_parse(text):
     global print_file_name
     lineno = 0
@@ -214,6 +247,8 @@  def ovs_checkpatch_parse(text):
             elif is_co_author.match(line):
                 m = is_co_author.match(line)
                 co_authors.append(m.group(3))
+            else:
+                spell_check(line)
         elif parse == 2:
             print_line = False
             newfile = hunks.match(line)
@@ -290,6 +325,8 @@  def ovs_checkpatch_file(filename):
         print_error("Unable to parse file '%s'. Is it a patch?" % filename)
         return -1
 
+    subj = mail['Subject']
+    spell_check(subj)
     for part in mail.walk():
         if part.get_content_maintype() == 'multipart':
             continue
@@ -326,6 +363,11 @@  if __name__ == '__main__':
             skip_trailing_whitespace_check = True
         elif o in ("-f", "--check-file"):
             checking_file = True
+        elif o in ("-i", "--no-spell-check"):
+            # The following check means we will enforce missing enchant
+            # dependency.
+            if spell_check_patch:
+                spell_check_patch = False
         else:
             print("Unknown option '%s'" % o)
             sys.exit(-1)