diff mbox series

[ovs-dev] checkpatch.py: Load multiple codespell dictionaries.

Message ID 20250211122331.235395-1-roid@nvidia.com
State Superseded
Delegated to: aaron conole
Series [ovs-dev] checkpatch.py: Load multiple codespell dictionaries.

Checks

Context                                Check    Description
ovsrobot/apply-robot                   success  apply and check: success
ovsrobot/github-robot-_Build_and_Test  success  github build: passed
ovsrobot/cirrus-robot                  success  cirrus build: passed

Commit Message

Roi Dayan Feb. 11, 2025, 12:23 p.m. UTC
Load dictionary_code.txt in addition to the default dictionary.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Acked-by: Salem Sol <salems@nvidia.com>
---
 utilities/checkpatch.py | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

Comments

Aaron Conole Feb. 12, 2025, 5:18 p.m. UTC | #1
Hi Roi,

Roi Dayan via dev <ovs-dev@openvswitch.org> writes:

> Load dictionary_code.txt in addition to the default dictionary.

The code dictionary isn't loaded by default with codespell (codespell_lib/_codespell.py)::

  _builtin_default = "clear,rare"

And there are some questionable conversions in that dictionary (like
uint to unit and stdio to studio).  I think adding the _rare dictionary
could make sense, but perhaps we should be more careful when adding the
others.

Can you add the rationale for turning these on?  I think it's okay to
turn on more than one codespell dict, but we should consider the
individual dictionaries, too.

> Signed-off-by: Roi Dayan <roid@nvidia.com>
> Acked-by: Salem Sol <salems@nvidia.com>
> ---
>  utilities/checkpatch.py | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
> index f8caeb811604..9571380c291f 100755
> --- a/utilities/checkpatch.py
> +++ b/utilities/checkpatch.py
> @@ -42,14 +42,16 @@ missing_authors = []
>  def open_spell_check_dict():
>      import enchant
>  
> +    codespell_files = []
>      try:
>          import codespell_lib
>          codespell_dir = os.path.dirname(codespell_lib.__file__)
> -        codespell_file = os.path.join(codespell_dir, 'data', 'dictionary.txt')
> -        if not os.path.exists(codespell_file):
> -            codespell_file = ''
> +        for fn in ['dictionary.txt', 'dictionary_code.txt']:
> +            fn = os.path.join(codespell_dir, 'data', fn)
> +            if os.path.exists(fn):
> +                codespell_files.append(fn)
>      except:
> -        codespell_file = ''
> +        pass
>  
>      try:
>          extra_keywords = ['ovs', 'vswitch', 'vswitchd', 'ovs-vswitchd',
> @@ -121,8 +123,8 @@ def open_spell_check_dict():
>  
>          spell_check_dict = enchant.Dict("en_US")
>  
> -        if codespell_file:
> -            with open(codespell_file) as f:
> +        for fn in codespell_files:
> +            with open(fn) as f:
>                  for line in f.readlines():
>                      words = line.strip().split('>')[1].strip(', ').split(',')
>                      for word in words:
Roi Dayan Feb. 13, 2025, 7:49 a.m. UTC | #2
On 12/02/2025 19:18, Aaron Conole wrote:
> Hi Roi,
> 
> Roi Dayan via dev <ovs-dev@openvswitch.org> writes:
> 
>> Load dictionary_code.txt in addition to the default dictionary.
> 
> The code dictionary isn't loaded by default with codespell (codespell_lib/_codespell.py)::
> 
>   _builtin_default = "clear,rare"
> 
> And there are some questionable conversions in that dictionary (like
> uint to unit and stdio to studio).  I think adding the _rare dictionary
> could make sense, but perhaps we should be more careful when adding the
> others.
> 
> Can you add the rationale for turning these on?  I think it's okay to
> turn on more than one codespell dict, but we should consider the
> individual dictionaries, too.

I don't think it matters what codespell loads by default, as the script
uses enchant and not codespell.
Also, the conversions themselves are not used, since we don't run
codespell. In the code below each line is stripped down to the final
(corrected) words, which are added to enchant as allowed words.
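
To illustrate the point above, here is a minimal sketch of how one
dictionary line is reduced to its right-hand-side words. It reuses the
expression visible in the patch context; the helper name
corrections_from_line is hypothetical, the per-word strip() is an addition
of this sketch, and the "fooo" sample line is made up for illustration.

```python
def corrections_from_line(line):
    """Return the right-hand-side words of one 'typo->fix1, fix2,' line."""
    # Same expression as in the patch context: keep only the text after
    # '>', trim stray commas and spaces, then split on commas.
    words = line.strip().split('>')[1].strip(', ').split(',')
    # Stripping each word is an addition made by this sketch.
    return [w.strip() for w in words]


# 'uint->unit' is the conversion mentioned in the review; the second
# line is an invented entry with two corrections and a trailing comma.
print(corrections_from_line("uint->unit"))
print(corrections_from_line("fooo->foo, food,"))
```

Only the corrected words survive, so a conversion such as uint to unit
only ever adds "unit" as an allowed word; the left-hand side is discarded.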

I looked again at the other dictionaries as well, and I think most of the
words are already in the enchant dictionary, but loading them won't do any
harm.
I do think we can skip dictionary_en-GB_to_en-US.txt, for example, as we
use the enchant en_US dictionary, which should be more or less equivalent.
The other has more unique words, which I think is what we can say in the
commit message.

What do you think?


Files I see in codespell path:

dictionary_code.txt
dictionary_en-GB_to_en-US.txt
dictionary_informal.txt
dictionary_names.txt
dictionary_rare.txt
dictionary.txt
dictionary_usage.txt


> 
>> Signed-off-by: Roi Dayan <roid@nvidia.com>
>> Acked-by: Salem Sol <salems@nvidia.com>
>> ---
>>  utilities/checkpatch.py | 14 ++++++++------
>>  1 file changed, 8 insertions(+), 6 deletions(-)
>>
>> diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
>> index f8caeb811604..9571380c291f 100755
>> --- a/utilities/checkpatch.py
>> +++ b/utilities/checkpatch.py
>> @@ -42,14 +42,16 @@ missing_authors = []
>>  def open_spell_check_dict():
>>      import enchant
>>  
>> +    codespell_files = []
>>      try:
>>          import codespell_lib
>>          codespell_dir = os.path.dirname(codespell_lib.__file__)
>> -        codespell_file = os.path.join(codespell_dir, 'data', 'dictionary.txt')
>> -        if not os.path.exists(codespell_file):
>> -            codespell_file = ''
>> +        for fn in ['dictionary.txt', 'dictionary_code.txt']:
>> +            fn = os.path.join(codespell_dir, 'data', fn)
>> +            if os.path.exists(fn):
>> +                codespell_files.append(fn)
>>      except:
>> -        codespell_file = ''
>> +        pass
>>  
>>      try:
>>          extra_keywords = ['ovs', 'vswitch', 'vswitchd', 'ovs-vswitchd',
>> @@ -121,8 +123,8 @@ def open_spell_check_dict():
>>  
>>          spell_check_dict = enchant.Dict("en_US")
>>  
>> -        if codespell_file:
>> -            with open(codespell_file) as f:
>> +        for fn in codespell_files:
>> +            with open(fn) as f:
>>                  for line in f.readlines():
>>                      words = line.strip().split('>')[1].strip(', ').split(',')
>>                      for word in words:
>
Aaron Conole Feb. 18, 2025, 1:30 p.m. UTC | #3
Roi Dayan <roid@nvidia.com> writes:

> On 12/02/2025 19:18, Aaron Conole wrote:
>> Hi Roi,
>> 
>> Roi Dayan via dev <ovs-dev@openvswitch.org> writes:
>> 
>>> Load dictionary_code.txt in addition to the default dictionary.
>> 
>> The code dictionary isn't loaded by default with codespell (codespell_lib/_codespell.py)::
>> 
>>   _builtin_default = "clear,rare"
>> 
>> And there are some questionable conversions in that dictionary (like
>> uint to unit and stdio to studio).  I think adding the _rare dictionary
>> could make sense, but perhaps we should be more careful when adding the
>> others.
>> 
>> Can you add the rationale for turning these on?  I think it's okay to
>> turn on more than one codespell dict, but we should consider the
>> individual dictionaries, too.
>
> I don't think it matters what is loaded by default or not as the script
> uses enchant and not codespell.

Yes, but the point is the codespell authors don't think that this
dictionary is a good default.

> Also don't look at the conversions as it's not being used since we don't
> use codespell. In the code below it's being stripped to take only the
> final wording and add to enchant as allowed words.
>
> I looked again also in the others and I think most of the words already in
> enchant dictionary but loading them won't harm.
> I do think we can skip the main dictionary_en-GB_to_en-US.txt for example
> as we use the enchant en_US dictionary which should be equal more or less.
> The other has more unique words which I think this is what we can say in the
> commit message.
>
> What do you think?

Yes, as you noted, most of the words are already there.  I actually ran
through many of the RHS spellings, and they already appear (as you
noted).  In fact, the only words we are not already getting are:

  * copiable
  * clonable
  * subpatches
  * traceback
  * tracebacks

Just 5 words and they are not actually universally agreed upon
spellings.  For example, if I use something like wiktionary (not the
most authoritative source, I agree):

  https://en.wiktionary.org/wiki/clonable#English

It says that 'cloneable' is an alternative form used in computing
contexts.  Enchant suggests 'clone able' or 'clone-able'.

Likewise, there isn't an accepted form of copiable (and enchant does
something similar, including with subpatches).

So I guess 'traceback' and 'tracebacks' are the only ones for which
there isn't any ambiguity.

Anyway, I guess it's okay to add, but we should probably consider
looking at all the dictionaries and seeing which ones make sense to add
as well.  Otherwise, it's quite a bit of change here for something that
could be done by just adding the words above directly (i.e., you make 7
lines of change here, vs. adding words to extra_keywords).
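
A check like the one described above could be sketched roughly as follows.
This is not the script that was actually run: the function name
unknown_corrections, the stand-in "known" set and the
"trackback->traceback" sample line are all assumptions made for
illustration. With pyenchant installed, enchant.Dict("en_US").check could
be passed as the is_known callable instead.

```python
def unknown_corrections(lines, is_known):
    """Collect right-hand-side words that is_known() rejects, sorted."""
    missing = set()
    for line in lines:
        if '>' not in line:
            continue  # skip anything that is not a 'typo->fix' entry
        for word in line.strip().split('>')[1].strip(', ').split(','):
            word = word.strip()
            if word and not is_known(word):
                missing.add(word)
    return sorted(missing)


# Stand-in for a real spell checker (assumption for this sketch).
known = {"unit", "studio"}
sample = ["uint->unit", "stdio->studio", "trackback->traceback"]
print(unknown_corrections(sample, known.__contains__))
```

Words reported this way are the candidates that would otherwise have to be
added by hand to extra_keywords.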

> Files I see in codespell path:
>
> dictionary_code.txt
> dictionary_en-GB_to_en-US.txt
> dictionary_informal.txt
> dictionary_names.txt
> dictionary_rare.txt
> dictionary.txt
> dictionary_usage.txt
>
>
>> 
>>> Signed-off-by: Roi Dayan <roid@nvidia.com>
>>> Acked-by: Salem Sol <salems@nvidia.com>
>>> ---
>>>  utilities/checkpatch.py | 14 ++++++++------
>>>  1 file changed, 8 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
>>> index f8caeb811604..9571380c291f 100755
>>> --- a/utilities/checkpatch.py
>>> +++ b/utilities/checkpatch.py
>>> @@ -42,14 +42,16 @@ missing_authors = []
>>>  def open_spell_check_dict():
>>>      import enchant
>>>  
>>> +    codespell_files = []
>>>      try:
>>>          import codespell_lib
>>>          codespell_dir = os.path.dirname(codespell_lib.__file__)
>>> -        codespell_file = os.path.join(codespell_dir, 'data', 'dictionary.txt')
>>> -        if not os.path.exists(codespell_file):
>>> -            codespell_file = ''
>>> +        for fn in ['dictionary.txt', 'dictionary_code.txt']:
>>> +            fn = os.path.join(codespell_dir, 'data', fn)
>>> +            if os.path.exists(fn):
>>> +                codespell_files.append(fn)
>>>      except:
>>> -        codespell_file = ''
>>> +        pass
>>>  
>>>      try:
>>>          extra_keywords = ['ovs', 'vswitch', 'vswitchd', 'ovs-vswitchd',
>>> @@ -121,8 +123,8 @@ def open_spell_check_dict():
>>>  
>>>          spell_check_dict = enchant.Dict("en_US")
>>>  
>>> -        if codespell_file:
>>> -            with open(codespell_file) as f:
>>> +        for fn in codespell_files:
>>> +            with open(fn) as f:
>>>                  for line in f.readlines():
>>>                      words = line.strip().split('>')[1].strip(', ').split(',')
>>>                      for word in words:
>>
Roi Dayan Feb. 23, 2025, 8:13 a.m. UTC | #4
On 18/02/2025 15:30, Aaron Conole wrote:
> Roi Dayan <roid@nvidia.com> writes:
> 
>> On 12/02/2025 19:18, Aaron Conole wrote:
>>> Hi Roi,
>>>
>>> Roi Dayan via dev <ovs-dev@openvswitch.org> writes:
>>>
>>>> Load dictionary_code.txt in addition to the default dictionary.
>>>
>>> The code dictionary isn't loaded by default with codespell (codespell_lib/_codespell.py)::
>>>
>>>   _builtin_default = "clear,rare"
>>>
>>> And there are some questionable conversions in that dictionary (like
>>> uint to unit and stdio to studio).  I think adding the _rare dictionary
>>> could make sense, but perhaps we should be more careful when adding the
>>> others.
>>>
>>> Can you add the rationale for turning these on?  I think it's okay to
>>> turn on more than one codespell dict, but we should consider the
>>> individual dictionaries, too.
>>
>> I don't think it matters what is loaded by default or not as the script
>> uses enchant and not codespell.
> 
> Yes, but the point is the codespell authors don't think that this
> dictionary is a good default.
> 
>> Also don't look at the conversions as it's not being used since we don't
>> use codespell. In the code below it's being stripped to take only the
>> final wording and add to enchant as allowed words.
>>
>> I looked again also in the others and I think most of the words already in
>> enchant dictionary but loading them won't harm.
>> I do think we can skip the main dictionary_en-GB_to_en-US.txt for example
>> as we use the enchant en_US dictionary which should be equal more or less.
>> The other has more unique words which I think this is what we can say in the
>> commit message.
>>
>> What do you think?
> 
> Yes, as you noted most of the words are already there.  I actually ran
> through many of the RHS spellings, and they already appear (as you
> noted).  Actually, we only are not already getting:
> 
>   * copiable
>   * clonable
>   * subpatches
>   * traceback
>   * tracebacks
> 
> Just 5 words and they are not actually universally agreed upon
> spellings.  For example, if I use something like wiktionary (not the
> most authoritative source, I agree):
> 
>   https://en.wiktionary.org/wiki/clonable#English
> 
> It says that 'cloneable' is an alternative form used in computing
> context.  Enchant suggests 'clone able' or 'clone-able'
> 
> Likewise, there isn't an accepted form of copiable (and enchant does
> similar, including with subpatches).
> 
> So I guess 'traceback' and 'tracebacks' are for sure the ones that there
> isn't yet any ambiguity.
> 
> Anyway, I guess it's okay to add, but we should probably consider
> looking at all the dictionaries and seeing which ones make sense to add
> as well.  Otherwise, it's quite a bit of change here for something that
> could be done by just adding the words above directly (ie: you make 7
> lines of change here, vs adding words to extra_keywords).
> 

Yes, but this change lets us pick up dictionary updates that come with
newer versions of codespell.

I looked a bit at the other dictionaries.
We probably don't want dictionary_en-GB_to_en-US.txt, as we use enchant
for core words.
We also probably won't need dictionary_usage.txt, dictionary_rare.txt, or
dictionary_names.txt, as they seem to be more about spelling mistakes than
about introducing words.

So the only exceptions are dictionary.txt, which is already loaded, and
dictionary_code.txt, which seems to add those more accepted words
like you noted.

So I don't think we need to add the others. From here we can keep
updating the internal list.

What do you think?

>> Files I see in codespell path:
>>
>> dictionary_code.txt
>> dictionary_en-GB_to_en-US.txt
>> dictionary_informal.txt
>> dictionary_names.txt
>> dictionary_rare.txt
>> dictionary.txt
>> dictionary_usage.txt
>>
>>
>>>
>>>> Signed-off-by: Roi Dayan <roid@nvidia.com>
>>>> Acked-by: Salem Sol <salems@nvidia.com>
>>>> ---
>>>>  utilities/checkpatch.py | 14 ++++++++------
>>>>  1 file changed, 8 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
>>>> index f8caeb811604..9571380c291f 100755
>>>> --- a/utilities/checkpatch.py
>>>> +++ b/utilities/checkpatch.py
>>>> @@ -42,14 +42,16 @@ missing_authors = []
>>>>  def open_spell_check_dict():
>>>>      import enchant
>>>>  
>>>> +    codespell_files = []
>>>>      try:
>>>>          import codespell_lib
>>>>          codespell_dir = os.path.dirname(codespell_lib.__file__)
>>>> -        codespell_file = os.path.join(codespell_dir, 'data', 'dictionary.txt')
>>>> -        if not os.path.exists(codespell_file):
>>>> -            codespell_file = ''
>>>> +        for fn in ['dictionary.txt', 'dictionary_code.txt']:
>>>> +            fn = os.path.join(codespell_dir, 'data', fn)
>>>> +            if os.path.exists(fn):
>>>> +                codespell_files.append(fn)
>>>>      except:
>>>> -        codespell_file = ''
>>>> +        pass
>>>>  
>>>>      try:
>>>>          extra_keywords = ['ovs', 'vswitch', 'vswitchd', 'ovs-vswitchd',
>>>> @@ -121,8 +123,8 @@ def open_spell_check_dict():
>>>>  
>>>>          spell_check_dict = enchant.Dict("en_US")
>>>>  
>>>> -        if codespell_file:
>>>> -            with open(codespell_file) as f:
>>>> +        for fn in codespell_files:
>>>> +            with open(fn) as f:
>>>>                  for line in f.readlines():
>>>>                      words = line.strip().split('>')[1].strip(', ').split(',')
>>>>                      for word in words:
>>>
>
Aaron Conole March 5, 2025, 3:25 p.m. UTC | #5
Roi Dayan <roid@nvidia.com> writes:

> On 18/02/2025 15:30, Aaron Conole wrote:
>> Roi Dayan <roid@nvidia.com> writes:
>> 
>>> On 12/02/2025 19:18, Aaron Conole wrote:
>>>> Hi Roi,
>>>>
>>>> Roi Dayan via dev <ovs-dev@openvswitch.org> writes:
>>>>
>>>>> Load dictionary_code.txt in addition to the default dictionary.
>>>>
>>>> The code dictionary isn't loaded by default with codespell (codespell_lib/_codespell.py)::
>>>>
>>>>   _builtin_default = "clear,rare"
>>>>
>>>> And there are some questionable conversions in that dictionary (like
>>>> uint to unit and stdio to studio).  I think adding the _rare dictionary
>>>> could make sense, but perhaps we should be more careful when adding the
>>>> others.
>>>>
>>>> Can you add the rationale for turning these on?  I think it's okay to
>>>> turn on more than one codespell dict, but we should consider the
>>>> individual dictionaries, too.
>>>
>>> I don't think it matters what is loaded by default or not as the script
>>> uses enchant and not codespell.
>> 
>> Yes, but the point is the codespell authors don't think that this
>> dictionary is a good default.
>> 
>>> Also don't look at the conversions as it's not being used since we don't
>>> use codespell. In the code below it's being stripped to take only the
>>> final wording and add to enchant as allowed words.
>>>
>>> I looked again also in the others and I think most of the words already in
>>> enchant dictionary but loading them won't harm.
>>> I do think we can skip the main dictionary_en-GB_to_en-US.txt for example
>>> as we use the enchant en_US dictionary which should be equal more or less.
>>> The other has more unique words which I think this is what we can say in the
>>> commit message.
>>>
>>> What do you think?
>> 
>> Yes, as you noted most of the words are already there.  I actually ran
>> through many of the RHS spellings, and they already appear (as you
>> noted).  Actually, we only are not already getting:
>> 
>>   * copiable
>>   * clonable
>>   * subpatches
>>   * traceback
>>   * tracebacks
>> 
>> Just 5 words and they are not actually universally agreed upon
>> spellings.  For example, if I use something like wiktionary (not the
>> most authoritative source, I agree):
>> 
>>   https://en.wiktionary.org/wiki/clonable#English
>> 
>> It says that 'cloneable' is an alternative form used in computing
>> context.  Enchant suggests 'clone able' or 'clone-able'
>> 
>> Likewise, there isn't an accepted form of copiable (and enchant does
>> similar, including with subpatches).
>> 
>> So I guess 'traceback' and 'tracebacks' are for sure the ones that there
>> isn't yet any ambiguity.
>> 
>> Anyway, I guess it's okay to add, but we should probably consider
>> looking at all the dictionaries and seeing which ones make sense to add
>> as well.  Otherwise, it's quite a bit of change here for something that
>> could be done by just adding the words above directly (ie: you make 7
>> lines of change here, vs adding words to extra_keywords).
>> 
>
> yes but this change allows newer versions of codespell with potential
> updates to the dictionary to catch in.
>
> I looked a bit in the other dictionaries.
> We probably don't want the main one dictionary_en-GB_to_en-US.txt as we
> use enchant for core words.
> Also we probably won't need dictionary_usage.txt, dictionary_rare.txt,
> dictionary_names.txt as they seem to be more for spelling mistakes rather
> than introducing words.
>
> So the only exception is dictionary.txt which is already loaded and
> dictionary_code.txt which seems to add those more accepted words
> like you noted.
>
> So I don't think we need to add the others. from here we can keep
> updating the internal list.
>
> What do you think?

I've been thinking about it, and I think it could be useful to have this
facility.  Can you make the dictionary selection also configurable via
command line (similar to codespell option)?
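
One possible shape for such an option is sketched below. This is only an
illustration, not the v2 patch: the option name --dictionaries, the
"default" label and the helper parse_dictionaries are hypothetical,
argparse is used here purely for brevity (the script may parse its options
differently), and the file names are the ones listed in this thread.

```python
import argparse

# Short names mapped to the files listed in this thread.  The 'default'
# label for dictionary.txt is an assumption of this sketch.
KNOWN_DICTIONARIES = {
    "default": "dictionary.txt",
    "code": "dictionary_code.txt",
    "en-GB_to_en-US": "dictionary_en-GB_to_en-US.txt",
    "informal": "dictionary_informal.txt",
    "names": "dictionary_names.txt",
    "rare": "dictionary_rare.txt",
    "usage": "dictionary_usage.txt",
}


def parse_dictionaries(value):
    """Turn a comma-separated list of names into dictionary file names."""
    names = [n.strip() for n in value.split(',') if n.strip()]
    unknown = [n for n in names if n not in KNOWN_DICTIONARIES]
    if unknown:
        raise argparse.ArgumentTypeError(
            "unknown dictionary: %s" % ", ".join(unknown))
    return [KNOWN_DICTIONARIES[n] for n in names]


parser = argparse.ArgumentParser()
parser.add_argument("--dictionaries", type=parse_dictionaries,
                    default=parse_dictionaries("default,code"))

args = parser.parse_args(["--dictionaries", "default,rare"])
print(args.dictionaries)
```

Defaulting to "default,code" would keep the behaviour of this patch when
the option is not given.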

>>> Files I see in codespell path:
>>>
>>> dictionary_code.txt
>>> dictionary_en-GB_to_en-US.txt
>>> dictionary_informal.txt
>>> dictionary_names.txt
>>> dictionary_rare.txt
>>> dictionary.txt
>>> dictionary_usage.txt
>>>
>>>
>>>>
>>>>> Signed-off-by: Roi Dayan <roid@nvidia.com>
>>>>> Acked-by: Salem Sol <salems@nvidia.com>
>>>>> ---
>>>>>  utilities/checkpatch.py | 14 ++++++++------
>>>>>  1 file changed, 8 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
>>>>> index f8caeb811604..9571380c291f 100755
>>>>> --- a/utilities/checkpatch.py
>>>>> +++ b/utilities/checkpatch.py
>>>>> @@ -42,14 +42,16 @@ missing_authors = []
>>>>>  def open_spell_check_dict():
>>>>>      import enchant
>>>>>  
>>>>> +    codespell_files = []
>>>>>      try:
>>>>>          import codespell_lib
>>>>>          codespell_dir = os.path.dirname(codespell_lib.__file__)
>>>>> -        codespell_file = os.path.join(codespell_dir, 'data', 'dictionary.txt')
>>>>> -        if not os.path.exists(codespell_file):
>>>>> -            codespell_file = ''
>>>>> +        for fn in ['dictionary.txt', 'dictionary_code.txt']:
>>>>> +            fn = os.path.join(codespell_dir, 'data', fn)
>>>>> +            if os.path.exists(fn):
>>>>> +                codespell_files.append(fn)
>>>>>      except:
>>>>> -        codespell_file = ''
>>>>> +        pass
>>>>>  
>>>>>      try:
>>>>>          extra_keywords = ['ovs', 'vswitch', 'vswitchd', 'ovs-vswitchd',
>>>>> @@ -121,8 +123,8 @@ def open_spell_check_dict():
>>>>>  
>>>>>          spell_check_dict = enchant.Dict("en_US")
>>>>>  
>>>>> -        if codespell_file:
>>>>> -            with open(codespell_file) as f:
>>>>> +        for fn in codespell_files:
>>>>> +            with open(fn) as f:
>>>>>                  for line in f.readlines():
>>>>>                      words = line.strip().split('>')[1].strip(', ').split(',')
>>>>>                      for word in words:
>>>>
>>
Roi Dayan March 10, 2025, 12:50 p.m. UTC | #6
On 05/03/2025 17:25, Aaron Conole wrote:
> Roi Dayan <roid@nvidia.com> writes:
> 
>> On 18/02/2025 15:30, Aaron Conole wrote:
>>> Roi Dayan <roid@nvidia.com> writes:
>>>
>>>> On 12/02/2025 19:18, Aaron Conole wrote:
>>>>> Hi Roi,
>>>>>
>>>>> Roi Dayan via dev <ovs-dev@openvswitch.org> writes:
>>>>>
>>>>>> Load dictionary_code.txt in addition to the default dictionary.
>>>>>
>>>>> The code dictionary isn't loaded by default with codespell (codespell_lib/_codespell.py)::
>>>>>
>>>>>   _builtin_default = "clear,rare"
>>>>>
>>>>> And there are some questionable conversions in that dictionary (like
>>>>> uint to unit and stdio to studio).  I think adding the _rare dictionary
>>>>> could make sense, but perhaps we should be more careful when adding the
>>>>> others.
>>>>>
>>>>> Can you add the rationale for turning these on?  I think it's okay to
>>>>> turn on more than one codespell dict, but we should consider the
>>>>> individual dictionaries, too.
>>>>
>>>> I don't think it matters what is loaded by default or not as the script
>>>> uses enchant and not codespell.
>>>
>>> Yes, but the point is the codespell authors don't think that this
>>> dictionary is a good default.
>>>
>>>> Also don't look at the conversions as it's not being used since we don't
>>>> use codespell. In the code below it's being stripped to take only the
>>>> final wording and add to enchant as allowed words.
>>>>
>>>> I looked again also in the others and I think most of the words already in
>>>> enchant dictionary but loading them won't harm.
>>>> I do think we can skip the main dictionary_en-GB_to_en-US.txt for example
>>>> as we use the enchant en_US dictionary which should be equal more or less.
>>>> The other has more unique words which I think this is what we can say in the
>>>> commit message.
>>>>
>>>> What do you think?
>>>
>>> Yes, as you noted most of the words are already there.  I actually ran
>>> through many of the RHS spellings, and they already appear (as you
>>> noted).  Actually, we only are not already getting:
>>>
>>>   * copiable
>>>   * clonable
>>>   * subpatches
>>>   * traceback
>>>   * tracebacks
>>>
>>> Just 5 words and they are not actually universally agreed upon
>>> spellings.  For example, if I use something like wiktionary (not the
>>> most authoritative source, I agree):
>>>
>>>   https://en.wiktionary.org/wiki/clonable#English
>>>
>>> It says that 'cloneable' is an alternative form used in computing
>>> context.  Enchant suggests 'clone able' or 'clone-able'
>>>
>>> Likewise, there isn't an accepted form of copiable (and enchant does
>>> similar, including with subpatches).
>>>
>>> So I guess 'traceback' and 'tracebacks' are for sure the ones that there
>>> isn't yet any ambiguity.
>>>
>>> Anyway, I guess it's okay to add, but we should probably consider
>>> looking at all the dictionaries and seeing which ones make sense to add
>>> as well.  Otherwise, it's quite a bit of change here for something that
>>> could be done by just adding the words above directly (ie: you make 7
>>> lines of change here, vs adding words to extra_keywords).
>>>
>>
>> yes but this change allows newer versions of codespell with potential
>> updates to the dictionary to catch in.
>>
>> I looked a bit in the other dictionaries.
>> We probably don't want the main one dictionary_en-GB_to_en-US.txt as we
>> use enchant for core words.
>> Also we probably won't need dictionary_usage.txt, dictionary_rare.txt,
>> dictionary_names.txt as they seem to be more for spelling mistakes rather
>> than introducing words.
>>
>> So the only exception is dictionary.txt which is already loaded and
>> dictionary_code.txt which seems to add those more accepted words
>> like you noted.
>>
>> So I don't think we need to add the others. from here we can keep
>> updating the internal list.
>>
>> What do you think?
> 
> I've been thinking about it, and I think it could be useful to have this
> facility.  Can you make the dictionary selection also configurable via
> command line (similar to codespell option)?
> 

Yes, sending v2.

>>>> Files I see in codespell path:
>>>>
>>>> dictionary_code.txt
>>>> dictionary_en-GB_to_en-US.txt
>>>> dictionary_informal.txt
>>>> dictionary_names.txt
>>>> dictionary_rare.txt
>>>> dictionary.txt
>>>> dictionary_usage.txt
>>>>
>>>>
>>>>>
>>>>>> Signed-off-by: Roi Dayan <roid@nvidia.com>
>>>>>> Acked-by: Salem Sol <salems@nvidia.com>
>>>>>> ---
>>>>>>  utilities/checkpatch.py | 14 ++++++++------
>>>>>>  1 file changed, 8 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
>>>>>> index f8caeb811604..9571380c291f 100755
>>>>>> --- a/utilities/checkpatch.py
>>>>>> +++ b/utilities/checkpatch.py
>>>>>> @@ -42,14 +42,16 @@ missing_authors = []
>>>>>>  def open_spell_check_dict():
>>>>>>      import enchant
>>>>>>  
>>>>>> +    codespell_files = []
>>>>>>      try:
>>>>>>          import codespell_lib
>>>>>>          codespell_dir = os.path.dirname(codespell_lib.__file__)
>>>>>> -        codespell_file = os.path.join(codespell_dir, 'data', 'dictionary.txt')
>>>>>> -        if not os.path.exists(codespell_file):
>>>>>> -            codespell_file = ''
>>>>>> +        for fn in ['dictionary.txt', 'dictionary_code.txt']:
>>>>>> +            fn = os.path.join(codespell_dir, 'data', fn)
>>>>>> +            if os.path.exists(fn):
>>>>>> +                codespell_files.append(fn)
>>>>>>      except:
>>>>>> -        codespell_file = ''
>>>>>> +        pass
>>>>>>  
>>>>>>      try:
>>>>>>          extra_keywords = ['ovs', 'vswitch', 'vswitchd', 'ovs-vswitchd',
>>>>>> @@ -121,8 +123,8 @@ def open_spell_check_dict():
>>>>>>  
>>>>>>          spell_check_dict = enchant.Dict("en_US")
>>>>>>  
>>>>>> -        if codespell_file:
>>>>>> -            with open(codespell_file) as f:
>>>>>> +        for fn in codespell_files:
>>>>>> +            with open(fn) as f:
>>>>>>                  for line in f.readlines():
>>>>>>                      words = line.strip().split('>')[1].strip(', ').split(',')
>>>>>>                      for word in words:
>>>>>
>>>
>

Patch

diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
index f8caeb811604..9571380c291f 100755
--- a/utilities/checkpatch.py
+++ b/utilities/checkpatch.py
@@ -42,14 +42,16 @@  missing_authors = []
 def open_spell_check_dict():
     import enchant
 
+    codespell_files = []
     try:
         import codespell_lib
         codespell_dir = os.path.dirname(codespell_lib.__file__)
-        codespell_file = os.path.join(codespell_dir, 'data', 'dictionary.txt')
-        if not os.path.exists(codespell_file):
-            codespell_file = ''
+        for fn in ['dictionary.txt', 'dictionary_code.txt']:
+            fn = os.path.join(codespell_dir, 'data', fn)
+            if os.path.exists(fn):
+                codespell_files.append(fn)
     except:
-        codespell_file = ''
+        pass
 
     try:
         extra_keywords = ['ovs', 'vswitch', 'vswitchd', 'ovs-vswitchd',
@@ -121,8 +123,8 @@  def open_spell_check_dict():
 
         spell_check_dict = enchant.Dict("en_US")
 
-        if codespell_file:
-            with open(codespell_file) as f:
+        for fn in codespell_files:
+            with open(fn) as f:
                 for line in f.readlines():
                     words = line.strip().split('>')[1].strip(', ').split(',')
                     for word in words:
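
The file-discovery half of the patch can be tried in isolation with a
sketch like the one below. The helper name existing_dictionaries is
hypothetical, and a temporary directory stands in for the codespell_lib
data directory so that the example runs without codespell installed.

```python
import os
import tempfile


def existing_dictionaries(data_dir, names):
    """Return full paths for the dictionary files that exist in data_dir."""
    found = []
    for name in names:
        path = os.path.join(data_dir, name)
        if os.path.exists(path):
            found.append(path)
    return found


# Only dictionary.txt is created here, so dictionary_code.txt is skipped
# silently, mirroring the os.path.exists() check in the patch.
with tempfile.TemporaryDirectory() as data_dir:
    open(os.path.join(data_dir, 'dictionary.txt'), 'w').close()
    files = existing_dictionaries(
        data_dir, ['dictionary.txt', 'dictionary_code.txt'])
    print([os.path.basename(f) for f in files])
```

A missing file simply shortens the list, so an installation without
dictionary_code.txt would behave as it did before the patch.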