diff mbox series

[ovs-dev,v4] checkpatch: Detect "trojan source" attack

Message ID 20211130162053.19069-1-mkp@redhat.com
State Accepted
Headers show
Series [ovs-dev,v4] checkpatch: Detect "trojan source" attack | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed

Commit Message

Mike Pattrick Nov. 30, 2021, 4:20 p.m. UTC
Recently there has been a lot of press about the "trojan source" attack,
where Unicode characters are used to obfuscate the true functionality of
code. This attack didn't effect OVS, but adding the check here will help
guard against it sneaking in later.

Signed-off-by: Mike Pattrick <mkp@redhat.com>
---
Changes in v2:
   - Now all unicode characters will result in an error.

Changes in v3:
   - Added a test to validate behavior

Changes in v4:
   - Simplified regex

---
 tests/checkpatch.at     | 22 ++++++++++++++++++++++
 utilities/checkpatch.py | 11 +++++++++++
 2 files changed, 33 insertions(+)

Comments

Gaetan Rivet Dec. 1, 2021, 3:12 p.m. UTC | #1
On Tue, Nov 30, 2021, at 17:20, Mike Pattrick wrote:
> Recently there has been a lot of press about the "trojan source" attack,
> where Unicode characters are used to obfuscate the true functionality of
> code. This attack didn't effect OVS, but adding the check here will help
> guard against it sneaking in later.
>
> Signed-off-by: Mike Pattrick <mkp@redhat.com>

Thanks for the updated regex.

Acked-by: Gaetan Rivet <grive@u256.net>
Ilya Maximets Jan. 4, 2022, 6:12 p.m. UTC | #2
On 11/30/21 17:20, Mike Pattrick wrote:
> Recently there has been a lot of press about the "trojan source" attack,
> where Unicode characters are used to obfuscate the true functionality of
> code. This attack didn't effect OVS, but adding the check here will help
> guard against it sneaking in later.
> 
> Signed-off-by: Mike Pattrick <mkp@redhat.com>
> ---
> Changes in v2:
>    - Now all unicode characters will result in an error.
> 
> Changes in v3:
>    - Added a test to validate behavior
> 
> Changes in v4:
>    - Simplified regex
> 
> ---
>  tests/checkpatch.at     | 22 ++++++++++++++++++++++
>  utilities/checkpatch.py | 11 +++++++++++
>  2 files changed, 33 insertions(+)

Some weird stuff is going on here.  On one of my systems (rhel 8.5)
I'm getting a consistent test failure.  The reason for that appears
to be a different default locale used while executing checkpatch.py
from a testsuite.  If checkpatch.py is invoked by hands, the
locale.getpreferredencoding() returns 'UTF-8', but if invoked from
a testsuite, I'm getting 'ANSI_X3.4-1968' instead.

This impacts the file read, since open() by default uses the
locale.getpreferredencoding() encoding, so email.message_from_file()
fails to read the file with unicode symbols and throws an exception
resulting with the following error:

  ERROR: Unable to parse file 'test.patch'. Is it a patch?

That fails the unit test.
I am not sure why the default locale is different while running
under the testsuite on this system.  The following change seems
to fix the problem:

diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
index 395d0fcde..8c7faa419 100755
--- a/utilities/checkpatch.py
+++ b/utilities/checkpatch.py
@@ -954,7 +954,7 @@ def ovs_checkpatch_print_result():
 
 def ovs_checkpatch_file(filename):
     try:
-        mail = email.message_from_file(open(filename, 'r'))
+        mail = email.message_from_file(open(filename, 'r', encoding='utf8'))
     except:
         print_error("Unable to parse file '%s'. Is it a patch?" % filename)
         return -1
---

If that looks good to you, I can fold the change in before applying
the patch.

What do you think?

Best regards, Ilya Maximets.
Mike Pattrick Jan. 4, 2022, 6:35 p.m. UTC | #3
On Tue, Jan 4, 2022 at 1:12 PM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> On 11/30/21 17:20, Mike Pattrick wrote:
> > Recently there has been a lot of press about the "trojan source" attack,
> > where Unicode characters are used to obfuscate the true functionality of
> > code. This attack didn't effect OVS, but adding the check here will help
> > guard against it sneaking in later.
> >
> > Signed-off-by: Mike Pattrick <mkp@redhat.com>
> > ---
> > Changes in v2:
> >    - Now all unicode characters will result in an error.
> >
> > Changes in v3:
> >    - Added a test to validate behavior
> >
> > Changes in v4:
> >    - Simplified regex
> >
> > ---
> >  tests/checkpatch.at     | 22 ++++++++++++++++++++++
> >  utilities/checkpatch.py | 11 +++++++++++
> >  2 files changed, 33 insertions(+)
>
> Some weird stuff is going on here.  On one of my systems (rhel 8.5)
> I'm getting a consistent test failure.  The reason for that appears
> to be a different default locale used while executing checkpatch.py
> from a testsuite.  If checkpatch.py is invoked by hands, the
> locale.getpreferredencoding() returns 'UTF-8', but if invoked from
> a testsuite, I'm getting 'ANSI_X3.4-1968' instead.
>
> This impacts the file read, since open() by default uses the
> locale.getpreferredencoding() encoding, so email.message_from_file()
> fails to read the file with unicode symbols and throws an exception
> resulting with the following error:
>
>   ERROR: Unable to parse file 'test.patch'. Is it a patch?
>
> That fails the unit test.
> I am not sure why the default locale is different while running
> under the testsuite on this system.  The following change seems
> to fix the problem:
>
> diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
> index 395d0fcde..8c7faa419 100755
> --- a/utilities/checkpatch.py
> +++ b/utilities/checkpatch.py
> @@ -954,7 +954,7 @@ def ovs_checkpatch_print_result():
>
>  def ovs_checkpatch_file(filename):
>      try:
> -        mail = email.message_from_file(open(filename, 'r'))
> +        mail = email.message_from_file(open(filename, 'r', encoding='utf8'))
>      except:
>          print_error("Unable to parse file '%s'. Is it a patch?" % filename)
>          return -1
> ---
>
> If that looks good to you, I can fold the change in before applying
> the patch.
>
> What do you think?

That is strange, I reproduced this behavior on RHEL 8.5; however, RHEL
9.0 beta uses the UTF8 locale.

This change looks good to me!

-M

>
> Best regards, Ilya Maximets.
>
Ilya Maximets Jan. 4, 2022, 7:59 p.m. UTC | #4
On 1/4/22 19:35, Mike Pattrick wrote:
> On Tue, Jan 4, 2022 at 1:12 PM Ilya Maximets <i.maximets@ovn.org> wrote:
>>
>> On 11/30/21 17:20, Mike Pattrick wrote:
>>> Recently there has been a lot of press about the "trojan source" attack,
>>> where Unicode characters are used to obfuscate the true functionality of
>>> code. This attack didn't effect OVS, but adding the check here will help
>>> guard against it sneaking in later.
>>>
>>> Signed-off-by: Mike Pattrick <mkp@redhat.com>
>>> ---
>>> Changes in v2:
>>>    - Now all unicode characters will result in an error.
>>>
>>> Changes in v3:
>>>    - Added a test to validate behavior
>>>
>>> Changes in v4:
>>>    - Simplified regex
>>>
>>> ---
>>>  tests/checkpatch.at     | 22 ++++++++++++++++++++++
>>>  utilities/checkpatch.py | 11 +++++++++++
>>>  2 files changed, 33 insertions(+)
>>
>> Some weird stuff is going on here.  On one of my systems (rhel 8.5)
>> I'm getting a consistent test failure.  The reason for that appears
>> to be a different default locale used while executing checkpatch.py
>> from a testsuite.  If checkpatch.py is invoked by hands, the
>> locale.getpreferredencoding() returns 'UTF-8', but if invoked from
>> a testsuite, I'm getting 'ANSI_X3.4-1968' instead.
>>
>> This impacts the file read, since open() by default uses the
>> locale.getpreferredencoding() encoding, so email.message_from_file()
>> fails to read the file with unicode symbols and throws an exception
>> resulting with the following error:
>>
>>   ERROR: Unable to parse file 'test.patch'. Is it a patch?
>>
>> That fails the unit test.
>> I am not sure why the default locale is different while running
>> under the testsuite on this system.  The following change seems
>> to fix the problem:
>>
>> diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
>> index 395d0fcde..8c7faa419 100755
>> --- a/utilities/checkpatch.py
>> +++ b/utilities/checkpatch.py
>> @@ -954,7 +954,7 @@ def ovs_checkpatch_print_result():
>>
>>  def ovs_checkpatch_file(filename):
>>      try:
>> -        mail = email.message_from_file(open(filename, 'r'))
>> +        mail = email.message_from_file(open(filename, 'r', encoding='utf8'))
>>      except:
>>          print_error("Unable to parse file '%s'. Is it a patch?" % filename)
>>          return -1
>> ---
>>
>> If that looks good to you, I can fold the change in before applying
>> the patch.
>>
>> What do you think?
> 
> That is strange, I reproduced this behavior on RHEL 8.5; however, RHEL
> 9.0 beta uses the UTF8 locale.
> 
> This change looks good to me!


Thanks, Mike and Gaetan!  With the change above, applied.

Best regards, Ilya Maximets.
diff mbox series

Patch

diff --git a/tests/checkpatch.at b/tests/checkpatch.at
index 68c917af9..fadb625e9 100755
--- a/tests/checkpatch.at
+++ b/tests/checkpatch.at
@@ -424,3 +424,25 @@  try_checkpatch \
 "
 
 AT_CLEANUP
+
+AT_SETUP([checkpatch - Unicode code])
+try_checkpatch \
+   "COMMON_PATCH_HEADER
+    +     if (snowman == ☃️) {  /* Emoji
+    +     void НelloWorld() {  /* Homoglyph
+    +     ة /* ;C++            /* BiDi
+    " \
+    "ERROR: Inappropriate non-ascii characters detected.
+    #8 FILE: A.c:1:
+         if (snowman == ☃️) {  /* Emoji
+
+ERROR: Inappropriate non-ascii characters detected.
+    #9 FILE: A.c:2:
+         void НelloWorld() {  /* Homoglyph
+
+ERROR: Inappropriate non-ascii characters detected.
+    #10 FILE: A.c:3:
+         ة /* ;C++            /* BiDi
+"
+
+AT_CLEANUP
diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
index bf95358d5..84d5ded8c 100755
--- a/utilities/checkpatch.py
+++ b/utilities/checkpatch.py
@@ -181,6 +181,7 @@  __regex_added_doc_rst = re.compile(
 __regex_empty_return = re.compile(r'\s*return;')
 __regex_if_macros = re.compile(r'^ +(%s) \([\S]([\s\S]+[\S])*\) { +\\' %
                                __parenthesized_constructs)
+__regex_nonascii_characters = re.compile("[^\u0000-\u007f]")
 
 skip_leading_whitespace_check = False
 skip_trailing_whitespace_check = False
@@ -294,6 +295,11 @@  def pointer_whitespace_check(line):
     return __regex_ptr_declaration_missing_whitespace.search(line) is not None
 
 
+def nonascii_character_check(line):
+    """Return TRUE if inappropriate Unicode characters are detected """
+    return __regex_nonascii_characters.search(line) is not None
+
+
 def cast_whitespace_check(line):
     """Return TRUE if there is no space between the '()' used in a cast and
        the expression whose type is cast, i.e.: '(void *)foo'"""
@@ -565,6 +571,11 @@  checks = [
      'print':
      lambda: print_error("Inappropriate spacing in pointer declaration")},
 
+    {'regex': r'(\.c|\.h)(\.in)?$', 'match_name': None,
+     'check': lambda x: nonascii_character_check(x),
+     'print':
+     lambda: print_error("Inappropriate non-ascii characters detected.")},
+
     {'regex': r'(\.c|\.h)(\.in)?$', 'match_name': None,
      'prereq': lambda x: not is_comment_line(x),
      'check': lambda x: cast_whitespace_check(x),