Add stricter checks when parsing incoming patches

Message ID 20180713184628.24488-1-dzickus@redhat.com
State Changes Requested
Headers show
Series
  • Add stricter checks when parsing incoming patches
Related show

Commit Message

Don Zickus July 13, 2018, 6:46 p.m.
The patch parser has a multi-stage if-then-else statement that tries to
determine which part of a patch is the comment and what piece is the
patch itself.

Unfortunately there is a gap in between state 1 and state 2 where chunks
of the comment can be accidentally added to the patch.

For example if a comment has a line that begins with 'diff' or 'Index'.
That will trigger the state 0 to state 1 transition and sit there until
many comment lines later a '--- ' line is found to move it to state 2.

As a result many comment lines are truncated and stuck into a patch buffer
instead.  This makes it more difficult to process metadata found in the
comment buffer.

This patch adds some strict rules based on various common patch preambles
like git, quilt, and rcs.

Now if the patch is in state 1 because of a 'diff ' or 'Index:', it needs
to expect the next common preamble to continue to stay in state 1 otherwise
accept the state 0 to state 1 transition was an accident and move back to
state 0.

This patch has been in our internal patchwork instance for 8 years now and is
the result of various patches we have seen internally.

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 patchwork/parser.py | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Stephen Finucane Aug. 29, 2018, 10:57 a.m. | #1
On Fri, 2018-07-13 at 14:46 -0400, Don Zickus wrote:
> The patch parser has a multi-stage if-then-else statement that tries to
> determine which part of a patch is the comment and what piece is the
> patch itself.
> 
> Unfortunately there is a gap in between state 1 and state 2 where chunks
> of the comment can be accidentally added to the patch.
> 
> For example if a comment has a line that begins with 'diff' or 'Index'.
> That will trigger the state 0 to state 1 transition and sit there until
> many comment lines later a '--- ' line is found to move it to state 2.
> 
> As a result many comment lines are truncated and stuck into a patch buffer
> instead.  This makes it more difficult to process metadata found in the
> comment buffer.
> 
> This patch adds some strict rules based on various common patch preambles
> like git, quilt, and rcs.
> 
> Now if the patch is in state 1 because of a 'diff ' or 'Index:', it needs
> to expect the next common preamble to continue to stay in state 1 otherwise
> accept the state 0 to state 1 transition was an accident and move back to
> state 0.
> 
> This patch has been in our internal patchwork instance for 8 years now and is
> the result of various patches we have seen internally.
> 
> Signed-off-by: Don Zickus <dzickus@redhat.com>

I applied this locally and ran the unit tests.

  $ docker-compose run --rm web tox -e py27-django111

It's failing pretty hard.

  Ran 455 tests in 145.925s

  FAILED (failures=62, errors=6, skipped=1)
  Destroying test database for alias 'default'...

I'm guessing a good few of these are as a result of how we auto-
generate patches but I imagine a few more of them might be valid. Could
you have a look at these failures and see how we can remedy them?

Stephen

> ---
>  patchwork/parser.py | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index a2db403..5566c69 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -792,6 +792,19 @@ def parse_patch(content):
>  
>              if line.startswith(('rename from ', 'rename to ')):
>                  state = 6
> +            elif line.startswith('diff ') or line.startswith('Index: ') \
> +                    or line.startswith('deleted file ') \
> +                    or line.startswith('index ') \
> +                    or line.startswith('new file ') \
> +                    or line.startswith('====') \
> +                    or line.startswith('RCS file: ') \
> +                    or line.startswith('retrieving revision '):
> +                state = 1
> +            else:
> +                state = 0
> +                commentbuf += buf
> +                buf = ''
> +
>          elif state == 2:
>              if line.startswith('+++ '):
>                  state = 3

Patch

diff --git a/patchwork/parser.py b/patchwork/parser.py
index a2db403..5566c69 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -792,6 +792,19 @@  def parse_patch(content):
 
             if line.startswith(('rename from ', 'rename to ')):
                 state = 6
+            elif line.startswith('diff ') or line.startswith('Index: ') \
+                    or line.startswith('deleted file ') \
+                    or line.startswith('index ') \
+                    or line.startswith('new file ') \
+                    or line.startswith('====') \
+                    or line.startswith('RCS file: ') \
+                    or line.startswith('retrieving revision '):
+                state = 1
+            else:
+                state = 0
+                commentbuf += buf
+                buf = ''
+
         elif state == 2:
             if line.startswith('+++ '):
                 state = 3