[1/1] parser: fix parsing of patches with headings

Message ID 20180628194211.19085-1-dzickus@redhat.com
State Accepted
Headers show
Series
  • [1/1] parser: fix parsing of patches with headings
Related show

Commit Message

Don Zickus June 28, 2018, 7:42 p.m.
From: Jiri Benc <jbenc@redhat.com>

Some people tend to use lines full of '=' as a fancy way to format headings
in their commit messages in a rst-like style. However, the current parser
treats such lines as a beginning of a diff.

The only currently used tool that produces diffs with '=' lines is quilt in
the default configuration. However, even with quilt, the diff looks this
way:

    Index: dir/file
    ===================================================================
    --- dir.orig/file
    +++ dir/file
    @@ ...etc...

It's enough to match on the "Index:" line. The state of the state machine is
kept at 1 when it encounters the '=' line, thus it's safe to remove the
match on '=' completely.

[This prevents us from properly parsing metadata out of the changelog. -dcz ]

Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
 patchwork/parser.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Don Zickus July 12, 2018, 6:21 p.m. | #1
On Thu, Jun 28, 2018 at 03:42:11PM -0400, Don Zickus wrote:
> From: Jiri Benc <jbenc@redhat.com>
> 
> Some people tend to use lines full of '=' as a fancy way to format headings
> in their commit messages in a rst-like style. However, the current parser
> treats such lines as a beginning of a diff.

bump.

Cheers,
Don

> 
> The only currently used tool that produces diffs with '=' lines is quilt in
> the default configuration. However, even with quilt, the diff looks this
> way:
> 
>     Index: dir/file
>     ===================================================================
>     --- dir.orig/file
>     +++ dir/file
>     @@ ...etc...
> 
> It's enough to match on the "Index:" line. The state of the state machine is
> kept at 1 when it encounters the '=' line, thus it's safe to remove the
> match on '=' completely.
> 
> [This prevents us from properly parsing metadata out of the changelog. -dcz ]
> 
> Signed-off-by: Jiri Benc <jbenc@redhat.com>
> ---
>  patchwork/parser.py | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index a40f931..a2db403 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -745,7 +745,7 @@ def parse_patch(content):
>      # state specified the line we just saw, and what to expect next
>      state = 0
>      # 0: text
> -    # 1: suspected patch header (diff, ====, Index:)
> +    # 1: suspected patch header (diff, Index:)
>      # 2: patch header line 1 (---)
>      # 3: patch header line 2 (+++)
>      # 4: patch hunk header line (@@ line)
> @@ -753,7 +753,7 @@ def parse_patch(content):
>      # 6: patch meta header (rename from/rename to)
>      #
>      # valid transitions:
> -    #  0 -> 1 (diff, ===, Index:)
> +    #  0 -> 1 (diff, Index:)
>      #  0 -> 2 (---)
>      #  1 -> 2 (---)
>      #  2 -> 3 (+++)
> @@ -776,7 +776,7 @@ def parse_patch(content):
>          line += '\n'
>  
>          if state == 0:
> -            if line.startswith('diff ') or line.startswith('===') \
> +            if line.startswith('diff ') \
>                      or line.startswith('Index: '):
>                  state = 1
>                  buf += line
> -- 
> 2.14.4
> 
> _______________________________________________
> Patchwork mailing list
> Patchwork@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/patchwork
Stephen Finucane July 13, 2018, 5:33 p.m. | #2
On Thu, 2018-06-28 at 15:42 -0400, Don Zickus wrote:
> From: Jiri Benc <jbenc@redhat.com>
> 
> Some people tend to use lines full of '=' as a fancy way to format headings
> in their commit messages in a rst-like style. However, the current parser
> treats such lines as a beginning of a diff.
> 
> The only currently used tool that produces diffs with '=' lines is quilt in
> the default configuration. However, even with quilt, the diff looks this
> way:
> 
>     Index: dir/file
>     ===================================================================
>     --- dir.orig/file
>     +++ dir/file
>     @@ ...etc...
> 
> It's enough to match on the "Index:" line. The state of the state machine is
> kept at 1 when it encounters the '=' line, thus it's safe to remove the
> match on '=' completely.
> 
> [This prevents us from properly parsing metadata out of the changelog. -dcz ]
> 
> Signed-off-by: Jiri Benc <jbenc@redhat.com>

Sorry about the delay getting to this - both Daniel and I have been up
the walls, it seems.

This looks good to me and has been applied. Would it be possible to get
a sample patch for both the "patch with rst-style commit messages" and
the quilt-style patches so I can whip up some regressions tests for
this? You should probably mail them directly to me as the list will
likely drop them.

Stephen

> ---
>  patchwork/parser.py | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index a40f931..a2db403 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -745,7 +745,7 @@ def parse_patch(content):
>      # state specified the line we just saw, and what to expect next
>      state = 0
>      # 0: text
> -    # 1: suspected patch header (diff, ====, Index:)
> +    # 1: suspected patch header (diff, Index:)
>      # 2: patch header line 1 (---)
>      # 3: patch header line 2 (+++)
>      # 4: patch hunk header line (@@ line)
> @@ -753,7 +753,7 @@ def parse_patch(content):
>      # 6: patch meta header (rename from/rename to)
>      #
>      # valid transitions:
> -    #  0 -> 1 (diff, ===, Index:)
> +    #  0 -> 1 (diff, Index:)
>      #  0 -> 2 (---)
>      #  1 -> 2 (---)
>      #  2 -> 3 (+++)
> @@ -776,7 +776,7 @@ def parse_patch(content):
>          line += '\n'
>  
>          if state == 0:
> -            if line.startswith('diff ') or line.startswith('===') \
> +            if line.startswith('diff ') \
>                      or line.startswith('Index: '):
>                  state = 1
>                  buf += line
Don Zickus July 13, 2018, 6:28 p.m. | #3
On Fri, Jul 13, 2018 at 06:33:10PM +0100, Stephen Finucane wrote:
> On Thu, 2018-06-28 at 15:42 -0400, Don Zickus wrote:
> > From: Jiri Benc <jbenc@redhat.com>
> > 
> > Some people tend to use lines full of '=' as a fancy way to format headings
> > in their commit messages in a rst-like style. However, the current parser
> > treats such lines as a beginning of a diff.
> > 
> > The only currently used tool that produces diffs with '=' lines is quilt in
> > the default configuration. However, even with quilt, the diff looks this
> > way:
> > 
> >     Index: dir/file
> >     ===================================================================
> >     --- dir.orig/file
> >     +++ dir/file
> >     @@ ...etc...
> > 
> > It's enough to match on the "Index:" line. The state of the state machine is
> > kept at 1 when it encounters the '=' line, thus it's safe to remove the
> > match on '=' completely.
> > 
> > [This prevents us from properly parsing metadata out of the changelog. -dcz ]
> > 
> > Signed-off-by: Jiri Benc <jbenc@redhat.com>
> 
> Sorry about the delay getting to this - both Daniel and I have been up
> the walls, it seems.
> 
> This looks good to me and has been applied. Would it be possible to get
> a sample patch for both the "patch with rst-style commit messages" and
> the quilt-style patches so I can whip up some regressions tests for
> this? You should probably mail them directly to me as the list will
> likely drop them.

I realize I have another patch in this area that has been in our internal
patchwork tree for awhile that I will send out too while you are looking at
this.

Cheers,
Don

> 
> Stephen
> 
> > ---
> >  patchwork/parser.py | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/patchwork/parser.py b/patchwork/parser.py
> > index a40f931..a2db403 100644
> > --- a/patchwork/parser.py
> > +++ b/patchwork/parser.py
> > @@ -745,7 +745,7 @@ def parse_patch(content):
> >      # state specified the line we just saw, and what to expect next
> >      state = 0
> >      # 0: text
> > -    # 1: suspected patch header (diff, ====, Index:)
> > +    # 1: suspected patch header (diff, Index:)
> >      # 2: patch header line 1 (---)
> >      # 3: patch header line 2 (+++)
> >      # 4: patch hunk header line (@@ line)
> > @@ -753,7 +753,7 @@ def parse_patch(content):
> >      # 6: patch meta header (rename from/rename to)
> >      #
> >      # valid transitions:
> > -    #  0 -> 1 (diff, ===, Index:)
> > +    #  0 -> 1 (diff, Index:)
> >      #  0 -> 2 (---)
> >      #  1 -> 2 (---)
> >      #  2 -> 3 (+++)
> > @@ -776,7 +776,7 @@ def parse_patch(content):
> >          line += '\n'
> >  
> >          if state == 0:
> > -            if line.startswith('diff ') or line.startswith('===') \
> > +            if line.startswith('diff ') \
> >                      or line.startswith('Index: '):
> >                  state = 1
> >                  buf += line
>

Patch

diff --git a/patchwork/parser.py b/patchwork/parser.py
index a40f931..a2db403 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -745,7 +745,7 @@  def parse_patch(content):
     # state specified the line we just saw, and what to expect next
     state = 0
     # 0: text
-    # 1: suspected patch header (diff, ====, Index:)
+    # 1: suspected patch header (diff, Index:)
     # 2: patch header line 1 (---)
     # 3: patch header line 2 (+++)
     # 4: patch hunk header line (@@ line)
@@ -753,7 +753,7 @@  def parse_patch(content):
     # 6: patch meta header (rename from/rename to)
     #
     # valid transitions:
-    #  0 -> 1 (diff, ===, Index:)
+    #  0 -> 1 (diff, Index:)
     #  0 -> 2 (---)
     #  1 -> 2 (---)
     #  2 -> 3 (+++)
@@ -776,7 +776,7 @@  def parse_patch(content):
         line += '\n'
 
         if state == 0:
-            if line.startswith('diff ') or line.startswith('===') \
+            if line.startswith('diff ') \
                     or line.startswith('Index: '):
                 state = 1
                 buf += line