parser: Don't extract diffs from replies

Message ID 20170409152355.17160-1-stephen@that.guru
State Accepted
Headers show

Commit Message

Stephen Finucane April 9, 2017, 3:23 p.m.
In '2a915efd', a check was added to ensure mails prefixed with 'RE:' or
similar would not be parsed as patches. By time this check actually
happens, any patches had already been extracted from the mail thus these
patches were re-added to the mail content before saving the comment.
Unfortunately, this didn't take into account cases where a patch or diff
was not the last part of a mail but rather located somewhere in the
middle of the content:

    Introduction content
    Diff or patch content ***
    Additional content

This would result in mangling of the mail as the patch would _always_ be
appended to the end:

    Introduction content
    Additional content
    Diff or patch content ***

Handle this by only breaking a mail into a comment and a diff if there
is any possibility that we might want to use that diff.

Signed-off-by: Stephen Finucane <stephen@that.guru>
Fixes: 2a915efd ("parser: fix wrong parsing of diff comments")
Closes-bug: #95
Cc: Ralf Baechle <ralf@linux-mips.org>
---
 patchwork/parser.py | 63 +++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 49 insertions(+), 14 deletions(-)

Comments

Stephen Finucane April 15, 2017, 12:35 a.m. | #1
On Sun, 2017-04-09 at 16:23 +0100, Stephen Finucane wrote:
> In '2a915efd', a check was added to ensure mails prefixed with 'RE:'
> or
> similar would not be parsed as patches. By time this check actually
> happens, any patches had already been extracted from the mail thus
> these
> patches were re-added to the mail content before saving the comment.
> Unfortunately, this didn't take into account cases where a patch or
> diff
> was not the last part of a mail but rather located somewhere in the
> middle of the content:
> 
>     Introduction content
>     Diff or patch content ***
>     Additional content
> 
> This would result in mangling of the mail as the patch would _always_
> be
> appended to the end:
> 
>     Introduction content
>     Additional content
>     Diff or patch content ***
> 
> Handle this by only breaking a mail into a comment and a diff if
> there
> is any possibility that we might want to use that diff.
> 
> Signed-off-by: Stephen Finucane <stephen@that.guru>
> Fixes: 2a915efd ("parser: fix wrong parsing of diff comments")
> Closes-bug: #95
> Cc: Ralf Baechle <ralf@linux-mips.org>

I'm pretty confident with this fix. Applied so I can work on finishing
touches for 2.0.

Stephen

Patch

diff --git a/patchwork/parser.py b/patchwork/parser.py
index 7e2067e..3cd0f97 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -342,10 +342,15 @@  def parse_version(subject, subject_prefixes):
     return 1
 
 
-def find_content(mail):
-    """Extract a comment and potential diff from a mail."""
-    patchbuf = None
-    commentbuf = ''
+def _find_content(mail):
+    """Extract the payload(s) from a mail.
+
+    Handles various payload types.
+
+    :returns: A list of tuples, corresponding the payload and subtype
+        of payload.
+    """
+    results = []
 
     for part in mail.walk():
         if part.get_content_maintype() != 'text':
@@ -381,8 +386,19 @@  def find_content(mail):
 
             # Could not find a valid decoded payload.  Fail.
             if payload is None:
-                return None, None
+                continue
+
+        results.append((payload, subtype))
+
+    return results
+
 
+def find_patch_content(mail):
+    """Extract a comment and potential diff from a mail."""
+    patchbuf = None
+    commentbuf = ''
+
+    for payload, subtype in _find_content(mail):
         if subtype in ['x-patch', 'x-diff']:
             patchbuf = payload
         elif subtype == 'plain':
@@ -399,6 +415,22 @@  def find_content(mail):
     return patchbuf, commentbuf
 
 
+def find_comment_content(mail):
+    """Extract content from a mail."""
+    commentbuf = ''
+
+    for payload, _ in _find_content(mail):
+        if not payload:
+            continue
+
+        commentbuf += payload.strip() + '\n'
+
+    commentbuf = clean_content(commentbuf)
+
+    # keep the method signature the same as find_patch_content
+    return None, commentbuf
+
+
 def find_submission_for_comment(project, refs):
     for ref in refs:
         # first, check for a direct reply
@@ -759,12 +791,7 @@  def parse_mail(mail, list_id=None):
         logger.error('Failed to find a project for email')
         return
 
-    # parse content
-
-    diff, message = find_content(mail)
-
-    if not (diff or message):
-        return  # nothing to work with
+    # parse metadata
 
     msgid = mail.get('Message-Id').strip()
     author = find_author(mail)
@@ -776,6 +803,17 @@  def parse_mail(mail, list_id=None):
     refs = find_references(mail)
     date = find_date(mail)
     headers = find_headers(mail)
+
+    # parse content
+
+    if not is_comment:
+        diff, message = find_patch_content(mail)
+    else:
+        diff, message = find_comment_content(mail)
+
+    if not (diff or message):
+        return  # nothing to work with
+
     pull_url = parse_pull_request(message)
 
     # build objects
@@ -914,9 +952,6 @@  def parse_mail(mail, list_id=None):
     if not submission:
         return
 
-    if is_comment and diff:
-        message += diff
-
     author.save()
 
     comment = Comment(