diff mbox

[04/10] bin/parsemail: fix wrong parsing of diff comments

Message ID 1471943023-1112-5-git-send-email-wengpingbo@gmail.com
State Superseded
Headers show

Commit Message

WEN Pingbo Aug. 23, 2016, 9:03 a.m. UTC
If the subject of a submission is prefixed by 'Re:', then it can't be a
patch or cover letter.

Signed-off-by: WEN Pingbo <wengpingbo@gmail.com>
---
 patchwork/bin/parsemail.py | 31 +++++++++++++++++++------------
 1 file changed, 19 insertions(+), 12 deletions(-)

Comments

Stephen Finucane Aug. 29, 2016, 11:09 p.m. UTC | #1
On 23 Aug 17:03, WEN Pingbo wrote:
> If the subject of a submission is prefixed by 'Re:', then it can't be a
> patch or cover letter.
> 
> Signed-off-by: WEN Pingbo <wengpingbo@gmail.com>

Good idea. Some comments below, but I'm mostly happy with this. Thanks
:)

Stephen

PS: This is significant enough to submit by itself (outside of the
series) IMO

> ---
>  patchwork/bin/parsemail.py | 31 +++++++++++++++++++------------
>  1 file changed, 19 insertions(+), 12 deletions(-)
> 
> diff --git a/patchwork/bin/parsemail.py b/patchwork/bin/parsemail.py
> index 48f809f..6f644da 100755
> --- a/patchwork/bin/parsemail.py
> +++ b/patchwork/bin/parsemail.py
> @@ -341,6 +341,7 @@ def clean_subject(subject, drop_prefixes=None):
>            from the subject
>      """
>      re_re = re.compile(r'^(re|fwd?)[:\s]\s*', re.I)
> +    comment_re = re.compile(r'^(re)[:\s]\s*', re.I)
>      prefix_re = re.compile(r'^\[([^\]]*)\]\s*(.*)$')
>      subject = clean_header(subject)
>  
> @@ -351,6 +352,7 @@ def clean_subject(subject, drop_prefixes=None):
>  
>      drop_prefixes.append('patch')
>  
> +    is_comment = comment_re.match(subject)
>      # remove Re:, Fwd:, etc
>      subject = re_re.sub(' ', subject)
>  
> @@ -374,7 +376,7 @@ def clean_subject(subject, drop_prefixes=None):
>      if prefixes:
>          subject = '[%s] %s' % (','.join(prefixes), subject)
>  
> -    return (subject, prefixes)
> +    return (is_comment, subject, prefixes)

Perhaps this is a bit of a nit, but I don't think a function called
'clean_subject' should be reporting something like 'is_comment'. Could
you add a new function and call that instead?

>  
>  
>  def clean_content(content):
> @@ -479,7 +481,7 @@ def parse_mail(mail, list_id=None):
>  
>      msgid = mail.get('Message-Id').strip()
>      author = find_author(mail)
> -    name, prefixes = clean_subject(mail.get('Subject'), [project.linkname])
> +    is_comment, name, prefixes = clean_subject(mail.get('Subject'), [project.linkname])
>      x, n = parse_series_marker(prefixes)
>      refs = find_references(mail)
>      date = find_date(mail)
> @@ -488,7 +490,7 @@ def parse_mail(mail, list_id=None):
>  
>      # build objects
>  
> -    if diff or pull_url:  # patches or pull requests
> +    if not is_comment and (diff or pull_url):  # patches or pull requests
>          # we delay the saving until we know we have a patch.
>          author.save()
>  
> @@ -518,16 +520,18 @@ def parse_mail(mail, list_id=None):
>          # however, we need to see if a match already exists and, if
>          # not, assume that it is indeed a new cover letter
>          is_cover_letter = False
> -        if not refs == []:
> -            try:
> -                CoverLetter.objects.all().get(name=name)
> -            except CoverLetter.DoesNotExist:  # no match => new cover
> +
> +        if not is_comment:
> +            if not refs == []:
> +                try:
> +                    CoverLetter.objects.all().get(name=name)
> +                except CoverLetter.DoesNotExist:  # no match => new cover
> +                    is_cover_letter = True
> +                except CoverLetter.MultipleObjectsReturned:
> +                    # if multiple cover letters are found, just ignore
> +                    pass
> +            else:
>                  is_cover_letter = True
> -            except CoverLetter.MultipleObjectsReturned:
> -                # if multiple cover letters are found, just ignore
> -                pass
> -        else:
> -            is_cover_letter = True
>  
>          if is_cover_letter:
>              author.save()
> @@ -552,6 +556,9 @@ def parse_mail(mail, list_id=None):
>      if not submission:
>          return
>  
> +    if is_comment and diff:
> +        message += diff
> +
>      author.save()
>  
>      comment = Comment(
> -- 
> 1.9.1
> 
> _______________________________________________
> Patchwork mailing list
> Patchwork@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/patchwork
Kalle Valo Aug. 30, 2016, 2:28 p.m. UTC | #2
Stephen Finucane <stephenfinucane@hotmail.com> writes:

> On 23 Aug 17:03, WEN Pingbo wrote:
>> If the subject of a submission is prefixed by 'Re:', then it can't be a
>> patch or cover letter.
>> 
>> Signed-off-by: WEN Pingbo <wengpingbo@gmail.com>
>
> Good idea. Some comments below, but I'm mostly happy with this. Thanks
> :)
>
> Stephen
>
> PS: This is significant enough to submit by itself (outside of the
> series) IMO

Indeed, this is a really annoying feature and splits the discussion
unnecessarily in patchwork. I would say this is on my top three
wishlist.
Johannes Berg Aug. 30, 2016, 5:15 p.m. UTC | #3
On Tue, 2016-08-23 at 17:03 +0800, WEN Pingbo wrote:
> If the subject of a submission is prefixed by 'Re:', then it can't be
> a patch or cover letter.
> 
Some (very stupid) email clients, I think some versions of Outlook or
Outlook Express, internationalize/localize the "Re:" string [1].

Should we attempt to handle that here as well?

johannes

[1] which leads to stupid prefixing when somebody I email with uses the
German version - "AW: Re: AW: Re: AW: Re: [...]" since it's not even
smart enough to detect the "Re:" and replace it with "AW:" or similar.

I have another thread here that just kept growing "AW:" since it *did*
remove the "Re:", but *didn't* find the existing "AW:" afterwards, that
was with "Microsoft Outlook 15.0".
Daniel Axtens Aug. 31, 2016, 12:04 a.m. UTC | #4
> On Tue, 2016-08-23 at 17:03 +0800, WEN Pingbo wrote:
>> If the subject of a submission is prefixed by 'Re:', then it can't be
>> a patch or cover letter.
>> 
> Some (very stupid) email clients, I think some versions of Outlook or
> Outlook Express, internationalize/localize the "Re:" string [1].
>
> Should we attempt to handle that here as well?
>
> johannes
>
> [1] which leads to stupid prefixing when somebody I email with uses the
> German version - "AW: Re: AW: Re: AW: Re: [...]" since it's not even
> smart enough to detect the "Re:" and replace it with "AW:" or similar.
>
> I have another thread here that just kept growing "AW:" since it *did*
> remove the "Re:", but *didn't* find the existing "AW:" afterwards, that
> was with "Microsoft Outlook 15.0".

I suppose we could have a configurable list of these stored either in
code or in the database... Let's get basic Re: support in and then grow
it from there.

Regards,
Daniel

> _______________________________________________
> Patchwork mailing list
> Patchwork@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/patchwork
Stephen Finucane Sept. 1, 2016, 6:35 p.m. UTC | #5
On 31 Aug 10:04, Daniel Axtens wrote:
> > On Tue, 2016-08-23 at 17:03 +0800, WEN Pingbo wrote:
> >> If the subject of a submission is prefixed by 'Re:', then it can't be
> >> a patch or cover letter.
> >> 
> > Some (very stupid) email clients, I think some versions of Outlook or
> > Outlook Express, internationalize/localize the "Re:" string [1].
> >
> > Should we attempt to handle that here as well?
> >
> > johannes
> >
> > [1] which leads to stupid prefixing when somebody I email with uses the
> > German version - "AW: Re: AW: Re: AW: Re: [...]" since it's not even
> > smart enough to detect the "Re:" and replace it with "AW:" or similar.
> >
> > I have another thread here that just kept growing "AW:" since it *did*
> > remove the "Re:", but *didn't* find the existing "AW:" afterwards, that
> > was with "Microsoft Outlook 15.0".
> 
> I suppose we could have a configurable list of these stored either in
> code or in the database... Let's get basic Re: support in and then grow
> it from there.

Agreed. If you could follow this up with a patch that added additional
prefixes, I'd be more than happy to accept it.

Stephen
Johannes Berg Sept. 2, 2016, 6:01 a.m. UTC | #6
> Agreed. If you could follow this up with a patch that added
> additional prefixes, I'd be more than happy to accept it.
> 

I can't promise a patch right now, but I just found this:

https://en.wikipedia.org/wiki/List_of_email_subject_abbreviations#Abbreviations_in_other_languages

Seeing there that it can actually be turned off, perhaps I should
educate people more. :)

johannes
WEN Pingbo Sept. 2, 2016, 1 p.m. UTC | #7
> 在 2016年8月30日,07:09,Stephen Finucane <stephenfinucane@hotmail.com> 写道:
> 
> On 23 Aug 17:03, WEN Pingbo wrote:
>> If the subject of a submission is prefixed by 'Re:', then it can't be a
>> patch or cover letter.
>> 
>> Signed-off-by: WEN Pingbo <wengpingbo@gmail.com>
> 
> Good idea. Some comments below, but I'm mostly happy with this. Thanks
> :)
> 
> Stephen
> 
> PS: This is significant enough to submit by itself (outside of the
> series) IMO
> 
>> ---
>> patchwork/bin/parsemail.py | 31 +++++++++++++++++++------------
>> 1 file changed, 19 insertions(+), 12 deletions(-)
>> 
>> diff --git a/patchwork/bin/parsemail.py b/patchwork/bin/parsemail.py
>> index 48f809f..6f644da 100755
>> --- a/patchwork/bin/parsemail.py
>> +++ b/patchwork/bin/parsemail.py
>> @@ -341,6 +341,7 @@ def clean_subject(subject, drop_prefixes=None):
>>           from the subject
>>     """
>>     re_re = re.compile(r'^(re|fwd?)[:\s]\s*', re.I)
>> +    comment_re = re.compile(r'^(re)[:\s]\s*', re.I)
>>     prefix_re = re.compile(r'^\[([^\]]*)\]\s*(.*)$')
>>     subject = clean_header(subject)
>> 
>> @@ -351,6 +352,7 @@ def clean_subject(subject, drop_prefixes=None):
>> 
>>     drop_prefixes.append('patch')
>> 
>> +    is_comment = comment_re.match(subject)
>>     # remove Re:, Fwd:, etc
>>     subject = re_re.sub(' ', subject)
>> 
>> @@ -374,7 +376,7 @@ def clean_subject(subject, drop_prefixes=None):
>>     if prefixes:
>>         subject = '[%s] %s' % (','.join(prefixes), subject)
>> 
>> -    return (subject, prefixes)
>> +    return (is_comment, subject, prefixes)
> 
> Perhaps this is a bit of a nit, but I don't think a function called
> 'clean_subject' should be reporting something like 'is_comment'. Could
> you add a new function and call that instead?

OK, I will send a separate patch with this comment.

Pingbo
Stephen Finucane Sept. 24, 2016, 11:38 p.m. UTC | #8
On 02 Sep 21:00, 文平波 wrote:
> 
> > 在 2016年8月30日,07:09,Stephen Finucane <stephenfinucane@hotmail.com> 写道:
> > 
> > On 23 Aug 17:03, WEN Pingbo wrote:
> >> If the subject of a submission is prefixed by 'Re:', then it can't be a
> >> patch or cover letter.
> >> 
> >> Signed-off-by: WEN Pingbo <wengpingbo@gmail.com>
> > 
> > Good idea. Some comments below, but I'm mostly happy with this. Thanks
> > :)
> > 
> > Stephen
> > 
> > PS: This is significant enough to submit by itself (outside of the
> > series) IMO
> > 
> >> ---
> >> patchwork/bin/parsemail.py | 31 +++++++++++++++++++------------
> >> 1 file changed, 19 insertions(+), 12 deletions(-)
> >> 
> >> diff --git a/patchwork/bin/parsemail.py b/patchwork/bin/parsemail.py
> >> index 48f809f..6f644da 100755
> >> --- a/patchwork/bin/parsemail.py
> >> +++ b/patchwork/bin/parsemail.py
> >> @@ -341,6 +341,7 @@ def clean_subject(subject, drop_prefixes=None):
> >>           from the subject
> >>     """
> >>     re_re = re.compile(r'^(re|fwd?)[:\s]\s*', re.I)
> >> +    comment_re = re.compile(r'^(re)[:\s]\s*', re.I)
> >>     prefix_re = re.compile(r'^\[([^\]]*)\]\s*(.*)$')
> >>     subject = clean_header(subject)
> >> 
> >> @@ -351,6 +352,7 @@ def clean_subject(subject, drop_prefixes=None):
> >> 
> >>     drop_prefixes.append('patch')
> >> 
> >> +    is_comment = comment_re.match(subject)
> >>     # remove Re:, Fwd:, etc
> >>     subject = re_re.sub(' ', subject)
> >> 
> >> @@ -374,7 +376,7 @@ def clean_subject(subject, drop_prefixes=None):
> >>     if prefixes:
> >>         subject = '[%s] %s' % (','.join(prefixes), subject)
> >> 
> >> -    return (subject, prefixes)
> >> +    return (is_comment, subject, prefixes)
> > 
> > Perhaps this is a bit of a nit, but I don't think a function called
> > 'clean_subject' should be reporting something like 'is_comment'. Could
> > you add a new function and call that instead?
> 
> OK, I will send a separate patch with this comment.
> 
> Pingbo

I can't find V2 of the patch in my inbox, but I merged it after
applying some fixes.

Stephen
diff mbox

Patch

diff --git a/patchwork/bin/parsemail.py b/patchwork/bin/parsemail.py
index 48f809f..6f644da 100755
--- a/patchwork/bin/parsemail.py
+++ b/patchwork/bin/parsemail.py
@@ -341,6 +341,7 @@  def clean_subject(subject, drop_prefixes=None):
           from the subject
     """
     re_re = re.compile(r'^(re|fwd?)[:\s]\s*', re.I)
+    comment_re = re.compile(r'^(re)[:\s]\s*', re.I)
     prefix_re = re.compile(r'^\[([^\]]*)\]\s*(.*)$')
     subject = clean_header(subject)
 
@@ -351,6 +352,7 @@  def clean_subject(subject, drop_prefixes=None):
 
     drop_prefixes.append('patch')
 
+    is_comment = comment_re.match(subject)
     # remove Re:, Fwd:, etc
     subject = re_re.sub(' ', subject)
 
@@ -374,7 +376,7 @@  def clean_subject(subject, drop_prefixes=None):
     if prefixes:
         subject = '[%s] %s' % (','.join(prefixes), subject)
 
-    return (subject, prefixes)
+    return (is_comment, subject, prefixes)
 
 
 def clean_content(content):
@@ -479,7 +481,7 @@  def parse_mail(mail, list_id=None):
 
     msgid = mail.get('Message-Id').strip()
     author = find_author(mail)
-    name, prefixes = clean_subject(mail.get('Subject'), [project.linkname])
+    is_comment, name, prefixes = clean_subject(mail.get('Subject'), [project.linkname])
     x, n = parse_series_marker(prefixes)
     refs = find_references(mail)
     date = find_date(mail)
@@ -488,7 +490,7 @@  def parse_mail(mail, list_id=None):
 
     # build objects
 
-    if diff or pull_url:  # patches or pull requests
+    if not is_comment and (diff or pull_url):  # patches or pull requests
         # we delay the saving until we know we have a patch.
         author.save()
 
@@ -518,16 +520,18 @@  def parse_mail(mail, list_id=None):
         # however, we need to see if a match already exists and, if
         # not, assume that it is indeed a new cover letter
         is_cover_letter = False
-        if not refs == []:
-            try:
-                CoverLetter.objects.all().get(name=name)
-            except CoverLetter.DoesNotExist:  # no match => new cover
+
+        if not is_comment:
+            if not refs == []:
+                try:
+                    CoverLetter.objects.all().get(name=name)
+                except CoverLetter.DoesNotExist:  # no match => new cover
+                    is_cover_letter = True
+                except CoverLetter.MultipleObjectsReturned:
+                    # if multiple cover letters are found, just ignore
+                    pass
+            else:
                 is_cover_letter = True
-            except CoverLetter.MultipleObjectsReturned:
-                # if multiple cover letters are found, just ignore
-                pass
-        else:
-            is_cover_letter = True
 
         if is_cover_letter:
             author.save()
@@ -552,6 +556,9 @@  def parse_mail(mail, list_id=None):
     if not submission:
         return
 
+    if is_comment and diff:
+        message += diff
+
     author.save()
 
     comment = Comment(