[v6,1/2] parsemail: Convert to a management command

Message ID	BLU437-SMTP14C341A24DA164D29133DFA3F40@phx.gbl
State	Not Applicable
Headers	show Return-Path: <patchwork-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org> Message-ID: <BLU437-SMTP14C341A24DA164D29133DFA3F40@phx.gbl> Date: Tue, 20 Sep 2016 00:08:55 +0100 From: Stephen Finucane <stephenfinucane@hotmail.com> To: Daniel Axtens <dja@axtens.net> Subject: Re: [PATCH v6 1/2] parsemail: Convert to a management command References: <1473620125-19914-1-git-send-email-stephenfinucane@hotmail.com> <BLU436-SMTP322A0BFDAA3CB0051017DAA3FC0@phx.gbl> <871t0gyxcf.fsf@possimpible.ozlabs.ibm.com> <87y42nykvz.fsf@possimpible.ozlabs.ibm.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <87y42nykvz.fsf@possimpible.ozlabs.ibm.com> User-Agent: Mutt/1.7.0 (2016-08-17) Precedence: list Cc: patchwork@lists.ozlabs.org Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: patchwork-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Patchwork" <patchwork-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org>

Message ID

BLU437-SMTP14C341A24DA164D29133DFA3F40@phx.gbl

State

Not Applicable

Headers

Message-ID: <BLU437-SMTP14C341A24DA164D29133DFA3F40@phx.gbl>
Date: Tue, 20 Sep 2016 00:08:55 +0100
From: Stephen Finucane <stephenfinucane@hotmail.com>
To: Daniel Axtens <dja@axtens.net>
Subject: Re: [PATCH v6 1/2] parsemail: Convert to a management command
References: <1473620125-19914-1-git-send-email-stephenfinucane@hotmail.com>
	<BLU436-SMTP322A0BFDAA3CB0051017DAA3FC0@phx.gbl>
	<871t0gyxcf.fsf@possimpible.ozlabs.ibm.com>
	<87y42nykvz.fsf@possimpible.ozlabs.ibm.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <87y42nykvz.fsf@possimpible.ozlabs.ibm.com>
User-Agent: Mutt/1.7.0 (2016-08-17)
Precedence: list
Cc: patchwork@lists.ozlabs.org
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: patchwork-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org
Sender: "Patchwork"
	<patchwork-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org>

Commit Message

Stephen Finucane Sept. 19, 2016, 11:08 p.m. UTC

On 20 Sep 01:22, Daniel Axtens wrote:
> So, umm, I went ahead and had a crack at this.
> 
> It turns out this is hideously difficult to get right. But this plus my
> other patch to fix Thomas' problem should have things working on Py2 and
> Py3 with this series.
> 
> It's a bit of a work in progress: I need to close the file at the end
> of the function, the logging needs to be added again, etc.
> 
> Tests to come.
> 
> Stephen: I can do this up into a proper patch if you like or you can
> fold it into your series.
> 
> Regards,
> Daniel

I don't have the offending patch on hand, but isn't the issue with the
headers. If so, would something like the below do (I haven't tested
it - there could be typos).

I'll review this if not.

Stephen

Comments

Daniel Axtens Sept. 20, 2016, 12:03 a.m. UTC | #1

>
> I don't have the offending patch on hand, but isn't the issue with the
> headers. If so, would something like the below do (I haven't tested
> it - there could be typos).

This is a bit more general - it deals with UTF-8 characters in the body
for Py3 as well, which is broken even with the other patch I sent.

I'll send some tests which will make it a bit clearer.

Regards,
Daniel

>
> I'll review this if not.
>
> Stephen
>
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index 1805df8..7917e97 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -21,7 +21,8 @@
>  
>  import codecs
>  import datetime
> -from email.header import Header, decode_header
> +from email.header import decode_header
> +from email.header import make_header
>  from email.utils import parsedate_tz, mktime_tz
>  from fnmatch import fnmatch
>  from functools import reduce
> @@ -155,10 +156,10 @@ def find_date(mail):
>  
>  
>  def find_headers(mail):
> -    return reduce(operator.__concat__,
> -                  ['%s: %s\n' % (k, Header(v, header_name=k,
> -                                           continuation_ws='\t').encode())
> -                   for (k, v) in list(mail.items())])
> +    headers = {key: decode_header(value) for key, value in list(mail.items())}
> +    return '\n'.join(['%s: %s' % (key, make_header(value[0], header_name=key,
> +                                                   continuation_wd='\t'))
> +                      for key, value in headers])
>  
>  
>  def find_references(mail):

Thomas Monjalon Sept. 21, 2016, 12:18 p.m. UTC | #2

Hi,

2016-09-20 00:08, Stephen Finucane:
> On 20 Sep 01:22, Daniel Axtens wrote:
> > So, umm, I went ahead and had a crack at this.
> > 
> > It turns out this is hideously difficult to get right. But this plus my
> > other patch to fix Thomas' problem should have things working on Py2 and
> > Py3 with this series.

Thanks for taking care.

[...]
> I don't have the offending patch on hand, but isn't the issue with the
> headers. If so, would something like the below do (I haven't tested
> it - there could be typos).
[...]

Please would it be possible to fix this bug in the stable branch also?
Thanks a lot

Daniel Axtens Sept. 21, 2016, 2:29 p.m. UTC | #3

Hi Stephen,

>  def find_headers(mail):
> -    return reduce(operator.__concat__,
> -                  ['%s: %s\n' % (k, Header(v, header_name=k,
> -                                           continuation_ws='\t').encode())
> -                   for (k, v) in list(mail.items())])
> +    headers = {key: decode_header(value) for key, value in list(mail.items())}
> +    return '\n'.join(['%s: %s' % (key, make_header(value[0], header_name=key,
> +                                                   continuation_wd='\t'))
> +                      for key, value in headers])

This works beautifully in Python3. In Python2, not so much:


patchwork@652f47a766fc:~/patchwork$ python2
Python 2.7.12 (default, Jul  1 2016, 15:12:24) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from email.header import decode_header, make_header
>>> snowman_header = u'Snowman: \u2603'
>>> snowman_utf_8 = snowman_header.encode('utf-8')
>>> print(snowman_utf_8)
Snowman: ☃
>>> print(decode_header(snowman_header))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/email/header.py", line 73, in decode_header
    header = str(header)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2603' in position 9: ordinal not in range(128)
>>> print(decode_header(snowman_utf_8))
[('Snowman: \xe2\x98\x83', None)]
>>> print(make_header(decode_header(snowman_utf_8)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/email/header.py", line 139, in make_header
    h.append(s, charset)
  File "/usr/lib/python2.7/email/header.py", line 267, in append
    ustr = unicode(s, incodec, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 9: ordinal not in range(128)
>>> 

So I suppose some half-breed of the two approaches is the way to go.

Regards,
Daniel

diff --git a/patchwork/parser.py b/patchwork/parser.py
index 1805df8..7917e97 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -21,7 +21,8 @@ 
 
 import codecs
 import datetime
-from email.header import Header, decode_header
+from email.header import decode_header
+from email.header import make_header
 from email.utils import parsedate_tz, mktime_tz
 from fnmatch import fnmatch
 from functools import reduce
@@ -155,10 +156,10 @@  def find_date(mail):
 
 
 def find_headers(mail):
-    return reduce(operator.__concat__,
-                  ['%s: %s\n' % (k, Header(v, header_name=k,
-                                           continuation_ws='\t').encode())
-                   for (k, v) in list(mail.items())])
+    headers = {key: decode_header(value) for key, value in list(mail.items())}
+    return '\n'.join(['%s: %s' % (key, make_header(value[0], header_name=key,
+                                                   continuation_wd='\t'))
+                      for key, value in headers])
 
 
 def find_references(mail):