[U-Boot] patman: encode CC list to UTF-8

Message ID 1492608257-924-1-git-send-email-philipp.tomsich@theobroma-systems.com
State Accepted
Commit 21caa558ca1811a9995ed1c1b0e2c01cbdf25662
Delegated to: Simon Glass

Commit Message

Philipp Tomsich April 19, 2017, 1:24 p.m. UTC
This change encodes the CC list as UTF-8 to avoid failures on
maintainer addresses that include non-ASCII characters (observed on
Debian 7.11 with Python 2.7.3).

Without this, I get the following failure:
  Traceback (most recent call last):
    File "tools/patman/patman", line 159, in <module>
      options.add_maintainers)
    File "[snip]/u-boot/tools/patman/series.py", line 234, in MakeCcFile
      print(commit.patch, ', '.join(set(list)), file=fd)
  UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 81: ordinal not in range(128)
from Heiko's email address:
  [..., u'"Heiko St\xfcbner" <heiko@sntech.de>', ...]

With this change applied, the address is encoded as:
  "=?UTF-8?q?Heiko=20St=C3=BCbner?= <heiko@sntech.de>"

Signed-off-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
---

 tools/patman/series.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
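
The failure described in the commit message can be reproduced in isolation. The following is a minimal sketch, not the patman code: it is written for Python 3 (the report is against Python 2.7.3), and it uses an ASCII-configured `io.TextIOWrapper` as a stand-in for a file with an implicit ASCII encoding. The names `cc_list`, `ascii_fd` and `raw_fd` are made up for illustration. Note that `.encode('utf-8')` yields raw UTF-8 bytes; the `=?UTF-8?q?...?=` form quoted above is a MIME encoded-word, which is presumably produced later in the mail pipeline rather than by this call.

```python
import io

# Address taken from the traceback in the commit message.
cc_list = [u'"Heiko St\xfcbner" <heiko@sntech.de>']
line = u', '.join(cc_list)

# Assumption: a text stream restricted to ASCII stands in for the file
# that Python 2.7 implicitly encodes to when printing unicode.
ascii_fd = io.TextIOWrapper(io.BytesIO(), encoding='ascii')
try:
    ascii_fd.write(line)
    ascii_fd.flush()
    failed = False
except UnicodeEncodeError:
    # U+00FC (u-umlaut) has no ASCII representation.
    failed = True

# Encoding explicitly produces bytes that a binary stream accepts as-is.
encoded = line.encode('utf-8')
raw_fd = io.BytesIO()
raw_fd.write(encoded + b'\n')
```

Encoding at the point of output, as the patch does, keeps the in-memory list as text and only converts when writing.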

Comments

Simon Glass April 22, 2017, 11:53 p.m. UTC | #1
+Tom

On 19 April 2017 at 07:24, Philipp Tomsich
<philipp.tomsich@theobroma-systems.com> wrote:
>
> This change encodes the CC list to UTF-8 to avoid failures on
> maintainer-addresses that include non-ASCII characters (observed on
> Debian 7.11 with Python 2.7.3).
>
> Without this, I get the following failure:
>   Traceback (most recent call last):
>     File "tools/patman/patman", line 159, in <module>
>       options.add_maintainers)
>     File "[snip]/u-boot/tools/patman/series.py", line 234, in MakeCcFile
>       print(commit.patch, ', '.join(set(list)), file=fd)
>   UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 81: ordinal not in range(128)
> from Heiko's email address:
>   [..., u'"Heiko St\xfcbner" <heiko@sntech.de>', ...]
>
> While with this change added this encodes to:
>   "=?UTF-8?q?Heiko=20St=C3=BCbner?= <heiko@sntech.de>"
>
> Signed-off-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
> ---
>
>  tools/patman/series.py | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Simon Glass <sjg@chromium.org>

Tom Rini April 25, 2017, 5:12 p.m. UTC | #2
On Sat, Apr 22, 2017 at 05:53:36PM -0600, Simon Glass wrote:
> +Tom
> 
> On 19 April 2017 at 07:24, Philipp Tomsich
> <philipp.tomsich@theobroma-systems.com> wrote:
> >
> > This change encodes the CC list to UTF-8 to avoid failures on
> > maintainer-addresses that include non-ASCII characters (observed on
> > Debian 7.11 with Python 2.7.3).
> >
> > Without this, I get the following failure:
> >   Traceback (most recent call last):
> >     File "tools/patman/patman", line 159, in <module>
> >       options.add_maintainers)
> >     File "[snip]/u-boot/tools/patman/series.py", line 234, in MakeCcFile
> >       print(commit.patch, ', '.join(set(list)), file=fd)
> >   UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 81: ordinal not in range(128)
> > from Heiko's email address:
> >   [..., u'"Heiko St\xfcbner" <heiko@sntech.de>', ...]
> >
> > While with this change added this encodes to:
> >   "=?UTF-8?q?Heiko=20St=C3=BCbner?= <heiko@sntech.de>"
> >
> > Signed-off-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
> > ---
> >
> >  tools/patman/series.py | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Reviewed-by: Simon Glass <sjg@chromium.org>

Please put this in a PR for me, along with any other critical fixes to
the various python tools we have, thanks!

And also, do we need to perhaps whack something at a higher level, and
more consistently, about unicode?  This is, I gather, doing UTF-8 right.
In buildman we have a few patches to just translate to latin-1 instead.
We should do the same thing I think, and perhaps there's a higher level
up in the code where we need to do it too?  I don't know..
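
For reference, the practical difference between the two encodings mentioned here can be shown in a few lines. This is only an illustrative sketch in Python 3; the second name (`other`) is a hypothetical example chosen because it contains a character outside Latin-1, and it does not come from the thread.

```python
# Name from the original traceback; every character exists in Latin-1.
name = u'Heiko St\xfcbner'
utf8 = name.encode('utf-8')      # u-umlaut becomes two bytes
latin1 = name.encode('latin-1')  # u-umlaut becomes one byte

# Hypothetical name with U+0141, which Latin-1 cannot represent.
other = u'\u0141ukasz'
try:
    other.encode('latin-1')
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# UTF-8 can encode any Unicode code point, so this always succeeds.
other_utf8 = other.encode('utf-8')
```

Latin-1 works for Western European names but fails for anything beyond U+00FF, which is one argument for settling on UTF-8 in all the tools.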

Simon Glass April 25, 2017, 8:31 p.m. UTC | #3
Hi Tom,

On 25 April 2017 at 11:12, Tom Rini <trini@konsulko.com> wrote:
>
> On Sat, Apr 22, 2017 at 05:53:36PM -0600, Simon Glass wrote:
> > +Tom
> >
> > On 19 April 2017 at 07:24, Philipp Tomsich
> > <philipp.tomsich@theobroma-systems.com> wrote:
> > >
> > > This change encodes the CC list to UTF-8 to avoid failures on
> > > maintainer-addresses that include non-ASCII characters (observed on
> > > Debian 7.11 with Python 2.7.3).
> > >
> > > Without this, I get the following failure:
> > >   Traceback (most recent call last):
> > >     File "tools/patman/patman", line 159, in <module>
> > >       options.add_maintainers)
> > >     File "[snip]/u-boot/tools/patman/series.py", line 234, in MakeCcFile
> > >       print(commit.patch, ', '.join(set(list)), file=fd)
> > >   UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 81: ordinal not in range(128)
> > > from Heiko's email address:
> > >   [..., u'"Heiko St\xfcbner" <heiko@sntech.de>', ...]
> > >
> > > While with this change added this encodes to:
> > >   "=?UTF-8?q?Heiko=20St=C3=BCbner?= <heiko@sntech.de>"
> > >
> > > Signed-off-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
> > > ---
> > >
> > >  tools/patman/series.py | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > Reviewed-by: Simon Glass <sjg@chromium.org>
>
> Please put this in a PR for me, along with any other critical fixes to
> the various python tools we have, thanks!
>
> And also, do we need to perhaps whack something at a higher level, and
> more consistently, about unicode?  This is, I gather, doing UTF-8 right.
> In buildman we have a few patches to just translate to latin-1 instead.
> We should do the same thing I think, and perhaps there's a higher level
> up in the code where we need to do it too?  I don't know..

Actually I don't think we are quite there yet. This really needs a
test with all the different places strings can come from, to make sure
patman does the right thing.

Regards,
Simon
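
A test of the kind suggested here might look roughly like the sketch below. Everything in it is an assumption made for illustration: `to_utf8` is a hypothetical helper, not an existing patman function, the list of sources is invented, and the code assumes Python 3, where text and bytes are distinct types.

```python
import unittest


def to_utf8(value):
    """Hypothetical helper: return UTF-8 bytes for text or bytes input."""
    if isinstance(value, bytes):
        # Raises UnicodeDecodeError if the bytes are not valid UTF-8.
        value.decode('utf-8')
        return value
    return value.encode('utf-8')


class TestCcEncoding(unittest.TestCase):
    # Assumed stand-ins for the places an address can come from.
    SOURCES = {
        'text from an alias file': u'"Heiko St\xfcbner" <heiko@sntech.de>',
        'bytes from a subprocess': b'"Heiko St\xc3\xbcbner" <heiko@sntech.de>',
    }

    def test_sources_normalise_to_same_bytes(self):
        results = {to_utf8(value) for value in self.SOURCES.values()}
        self.assertEqual(len(results), 1)

    def test_invalid_bytes_are_rejected(self):
        # A lone 0xFC byte is Latin-1 for u-umlaut but invalid UTF-8.
        with self.assertRaises(UnicodeDecodeError):
            to_utf8(b'St\xfcbner')
```

Saved to a file, this can be run with `python3 -m unittest <module>`; a real test would feed in strings from each actual input path instead of literals.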

Philipp Tomsich April 25, 2017, 10:27 p.m. UTC | #4
Hi Simon,

> On 25 Apr 2017, at 22:31, Simon Glass <sjg@chromium.org> wrote:
> 
> Hi Tom,
> 
> On 25 April 2017 at 11:12, Tom Rini <trini@konsulko.com> wrote:
>> 
>> On Sat, Apr 22, 2017 at 05:53:36PM -0600, Simon Glass wrote:
>>> +Tom
>>> 
>>> On 19 April 2017 at 07:24, Philipp Tomsich
>>> <philipp.tomsich@theobroma-systems.com> wrote:
>>>> 
>>>> This change encodes the CC list to UTF-8 to avoid failures on
>>>> maintainer-addresses that include non-ASCII characters (observed on
>>>> Debian 7.11 with Python 2.7.3).
>>>> 
>>>> Without this, I get the following failure:
>>>>  Traceback (most recent call last):
>>>>    File "tools/patman/patman", line 159, in <module>
>>>>      options.add_maintainers)
>>>>    File "[snip]/u-boot/tools/patman/series.py", line 234, in MakeCcFile
>>>>      print(commit.patch, ', '.join(set(list)), file=fd)
>>>>  UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 81: ordinal not in range(128)
>>>> from Heiko's email address:
>>>>  [..., u'"Heiko St\xfcbner" <heiko@sntech.de>', ...]
>>>> 
>>>> While with this change added this encodes to:
>>>>  "=?UTF-8?q?Heiko=20St=C3=BCbner?= <heiko@sntech.de>"
>>>> 
>>>> Signed-off-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
>>>> ---
>>>> 
>>>> tools/patman/series.py | 4 ++--
>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>> 
>>> Reviewed-by: Simon Glass <sjg@chromium.org>
>> 
>> Please put this in a PR for me, along with any other critical fixes to
>> the various python tools we have, thanks!
>> 
>> And also, do we need to perhaps whack something at a higher level, and
>> more consistently, about unicode?  This is, I gather, doing UTF-8 right.
>> In buildman we have a few patches to just translate to latin-1 instead.
>> We should do the same thing I think, and perhaps there's a higher level
>> up in the code where we need to do it too?  I don't know..
> 
> Actually I don't think we are quite there yet. This really needs a
> test with all the different places strings can come from, to make sure
> patman does the right thing.

On the topic of ‘different places strings can come from’, here’s another
change from my WIP tree that fixes some other UTF-8 issues in patman
and may point you towards another trouble spot:

@@ -229,14 +229,16 @@ class Series(dict):
                                            raise_on_error=raise_on_error)
             if add_maintainers:
                 list += get_maintainer.GetMaintainer(commit.patch)
+            list = [s.encode('utf-8') for s in list]
             all_ccs += list
-            print(commit.patch, ', '.join(set(list)).encode('utf-8'), file=fd)
+            print(commit.patch, ', '.join(set(list)), file=fd)
             self._generated_cc[commit.patch] = list
 
         if cover_fname:
             cover_cc = gitutil.BuildEmailList(self.get('cover_cc', ''))
-            cc_list = ', '.join([x.decode('utf-8') for x in set(cover_cc + all_ccs)])
-            print(cover_fname, cc_list.encode('utf-8'), file=fd)
+            cover_cc = [s.encode('utf-8') for s in cover_cc]
+            cc_list = ', '.join([x for x in set(cover_cc + all_ccs)])
+            print(cover_fname, cc_list, file=fd)
 
         fd.close()
         return fname


Regards,
Philipp.
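
The idea in the WIP change above, encoding each entry once as it enters the list so that later joins only ever see one type, can be sketched as follows. This is an illustration under stated assumptions and not the patman implementation: `build_cc_line` is a hypothetical name, the code targets Python 3, and `sorted()` is added only to make the output deterministic (the original uses `set()` alone).

```python
def build_cc_line(patch_name, addresses):
    """Hypothetical helper: build one 'patch cc, cc, ...' line as bytes."""
    # Normalise every entry to UTF-8 bytes before any joining happens.
    encoded = [a.encode('utf-8') if isinstance(a, str) else a
               for a in addresses]
    # Deduplicate after encoding so text and bytes forms of the same
    # address collapse into one entry.
    unique = sorted(set(encoded))
    return patch_name.encode('utf-8') + b' ' + b', '.join(unique)


line = build_cc_line('0001-example.patch', [
    u'"Heiko St\xfcbner" <heiko@sntech.de>',
    b'a@example.com',
    'a@example.com',
])
```

Because the join operates on bytes only, no implicit conversion between text and bytes can occur at that point.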
Patch

diff --git a/tools/patman/series.py b/tools/patman/series.py
index c1b8652..134a381 100644
--- a/tools/patman/series.py
+++ b/tools/patman/series.py
@@ -119,7 +119,7 @@  class Series(dict):
                     email = col.Color(col.YELLOW, "<alias '%s' not found>"
                             % tag)
                 if email:
-                    print('      Cc: ', email)
+                    print('      Cc: ', email.encode('utf-8'))
         print
         for item in to_set:
             print('To:\t ', item)
@@ -230,7 +230,7 @@  class Series(dict):
             if add_maintainers:
                 list += get_maintainer.GetMaintainer(commit.patch)
             all_ccs += list
-            print(commit.patch, ', '.join(set(list)), file=fd)
+            print(commit.patch, ', '.join(set(list)).encode('utf-8'), file=fd)
             self._generated_cc[commit.patch] = list
 
         if cover_fname: