Message ID | 1492608257-924-1-git-send-email-philipp.tomsich@theobroma-systems.com |
---|---|
State | Accepted |
Commit | 21caa558ca1811a9995ed1c1b0e2c01cbdf25662 |
Delegated to: | Simon Glass |
+Tom

On 19 April 2017 at 07:24, Philipp Tomsich
<philipp.tomsich@theobroma-systems.com> wrote:
>
> This change encodes the CC list to UTF-8 to avoid failures on
> maintainer-addresses that include non-ASCII characters (observed on
> Debian 7.11 with Python 2.7.3).
>
> Without this, I get the following failure:
>   Traceback (most recent call last):
>     File "tools/patman/patman", line 159, in <module>
>       options.add_maintainers)
>     File "[snip]/u-boot/tools/patman/series.py", line 234, in MakeCcFile
>       print(commit.patch, ', '.join(set(list)), file=fd)
>   UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 81: ordinal not in range(128)
>
> from Heiko's email address:
>   [..., u'"Heiko St\xfcbner" <heiko@sntech.de>', ...]
>
> While with this change added this encodes to:
>   "=?UTF-8?q?Heiko=20St=C3=BCbner?= <heiko@sntech.de>"
>
> Signed-off-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
> ---
>
>  tools/patman/series.py | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Simon Glass <sjg@chromium.org>
On Sat, Apr 22, 2017 at 05:53:36PM -0600, Simon Glass wrote:
> +Tom
>
> On 19 April 2017 at 07:24, Philipp Tomsich
> <philipp.tomsich@theobroma-systems.com> wrote:
> >
> > This change encodes the CC list to UTF-8 to avoid failures on
> > maintainer-addresses that include non-ASCII characters (observed on
> > Debian 7.11 with Python 2.7.3).
> >
> > Without this, I get the following failure:
> >   Traceback (most recent call last):
> >     File "tools/patman/patman", line 159, in <module>
> >       options.add_maintainers)
> >     File "[snip]/u-boot/tools/patman/series.py", line 234, in MakeCcFile
> >       print(commit.patch, ', '.join(set(list)), file=fd)
> >   UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 81: ordinal not in range(128)
> >
> > from Heiko's email address:
> >   [..., u'"Heiko St\xfcbner" <heiko@sntech.de>', ...]
> >
> > While with this change added this encodes to:
> >   "=?UTF-8?q?Heiko=20St=C3=BCbner?= <heiko@sntech.de>"
> >
> > Signed-off-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
> > ---
> >
> >  tools/patman/series.py | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
>
> Reviewed-by: Simon Glass <sjg@chromium.org>

Please put this in a PR for me, along with any other critical fixes to
the various python tools we have, thanks!

And also, do we need to perhaps whack something at a higher level, and
more consistently, about unicode? This is, I gather, doing UTF-8 right.
In buildman we have a few patches to just translate to latin-1 instead.
We should do the same thing I think, and perhaps there's a higher level
up in the code where we need to do it too? I don't know..
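To make the UTF-8 versus latin-1 question concrete, here is a minimal sketch of what each codec can represent. The helper `can_encode` and the second name are illustrative assumptions, not anything taken from patman or buildman.

```python
from __future__ import print_function, unicode_literals


def can_encode(text, codec):
    """Return True if every character of text is representable in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# The name from the traceback: U+00FC exists in both latin-1 and UTF-8.
heiko = 'Heiko St\u00fcbner'
# Hypothetical name starting with U+0141, which latin-1 cannot represent.
lukasz = '\u0141ukasz'

for name in (heiko, lukasz):
    print(repr(name),
          'latin-1:', can_encode(name, 'latin-1'),
          'utf-8:', can_encode(name, 'utf-8'))
```

Both codecs handle the name that triggered this report, but they produce different bytes for it, and only UTF-8 covers characters outside the latin-1 range.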
Hi Tom,

On 25 April 2017 at 11:12, Tom Rini <trini@konsulko.com> wrote:
>
> On Sat, Apr 22, 2017 at 05:53:36PM -0600, Simon Glass wrote:
> > +Tom
> >
> > On 19 April 2017 at 07:24, Philipp Tomsich
> > <philipp.tomsich@theobroma-systems.com> wrote:
> > >
> > > This change encodes the CC list to UTF-8 to avoid failures on
> > > maintainer-addresses that include non-ASCII characters (observed on
> > > Debian 7.11 with Python 2.7.3).
> > >
> > > Without this, I get the following failure:
> > >   Traceback (most recent call last):
> > >     File "tools/patman/patman", line 159, in <module>
> > >       options.add_maintainers)
> > >     File "[snip]/u-boot/tools/patman/series.py", line 234, in MakeCcFile
> > >       print(commit.patch, ', '.join(set(list)), file=fd)
> > >   UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 81: ordinal not in range(128)
> > >
> > > from Heiko's email address:
> > >   [..., u'"Heiko St\xfcbner" <heiko@sntech.de>', ...]
> > >
> > > While with this change added this encodes to:
> > >   "=?UTF-8?q?Heiko=20St=C3=BCbner?= <heiko@sntech.de>"
> > >
> > > Signed-off-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
> > > ---
> > >
> > >  tools/patman/series.py | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > Reviewed-by: Simon Glass <sjg@chromium.org>
>
> Please put this in a PR for me, along with any other critical fixes to
> the various python tools we have, thanks!
>
> And also, do we need to perhaps whack something at a higher level, and
> more consistently, about unicode? This is, I gather, doing UTF-8 right.
> In buildman we have a few patches to just translate to latin-1 instead.
> We should do the same thing I think, and perhaps there's a higher level
> up in the code where we need to do it too? I don't know..

Actually I don't think we are quite there yet. This really needs a
test with all the different places strings can come from, to make sure
patman does the right thing.

Regards,
Simon
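As a rough idea of what such a test could look like, the sketch below writes a CC line to a temporary file and reads it back as UTF-8. `write_cc_file` is a hypothetical stand-in, not patman's `MakeCcFile`, and only two input cases are covered; each real source of strings would need its own case.

```python
import io
import os
import tempfile
import unittest


def write_cc_file(path, patch_name, addresses):
    """Hypothetical stand-in for the CC-file writer: one UTF-8 encoded line."""
    line = u'%s %s\n' % (patch_name, u', '.join(addresses))
    with io.open(path, 'wb') as fd:
        fd.write(line.encode('utf-8'))


class CcFileEncodingTest(unittest.TestCase):
    def roundtrip(self, addresses):
        """Write the addresses to a temp file and return the decoded text."""
        handle, path = tempfile.mkstemp()
        os.close(handle)
        try:
            write_cc_file(path, u'0001-test.patch', addresses)
            with io.open(path, 'r', encoding='utf-8') as fd:
                return fd.read()
        finally:
            os.unlink(path)

    def test_ascii_only(self):
        text = self.roundtrip([u'Simon Glass <sjg@chromium.org>'])
        self.assertIn(u'sjg@chromium.org', text)

    def test_non_ascii_name(self):
        text = self.roundtrip([u'"Heiko St\xfcbner" <heiko@sntech.de>'])
        self.assertIn(u'St\xfcbner', text)


def run_tests():
    """Run the test case programmatically and return the result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(CcFileEncodingTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == '__main__':
    run_tests()
```

Calling `run_tests()` returns a result whose `wasSuccessful()` method reports whether both cases passed.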
Hi Simon,

> On 25 Apr 2017, at 22:31, Simon Glass <sjg@chromium.org> wrote:
>
> Hi Tom,
>
> On 25 April 2017 at 11:12, Tom Rini <trini@konsulko.com> wrote:
>>
>> On Sat, Apr 22, 2017 at 05:53:36PM -0600, Simon Glass wrote:
>>> +Tom
>>>
>>> On 19 April 2017 at 07:24, Philipp Tomsich
>>> <philipp.tomsich@theobroma-systems.com> wrote:
>>>>
>>>> This change encodes the CC list to UTF-8 to avoid failures on
>>>> maintainer-addresses that include non-ASCII characters (observed on
>>>> Debian 7.11 with Python 2.7.3).
>>>>
>>>> Without this, I get the following failure:
>>>>   Traceback (most recent call last):
>>>>     File "tools/patman/patman", line 159, in <module>
>>>>       options.add_maintainers)
>>>>     File "[snip]/u-boot/tools/patman/series.py", line 234, in MakeCcFile
>>>>       print(commit.patch, ', '.join(set(list)), file=fd)
>>>>   UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 81: ordinal not in range(128)
>>>>
>>>> from Heiko's email address:
>>>>   [..., u'"Heiko St\xfcbner" <heiko@sntech.de>', ...]
>>>>
>>>> While with this change added this encodes to:
>>>>   "=?UTF-8?q?Heiko=20St=C3=BCbner?= <heiko@sntech.de>"
>>>>
>>>> Signed-off-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
>>>> ---
>>>>
>>>>  tools/patman/series.py | 4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> Reviewed-by: Simon Glass <sjg@chromium.org>
>>
>> Please put this in a PR for me, along with any other critical fixes to
>> the various python tools we have, thanks!
>>
>> And also, do we need to perhaps whack something at a higher level, and
>> more consistently, about unicode? This is, I gather, doing UTF-8 right.
>> In buildman we have a few patches to just translate to latin-1 instead.
>> We should do the same thing I think, and perhaps there's a higher level
>> up in the code where we need to do it too? I don't know..
>
> Actually I don't think we are quite there yet. This really needs a
> test with all the different places strings can come from, to make sure
> patman does the right thing.

On the topic of ‘different places strings can come from’, here’s another
change from my WIP tree that fixes some other UTF-8 issues in patman and
may point you towards another trouble spot:

@@ -229,14 +229,16 @@ class Series(dict):
                                            raise_on_error=raise_on_error)
             if add_maintainers:
                 list += get_maintainer.GetMaintainer(commit.patch)
+            list = [s.encode('utf-8') for s in list]
             all_ccs += list
-            print(commit.patch, ', '.join(set(list)).encode('utf-8'), file=fd)
+            print(commit.patch, ', '.join(set(list)), file=fd)
             self._generated_cc[commit.patch] = list
 
         if cover_fname:
             cover_cc = gitutil.BuildEmailList(self.get('cover_cc', ''))
-            cc_list = ', '.join([x.decode('utf-8') for x in set(cover_cc + all_ccs)])
-            print(cover_fname, cc_list.encode('utf-8'), file=fd)
+            cover_cc = [s.encode('utf-8') for s in cover_cc]
+            cc_list = ', '.join([x for x in set(cover_cc + all_ccs)])
+            print(cover_fname, cc_list, file=fd)
 
         fd.close()
         return fname

Regards,
Philipp.
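The idea in the WIP hunk above, encoding each entry before the list is joined, can be sketched on its own as follows. The helpers `to_utf8` and `join_cc` are hypothetical names, and two details are assumptions added here rather than part of the patch: the `isinstance` guard and the `sorted()` call, which only makes the output order deterministic.

```python
from __future__ import print_function


def to_utf8(value):
    """Return value as UTF-8 bytes, leaving byte strings untouched."""
    if isinstance(value, bytes):
        return value
    return value.encode('utf-8')


def join_cc(addresses):
    """Normalise every entry, drop duplicates, and join into one byte string."""
    unique = sorted(set(to_utf8(a) for a in addresses))
    return b', '.join(unique)


mixed = [
    u'"Heiko St\xfcbner" <heiko@sntech.de>',      # text entry
    b'"Heiko St\xc3\xbcbner" <heiko@sntech.de>',  # same entry, already bytes
    u'Simon Glass <sjg@chromium.org>',
]
print(repr(join_cc(mixed)))
```

The guard matters on Python 2, where calling `.encode('utf-8')` on a byte string first decodes it with the ASCII codec and so fails on entries that already contain UTF-8 bytes.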
diff --git a/tools/patman/series.py b/tools/patman/series.py
index c1b8652..134a381 100644
--- a/tools/patman/series.py
+++ b/tools/patman/series.py
@@ -119,7 +119,7 @@ class Series(dict):
                     email = col.Color(col.YELLOW, "<alias '%s' not found>"
                             % tag)
                 if email:
-                    print(' Cc: ', email)
+                    print(' Cc: ', email.encode('utf-8'))
         print
         for item in to_set:
             print('To:\t ', item)
@@ -230,7 +230,7 @@ class Series(dict):
             if add_maintainers:
                 list += get_maintainer.GetMaintainer(commit.patch)
             all_ccs += list
-            print(commit.patch, ', '.join(set(list)), file=fd)
+            print(commit.patch, ', '.join(set(list)).encode('utf-8'), file=fd)
             self._generated_cc[commit.patch] = list
 
         if cover_fname:
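The effect of the second hunk can be reproduced in isolation. On Python 2, writing a unicode object to a plain file implicitly encodes it with the ASCII codec; the sketch below makes that step explicit and then applies the UTF-8 encoding the patch adds. It uses an in-memory buffer instead of patman's temporary file, and the address list is just the single entry quoted in the report.

```python
from __future__ import print_function

import io

addresses = [u'"Heiko St\xfcbner" <heiko@sntech.de>']
line = u', '.join(set(addresses))

# What Python 2 effectively attempts when a unicode object reaches a plain file.
try:
    line.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError as err:
    ascii_ok = False
    print('ascii failed:', err)

# The patched behaviour: encode explicitly to UTF-8 before writing.
fd = io.BytesIO()
fd.write(line.encode('utf-8') + b'\n')
print(repr(fd.getvalue()))
```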
This change encodes the CC list to UTF-8 to avoid failures on
maintainer-addresses that include non-ASCII characters (observed on
Debian 7.11 with Python 2.7.3).

Without this, I get the following failure:
  Traceback (most recent call last):
    File "tools/patman/patman", line 159, in <module>
      options.add_maintainers)
    File "[snip]/u-boot/tools/patman/series.py", line 234, in MakeCcFile
      print(commit.patch, ', '.join(set(list)), file=fd)
  UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 81: ordinal not in range(128)

from Heiko's email address:
  [..., u'"Heiko St\xfcbner" <heiko@sntech.de>', ...]

While with this change added this encodes to:
  "=?UTF-8?q?Heiko=20St=C3=BCbner?= <heiko@sntech.de>"

Signed-off-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
---

 tools/patman/series.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
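The `=?UTF-8?q?...?=` form quoted above is an RFC 2047 encoded-word. The commit message does not say which component in the sending path produces it, so the sketch below only shows how to read one back with Python's standard `email.header` module. `decode_words` is a hypothetical helper name.

```python
from __future__ import print_function

from email.header import decode_header


def decode_words(value):
    """Decode any RFC 2047 encoded-words in value and return the text."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or 'ascii')
        parts.append(chunk)
    return u''.join(parts)


name = decode_words('=?UTF-8?q?Heiko=20St=C3=BCbner?=')
print(repr(name))
```

The `=C3=BC` pair in the encoded-word is the two-byte UTF-8 sequence for U+00FC, the character the ASCII codec rejected in the traceback.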