Message ID | 20170405124604.28022-1-robin.jarry@6wind.com |
---|---|
State | Superseded |
Headers | show |
Gentle bump :-) Robin
On Wed, 2017-04-05 at 14:46 +0200, Robin Jarry wrote: > Replacing sys.stdout and sys.stderr can cause obscure crashes when > trying to write non unicode data. The interpreter is terminated with > SIGINT without any specific error writen on the console. > > rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f964820e8d0}, > {0x559f50, [], SA_RESTORER, 0x7f964820e8d0}, 8) = 0 > > This happens easily when there is an untrapped exception which should > lead to printing a traceback on stderr. > > The only way to prevent UnicodeEncodeErrors is to make sure that > PYTHONIOENCODING is set with the ':replace' suffix and this can only > be > done *before* starting the interpreter as the initialization is made > very early on and the encoding cannot be set or modified after. > > From the official documentation: > > PYTHONIOENCODING > > Overrides the encoding used for stdin/stdout/stderr, in the syntax > encodingname:errorhandler. The :errorhandler part is optional and > has the same meaning as in str.encode(). > > https://docs.python.org/2/using/cmdline.html > https://docs.python.org/3/using/cmdline.html > > Of course, for proper encoding of unicode characters, one of the > locale-related environment variables (LC_ALL, LANG, LANGUAGE, etc.) > must > be set. Python will use the correct encoding accordingly and > PYTHONIOENCODING will be set to "encoding:replace". > > Examples: > > $ grep utf8 ~/.Xresources > xterm*utf8: 2 > > $ env - PYTHONIOENCODING=utf-8:replace python2 -c "print > u's\u00e9duisante'" > séduisante > $ env - PYTHONIOENCODING=utf-8:replace python3 -c > "print('s\u00e9duisante')" > séduisante > > $ env - PYTHONIOENCODING=ascii:replace python2 -c "print > u's\u00e9duisante'" > s?duisante > $ env - PYTHONIOENCODING=ascii:replace python3 -c > "print('s\u00e9duisante')" > s?duisante > > $ env - PYTHONIOENCODING=ISO-8859-1:replace python2 -c "print > u's\u00e9duisante'" > s�duisante > $ env - PYTHONIOENCODING=ISO-8859-1:replace python3 -c > "print('s\u00e9duisante')" > s�duisante > > Fixes: 046419a3bf8f ("pwclient: Fix encoding problems") > Signed-off-by: Robin Jarry <robin.jarry@6wind.com> I understand the issue now - thanks for clarifying :) Sorry for the delay in replying also. Now, while I understand what you're doing here, I'm wondering if we could simply revert the original patch instead. I clearly merged that without fully understanding the issue and not only does it cause the issues you've trying to resolve here but it seems it's not necessary for a lot of people (including me. Fedora 24). For those users that are affected, there are less hacky workarounds. For example, a simple script like so would so the job, I imagine. $ cat pwclient2 #!/usr/bin/env bash PYTHONIOENCODING=UTF-8 pwclient I'd be happy to include an epilog detailing this issue that would appear when you'd run '--help'. I realize that were still be an extra step for affected users, but the vast majority aren't and wouldn't be affected. Would this be acceptable? Stephen PS: What kind of environment are you encountering this issue in? > --- > v2: > > - Always set PYTHONIOENCODING=<enc>:replace (on python 2 and 3) to > prevent from > UnicodeEncodeErrors > > patchwork/bin/pwclient | 35 ++++++++++++++++++++++++++--------- > 1 file changed, 26 insertions(+), 9 deletions(-) > > diff --git a/patchwork/bin/pwclient b/patchwork/bin/pwclient > index ed0351bf5288..5a7b6723afe3 100755 > --- a/patchwork/bin/pwclient > +++ b/patchwork/bin/pwclient > @@ -41,16 +41,7 @@ except ImportError: > import shutil > import re > import io > -import locale > > -if sys.version_info.major == 2: > - # hack to make writing unicode to standard output/error work on > Python 2 > - OUT_ENCODING = (sys.stdout.encoding or > locale.getpreferredencoding() or > - os.getenv('PYTHONIOENCODING', 'utf-8')) > - sys.stdout = io.open(sys.stdout.fileno(), mode='w', > - encoding=OUT_ENCODING, errors='replace') > - sys.stderr = io.open(sys.stderr.fileno(), mode='w', > - encoding=OUT_ENCODING, errors='replace') > > # Default Patchwork remote XML-RPC server URL > # This script will check the PW_XMLRPC_URL environment variable > @@ -821,5 +812,31 @@ def main(): > sys.exit(1) > > > +def force_io_encoding(): > + """ > + Force PYTHONIOENCODING ":errorhandler" to avoid > UnicodeEncodeErrors. The > + only way to do it is to set the environment variable *before* > starting the > + interpreter. From the python docs: > + > + PYTHONIOENCODING > + > + Overrides the encoding used for stdin/stdout/stderr, in the > syntax > + encodingname:errorhandler. The :errorhandler part is optional > and has the > + same meaning as in str.encode(). > + > + Note that this only prevents interpreter crashes, it does not > exempt from > + correctly setting the LANG or LC_ALL variables in order to have > valid > + output. > + """ > + if 'PYTHONIOENCODING' in os.environ: > + return > + > + encoding = sys.stdout.encoding or 'utf-8' > + > + os.environ['PYTHONIOENCODING'] = encoding + ':replace' > + os.execvp(sys.executable, [sys.executable] + sys.argv) # no > return > + > + > if __name__ == "__main__": > + force_io_encoding() > main()
Hi Stephen, On Apr 28, 2017 11:46 PM, "Stephen Finucane" <stephen@that.guru> wrote: > > I understand the issue now - thanks for clarifying :) Sorry for the > delay in replying also. > > Now, while I understand what you're doing here, I'm wondering if we > could simply revert the original patch instead. I clearly merged that > without fully understanding the issue and not only does it cause the > issues you've trying to resolve here but it seems it's not necessary > for a lot of people (including me. Fedora 24). For those users that are > affected, there are less hacky workarounds. For example, a simple > script like so would so the job, I imagine. > > $ cat pwclient2 > #!/usr/bin/env bash > PYTHONIOENCODING=UTF-8 pwclient > > I'd be happy to include an epilog detailing this issue that would > appear when you'd run '--help'. I realize that were still be an extra > step for affected users, but the vast majority aren't and wouldn't be > affected. > > Would this be acceptable? I wouldn't revert the whole patch, I remember I fixed some minor things which should remain. However, in addition of a small help messge, we could also wrap the call to main() to display a more useful error message: try: main() except UnicodeEncodeError as e traceback.print_traceback() sys.stderr.write('you should export the LANG or LC_ALL environment variables\n') sys.exit(1) what do you think? > Stephen > > PS: What kind of environment are you encountering this issue in? On debian 8 (stable). Robin
On Sun, 2017-04-30 at 21:54 +0200, Robin Jarry wrote: > Hi Stephen, > > On Apr 28, 2017 11:46 PM, "Stephen Finucane" <stephen@that.guru> > wrote: > > > > I understand the issue now - thanks for clarifying :) Sorry for the > > delay in replying also. > > > > Now, while I understand what you're doing here, I'm wondering if we > > could simply revert the original patch instead. I clearly merged > > that without fully understanding the issue and not only does it > > cause the issues you've trying to resolve here but it seems it's > > not necessary for a lot of people (including me. Fedora 24). For > > those users that are affected, there are less hacky workarounds. > > For example, a simple script like so would so the job, I imagine. > > > > $ cat pwclient2 > > #!/usr/bin/env bash > > PYTHONIOENCODING=UTF-8 pwclient > > > > I'd be happy to include an epilog detailing this issue that would > > appear when you'd run '--help'. I realize that were still be an > > extra step for affected users, but the vast majority aren't and > > wouldn't be affected. > > > > Would this be acceptable? > > I wouldn't revert the whole patch, I remember I fixed some minor > things which should remain. However, in addition of a small help > messge, we could also wrap the call to main() to display a more > useful error message: > > try: > main() > except UnicodeEncodeError as e > traceback.print_traceback() > sys.stderr.write('you should export the LANG or LC_ALL > environment variables\n') > sys.exit(1) > > what do you think? Yup. That's definitely worth including. I guess you'd want to export PYTHONIOENCODING too, and maybe suggest they check out the help page for more information if it's failed? Do you want to put that patch together as a v3? Stephen
diff --git a/patchwork/bin/pwclient b/patchwork/bin/pwclient index ed0351bf5288..5a7b6723afe3 100755 --- a/patchwork/bin/pwclient +++ b/patchwork/bin/pwclient @@ -41,16 +41,7 @@ except ImportError: import shutil import re import io -import locale -if sys.version_info.major == 2: - # hack to make writing unicode to standard output/error work on Python 2 - OUT_ENCODING = (sys.stdout.encoding or locale.getpreferredencoding() or - os.getenv('PYTHONIOENCODING', 'utf-8')) - sys.stdout = io.open(sys.stdout.fileno(), mode='w', - encoding=OUT_ENCODING, errors='replace') - sys.stderr = io.open(sys.stderr.fileno(), mode='w', - encoding=OUT_ENCODING, errors='replace') # Default Patchwork remote XML-RPC server URL # This script will check the PW_XMLRPC_URL environment variable @@ -821,5 +812,31 @@ def main(): sys.exit(1) +def force_io_encoding(): + """ + Force PYTHONIOENCODING ":errorhandler" to avoid UnicodeEncodeErrors. The + only way to do it is to set the environment variable *before* starting the + interpreter. From the python docs: + + PYTHONIOENCODING + + Overrides the encoding used for stdin/stdout/stderr, in the syntax + encodingname:errorhandler. The :errorhandler part is optional and has the + same meaning as in str.encode(). + + Note that this only prevents interpreter crashes, it does not exempt from + correctly setting the LANG or LC_ALL variables in order to have valid + output. + """ + if 'PYTHONIOENCODING' in os.environ: + return + + encoding = sys.stdout.encoding or 'utf-8' + + os.environ['PYTHONIOENCODING'] = encoding + ':replace' + os.execvp(sys.executable, [sys.executable] + sys.argv) # no return + + if __name__ == "__main__": + force_io_encoding() main()
Replacing sys.stdout and sys.stderr can cause obscure crashes when trying to write non unicode data. The interpreter is terminated with SIGINT without any specific error writen on the console. rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f964820e8d0}, {0x559f50, [], SA_RESTORER, 0x7f964820e8d0}, 8) = 0 This happens easily when there is an untrapped exception which should lead to printing a traceback on stderr. The only way to prevent UnicodeEncodeErrors is to make sure that PYTHONIOENCODING is set with the ':replace' suffix and this can only be done *before* starting the interpreter as the initialization is made very early on and the encoding cannot be set or modified after. From the official documentation: PYTHONIOENCODING Overrides the encoding used for stdin/stdout/stderr, in the syntax encodingname:errorhandler. The :errorhandler part is optional and has the same meaning as in str.encode(). https://docs.python.org/2/using/cmdline.html https://docs.python.org/3/using/cmdline.html Of course, for proper encoding of unicode characters, one of the locale-related environment variables (LC_ALL, LANG, LANGUAGE, etc.) must be set. Python will use the correct encoding accordingly and PYTHONIOENCODING will be set to "encoding:replace". Examples: $ grep utf8 ~/.Xresources xterm*utf8: 2 $ env - PYTHONIOENCODING=utf-8:replace python2 -c "print u's\u00e9duisante'" séduisante $ env - PYTHONIOENCODING=utf-8:replace python3 -c "print('s\u00e9duisante')" séduisante $ env - PYTHONIOENCODING=ascii:replace python2 -c "print u's\u00e9duisante'" s?duisante $ env - PYTHONIOENCODING=ascii:replace python3 -c "print('s\u00e9duisante')" s?duisante $ env - PYTHONIOENCODING=ISO-8859-1:replace python2 -c "print u's\u00e9duisante'" s�duisante $ env - PYTHONIOENCODING=ISO-8859-1:replace python3 -c "print('s\u00e9duisante')" s�duisante Fixes: 046419a3bf8f ("pwclient: Fix encoding problems") Signed-off-by: Robin Jarry <robin.jarry@6wind.com> --- v2: - Always set PYTHONIOENCODING=<enc>:replace (on python 2 and 3) to prevent from UnicodeEncodeErrors patchwork/bin/pwclient | 35 ++++++++++++++++++++++++++--------- 1 file changed, 26 insertions(+), 9 deletions(-)