[v2] pwclient: Fix silent crash on python 2

Submitted by Robin Jarry on April 5, 2017, 12:46 p.m.

Details

Message ID 20170405124604.28022-1-robin.jarry@6wind.com
State New
Headers show

Commit Message

Robin Jarry April 5, 2017, 12:46 p.m.
Replacing sys.stdout and sys.stderr can cause obscure crashes when
trying to write non unicode data. The interpreter is terminated with
SIGINT without any specific error writen on the console.

  rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f964820e8d0},
  {0x559f50, [], SA_RESTORER, 0x7f964820e8d0}, 8) = 0

This happens easily when there is an untrapped exception which should
lead to printing a traceback on stderr.

The only way to prevent UnicodeEncodeErrors is to make sure that
PYTHONIOENCODING is set with the ':replace' suffix and this can only be
done *before* starting the interpreter as the initialization is made
very early on and the encoding cannot be set or modified after.

From the official documentation:

  PYTHONIOENCODING

  Overrides the encoding used for stdin/stdout/stderr, in the syntax
  encodingname:errorhandler. The :errorhandler part is optional and
  has the same meaning as in str.encode().

  https://docs.python.org/2/using/cmdline.html
  https://docs.python.org/3/using/cmdline.html

Of course, for proper encoding of unicode characters, one of the
locale-related environment variables (LC_ALL, LANG, LANGUAGE, etc.) must
be set. Python will use the correct encoding accordingly and
PYTHONIOENCODING will be set to "encoding:replace".

Examples:

  $ grep utf8 ~/.Xresources
  xterm*utf8: 2

  $ env - PYTHONIOENCODING=utf-8:replace python2 -c "print u's\u00e9duisante'"
  séduisante
  $ env - PYTHONIOENCODING=utf-8:replace python3 -c "print('s\u00e9duisante')"
  séduisante

  $ env - PYTHONIOENCODING=ascii:replace python2 -c "print u's\u00e9duisante'"
  s?duisante
  $ env - PYTHONIOENCODING=ascii:replace python3 -c "print('s\u00e9duisante')"
  s?duisante

  $ env - PYTHONIOENCODING=ISO-8859-1:replace python2 -c "print u's\u00e9duisante'"
  s�duisante
  $ env - PYTHONIOENCODING=ISO-8859-1:replace python3 -c "print('s\u00e9duisante')"
  s�duisante

Fixes: 046419a3bf8f ("pwclient: Fix encoding problems")
Signed-off-by: Robin Jarry <robin.jarry@6wind.com>
---
v2:

- Always set PYTHONIOENCODING=<enc>:replace (on python 2 and 3) to prevent from
  UnicodeEncodeErrors

 patchwork/bin/pwclient | 35 ++++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

Comments

Robin Jarry April 22, 2017, 11:32 a.m.
Gentle bump :-)

Robin

Patch hide | download patch | download mbox

diff --git a/patchwork/bin/pwclient b/patchwork/bin/pwclient
index ed0351bf5288..5a7b6723afe3 100755
--- a/patchwork/bin/pwclient
+++ b/patchwork/bin/pwclient
@@ -41,16 +41,7 @@  except ImportError:
 import shutil
 import re
 import io
-import locale
 
-if sys.version_info.major == 2:
-    # hack to make writing unicode to standard output/error work on Python 2
-    OUT_ENCODING = (sys.stdout.encoding or locale.getpreferredencoding() or
-                    os.getenv('PYTHONIOENCODING', 'utf-8'))
-    sys.stdout = io.open(sys.stdout.fileno(), mode='w',
-                         encoding=OUT_ENCODING, errors='replace')
-    sys.stderr = io.open(sys.stderr.fileno(), mode='w',
-                         encoding=OUT_ENCODING, errors='replace')
 
 # Default Patchwork remote XML-RPC server URL
 # This script will check the PW_XMLRPC_URL environment variable
@@ -821,5 +812,31 @@  def main():
         sys.exit(1)
 
 
+def force_io_encoding():
+    """
+    Force PYTHONIOENCODING ":errorhandler" to avoid UnicodeEncodeErrors. The
+    only way to do it is to set the environment variable *before* starting the
+    interpreter. From the python docs:
+
+      PYTHONIOENCODING
+
+      Overrides the encoding used for stdin/stdout/stderr, in the syntax
+      encodingname:errorhandler. The :errorhandler part is optional and has the
+      same meaning as in str.encode().
+
+    Note that this only prevents interpreter crashes, it does not exempt from
+    correctly setting the LANG or LC_ALL variables in order to have valid
+    output.
+    """
+    if 'PYTHONIOENCODING' in os.environ:
+        return
+
+    encoding = sys.stdout.encoding or 'utf-8'
+
+    os.environ['PYTHONIOENCODING'] = encoding + ':replace'
+    os.execvp(sys.executable, [sys.executable] + sys.argv)  # no return
+
+
 if __name__ == "__main__":
+    force_io_encoding()
     main()