[v2] pwclient: Fix silent crash on python 2

Submitted by Robin Jarry on April 5, 2017, 12:46 p.m.

Details

Message ID 20170405124604.28022-1-robin.jarry@6wind.com
State Superseded
Headers show

Commit Message

Robin Jarry April 5, 2017, 12:46 p.m.
Replacing sys.stdout and sys.stderr can cause obscure crashes when
trying to write non unicode data. The interpreter is terminated with
SIGINT without any specific error writen on the console.

  rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f964820e8d0},
  {0x559f50, [], SA_RESTORER, 0x7f964820e8d0}, 8) = 0

This happens easily when there is an untrapped exception which should
lead to printing a traceback on stderr.

The only way to prevent UnicodeEncodeErrors is to make sure that
PYTHONIOENCODING is set with the ':replace' suffix and this can only be
done *before* starting the interpreter as the initialization is made
very early on and the encoding cannot be set or modified after.

From the official documentation:

  PYTHONIOENCODING

  Overrides the encoding used for stdin/stdout/stderr, in the syntax
  encodingname:errorhandler. The :errorhandler part is optional and
  has the same meaning as in str.encode().

  https://docs.python.org/2/using/cmdline.html
  https://docs.python.org/3/using/cmdline.html

Of course, for proper encoding of unicode characters, one of the
locale-related environment variables (LC_ALL, LANG, LANGUAGE, etc.) must
be set. Python will use the correct encoding accordingly and
PYTHONIOENCODING will be set to "encoding:replace".

Examples:

  $ grep utf8 ~/.Xresources
  xterm*utf8: 2

  $ env - PYTHONIOENCODING=utf-8:replace python2 -c "print u's\u00e9duisante'"
  séduisante
  $ env - PYTHONIOENCODING=utf-8:replace python3 -c "print('s\u00e9duisante')"
  séduisante

  $ env - PYTHONIOENCODING=ascii:replace python2 -c "print u's\u00e9duisante'"
  s?duisante
  $ env - PYTHONIOENCODING=ascii:replace python3 -c "print('s\u00e9duisante')"
  s?duisante

  $ env - PYTHONIOENCODING=ISO-8859-1:replace python2 -c "print u's\u00e9duisante'"
  s�duisante
  $ env - PYTHONIOENCODING=ISO-8859-1:replace python3 -c "print('s\u00e9duisante')"
  s�duisante

Fixes: 046419a3bf8f ("pwclient: Fix encoding problems")
Signed-off-by: Robin Jarry <robin.jarry@6wind.com>
---
v2:

- Always set PYTHONIOENCODING=<enc>:replace (on python 2 and 3) to prevent from
  UnicodeEncodeErrors

 patchwork/bin/pwclient | 35 ++++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

Comments

Robin Jarry April 22, 2017, 11:32 a.m.
Gentle bump :-)

Robin
Stephen Finucane April 28, 2017, 9:46 p.m.
On Wed, 2017-04-05 at 14:46 +0200, Robin Jarry wrote:
> Replacing sys.stdout and sys.stderr can cause obscure crashes when
> trying to write non unicode data. The interpreter is terminated with
> SIGINT without any specific error writen on the console.
> 
>   rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f964820e8d0},
>   {0x559f50, [], SA_RESTORER, 0x7f964820e8d0}, 8) = 0
> 
> This happens easily when there is an untrapped exception which should
> lead to printing a traceback on stderr.
> 
> The only way to prevent UnicodeEncodeErrors is to make sure that
> PYTHONIOENCODING is set with the ':replace' suffix and this can only
> be
> done *before* starting the interpreter as the initialization is made
> very early on and the encoding cannot be set or modified after.
> 
> From the official documentation:
> 
>   PYTHONIOENCODING
> 
>   Overrides the encoding used for stdin/stdout/stderr, in the syntax
>   encodingname:errorhandler. The :errorhandler part is optional and
>   has the same meaning as in str.encode().
> 
>   https://docs.python.org/2/using/cmdline.html
>   https://docs.python.org/3/using/cmdline.html
> 
> Of course, for proper encoding of unicode characters, one of the
> locale-related environment variables (LC_ALL, LANG, LANGUAGE, etc.)
> must
> be set. Python will use the correct encoding accordingly and
> PYTHONIOENCODING will be set to "encoding:replace".
> 
> Examples:
> 
>   $ grep utf8 ~/.Xresources
>   xterm*utf8: 2
> 
>   $ env - PYTHONIOENCODING=utf-8:replace python2 -c "print
> u's\u00e9duisante'"
>   séduisante
>   $ env - PYTHONIOENCODING=utf-8:replace python3 -c
> "print('s\u00e9duisante')"
>   séduisante
> 
>   $ env - PYTHONIOENCODING=ascii:replace python2 -c "print
> u's\u00e9duisante'"
>   s?duisante
>   $ env - PYTHONIOENCODING=ascii:replace python3 -c
> "print('s\u00e9duisante')"
>   s?duisante
> 
>   $ env - PYTHONIOENCODING=ISO-8859-1:replace python2 -c "print
> u's\u00e9duisante'"
>   s�duisante
>   $ env - PYTHONIOENCODING=ISO-8859-1:replace python3 -c
> "print('s\u00e9duisante')"
>   s�duisante
> 
> Fixes: 046419a3bf8f ("pwclient: Fix encoding problems")
> Signed-off-by: Robin Jarry <robin.jarry@6wind.com>

I understand the issue now - thanks for clarifying :) Sorry for the
delay in replying also.

Now, while I understand what you're doing here, I'm wondering if we
could simply revert the original patch instead. I clearly merged that
without fully understanding the issue and not only does it cause the
issues you've trying to resolve here but it seems it's not necessary
for a lot of people (including me. Fedora 24). For those users that are
affected, there are less hacky workarounds. For example, a simple
script like so would so the job, I imagine.

    $ cat pwclient2
    #!/usr/bin/env bash
    PYTHONIOENCODING=UTF-8 pwclient

I'd be happy to include an epilog detailing this issue that would
appear when you'd run '--help'. I realize that were still be an extra
step for affected users, but the vast majority aren't and wouldn't be
affected.

Would this be acceptable?

Stephen

PS: What kind of environment are you encountering this issue in?

> ---
> v2:
> 
> - Always set PYTHONIOENCODING=<enc>:replace (on python 2 and 3) to
> prevent from
>   UnicodeEncodeErrors
> 
>  patchwork/bin/pwclient | 35 ++++++++++++++++++++++++++---------
>  1 file changed, 26 insertions(+), 9 deletions(-)
> 
> diff --git a/patchwork/bin/pwclient b/patchwork/bin/pwclient
> index ed0351bf5288..5a7b6723afe3 100755
> --- a/patchwork/bin/pwclient
> +++ b/patchwork/bin/pwclient
> @@ -41,16 +41,7 @@ except ImportError:
>  import shutil
>  import re
>  import io
> -import locale
>  
> -if sys.version_info.major == 2:
> -    # hack to make writing unicode to standard output/error work on
> Python 2
> -    OUT_ENCODING = (sys.stdout.encoding or
> locale.getpreferredencoding() or
> -                    os.getenv('PYTHONIOENCODING', 'utf-8'))
> -    sys.stdout = io.open(sys.stdout.fileno(), mode='w',
> -                         encoding=OUT_ENCODING, errors='replace')
> -    sys.stderr = io.open(sys.stderr.fileno(), mode='w',
> -                         encoding=OUT_ENCODING, errors='replace')
>  
>  # Default Patchwork remote XML-RPC server URL
>  # This script will check the PW_XMLRPC_URL environment variable
> @@ -821,5 +812,31 @@ def main():
>          sys.exit(1)
>  
>  
> +def force_io_encoding():
> +    """
> +    Force PYTHONIOENCODING ":errorhandler" to avoid
> UnicodeEncodeErrors. The
> +    only way to do it is to set the environment variable *before*
> starting the
> +    interpreter. From the python docs:
> +
> +      PYTHONIOENCODING
> +
> +      Overrides the encoding used for stdin/stdout/stderr, in the
> syntax
> +      encodingname:errorhandler. The :errorhandler part is optional
> and has the
> +      same meaning as in str.encode().
> +
> +    Note that this only prevents interpreter crashes, it does not
> exempt from
> +    correctly setting the LANG or LC_ALL variables in order to have
> valid
> +    output.
> +    """
> +    if 'PYTHONIOENCODING' in os.environ:
> +        return
> +
> +    encoding = sys.stdout.encoding or 'utf-8'
> +
> +    os.environ['PYTHONIOENCODING'] = encoding + ':replace'
> +    os.execvp(sys.executable, [sys.executable] + sys.argv)  # no
> return
> +
> +
>  if __name__ == "__main__":
> +    force_io_encoding()
>      main()
Robin Jarry April 30, 2017, 7:54 p.m.
Hi Stephen,

On Apr 28, 2017 11:46 PM, "Stephen Finucane" <stephen@that.guru> wrote:
>
> I understand the issue now - thanks for clarifying :) Sorry for the
> delay in replying also.
>
> Now, while I understand what you're doing here, I'm wondering if we
> could simply revert the original patch instead. I clearly merged that
> without fully understanding the issue and not only does it cause the
> issues you've trying to resolve here but it seems it's not necessary
> for a lot of people (including me. Fedora 24). For those users that are
> affected, there are less hacky workarounds. For example, a simple
> script like so would so the job, I imagine.
>
>     $ cat pwclient2
>     #!/usr/bin/env bash
>     PYTHONIOENCODING=UTF-8 pwclient
>
> I'd be happy to include an epilog detailing this issue that would
> appear when you'd run '--help'. I realize that were still be an extra
> step for affected users, but the vast majority aren't and wouldn't be
> affected.
>
> Would this be acceptable?

I wouldn't revert the whole patch, I remember I fixed some minor things
which should remain.

However, in addition of a small help messge, we could also wrap the call to
main() to display a more useful error message:

  try:
      main()
  except UnicodeEncodeError as e
      traceback.print_traceback()
      sys.stderr.write('you should export the LANG or LC_ALL environment
variables\n')
      sys.exit(1)

what do you think?

> Stephen
>
> PS: What kind of environment are you encountering this issue in?

On debian 8 (stable).

Robin
Stephen Finucane May 2, 2017, 9:31 a.m.
On Sun, 2017-04-30 at 21:54 +0200, Robin Jarry wrote:
> Hi Stephen,
>
> On Apr 28, 2017 11:46 PM, "Stephen Finucane" <stephen@that.guru>
> wrote:
> >
> > I understand the issue now - thanks for clarifying :) Sorry for the
> > delay in replying also.
> >
> > Now, while I understand what you're doing here, I'm wondering if we
> > could simply revert the original patch instead. I clearly merged
> > that without fully understanding the issue and not only does it 
> > cause the issues you've trying to resolve here but it seems it's
> > not necessary for a lot of people (including me. Fedora 24). For 
> > those users that are affected, there are less hacky workarounds.
> > For example, a simple script like so would so the job, I imagine.
> >
> >     $ cat pwclient2
> >     #!/usr/bin/env bash
> >     PYTHONIOENCODING=UTF-8 pwclient
> >
> > I'd be happy to include an epilog detailing this issue that would
> > appear when you'd run '--help'. I realize that were still be an 
> > extra step for affected users, but the vast majority aren't and
> > wouldn't be affected.
> >
> > Would this be acceptable?
>
> I wouldn't revert the whole patch, I remember I fixed some minor
> things which should remain. However, in addition of a small help
> messge, we could also wrap the call to main() to display a more
> useful error message:
>
>   try:
>       main()
>   except UnicodeEncodeError as e
>       traceback.print_traceback()
>       sys.stderr.write('you should export the LANG or LC_ALL
> environment variables\n')
>       sys.exit(1)
>
> what do you think?

Yup. That's definitely worth including. I guess you'd want to export
PYTHONIOENCODING too, and maybe suggest they check out the help page
for more information if it's failed?

Do you want to put that patch together as a v3?

Stephen

Patch hide | download patch | download mbox

diff --git a/patchwork/bin/pwclient b/patchwork/bin/pwclient
index ed0351bf5288..5a7b6723afe3 100755
--- a/patchwork/bin/pwclient
+++ b/patchwork/bin/pwclient
@@ -41,16 +41,7 @@  except ImportError:
 import shutil
 import re
 import io
-import locale
 
-if sys.version_info.major == 2:
-    # hack to make writing unicode to standard output/error work on Python 2
-    OUT_ENCODING = (sys.stdout.encoding or locale.getpreferredencoding() or
-                    os.getenv('PYTHONIOENCODING', 'utf-8'))
-    sys.stdout = io.open(sys.stdout.fileno(), mode='w',
-                         encoding=OUT_ENCODING, errors='replace')
-    sys.stderr = io.open(sys.stderr.fileno(), mode='w',
-                         encoding=OUT_ENCODING, errors='replace')
 
 # Default Patchwork remote XML-RPC server URL
 # This script will check the PW_XMLRPC_URL environment variable
@@ -821,5 +812,31 @@  def main():
         sys.exit(1)
 
 
+def force_io_encoding():
+    """
+    Force PYTHONIOENCODING ":errorhandler" to avoid UnicodeEncodeErrors. The
+    only way to do it is to set the environment variable *before* starting the
+    interpreter. From the python docs:
+
+      PYTHONIOENCODING
+
+      Overrides the encoding used for stdin/stdout/stderr, in the syntax
+      encodingname:errorhandler. The :errorhandler part is optional and has the
+      same meaning as in str.encode().
+
+    Note that this only prevents interpreter crashes, it does not exempt from
+    correctly setting the LANG or LC_ALL variables in order to have valid
+    output.
+    """
+    if 'PYTHONIOENCODING' in os.environ:
+        return
+
+    encoding = sys.stdout.encoding or 'utf-8'
+
+    os.environ['PYTHONIOENCODING'] = encoding + ':replace'
+    os.execvp(sys.executable, [sys.executable] + sys.argv)  # no return
+
+
 if __name__ == "__main__":
+    force_io_encoding()
     main()