diff mbox

pwclient: Fix silent crash on python 2

Message ID 20170328092624.21943-1-robin.jarry@6wind.com
State Superseded
Headers show

Commit Message

Robin Jarry March 28, 2017, 9:26 a.m. UTC
Replacing sys.stdout and sys.stderr can cause obscure crashes when
trying to write non unicode data. The interpreter is terminated with
SIGINT without any specific error writen on the console.

  rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f964820e8d0},
  {0x559f50, [], SA_RESTORER, 0x7f964820e8d0}, 8) = 0

This happens easily when there is an untrapped exception which should
lead to printing a traceback on stderr.

To fix this, the only way is to make sure that the PYTHONIOENCODING env
variable is set *before* starting the interpreter as the initialization
is made very early on and the encoding cannot be set or modified after.

On python 3, the default IO encoding is already properly set according
to locale.

Fixes: 046419a3bf8f ("pwclient: Fix encoding problems")
Signed-off-by: Robin Jarry <robin.jarry@6wind.com>
---
 patchwork/bin/pwclient | 33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)

Comments

Stephen Finucane April 5, 2017, 10:15 a.m. UTC | #1
On Tue, 2017-03-28 at 11:26 +0200, Robin Jarry wrote:
> Replacing sys.stdout and sys.stderr can cause obscure crashes when
> trying to write non unicode data. The interpreter is terminated with
> SIGINT without any specific error writen on the console.
> 
>   rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f964820e8d0},
>   {0x559f50, [], SA_RESTORER, 0x7f964820e8d0}, 8) = 0
> 
> This happens easily when there is an untrapped exception which should
> lead to printing a traceback on stderr.
> 
> To fix this, the only way is to make sure that the PYTHONIOENCODING
> env
> variable is set *before* starting the interpreter as the
> initialization
> is made very early on and the encoding cannot be set or modified
> after.
> 
> On python 3, the default IO encoding is already properly set
> according
> to locale.
> 
> Fixes: 046419a3bf8f ("pwclient: Fix encoding problems")
> Signed-off-by: Robin Jarry <robin.jarry@6wind.com>

I've no idea how I could even go about testing this. However, it looks
sane and you've tested it locally so...

Acked-by: Stephen Finucane <stephen@that.guru>

...and applied.
Stephen Finucane April 5, 2017, 10:20 a.m. UTC | #2
On Wed, 2017-04-05 at 11:15 +0100, Stephen Finucane wrote:
> On Tue, 2017-03-28 at 11:26 +0200, Robin Jarry wrote:
> > Replacing sys.stdout and sys.stderr can cause obscure crashes when
> > trying to write non unicode data. The interpreter is terminated
> > with
> > SIGINT without any specific error writen on the console.
> > 
> >   rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f964820e8d0},
> >   {0x559f50, [], SA_RESTORER, 0x7f964820e8d0}, 8) = 0
> > 
> > This happens easily when there is an untrapped exception which
> > should
> > lead to printing a traceback on stderr.
> > 
> > To fix this, the only way is to make sure that the PYTHONIOENCODING
> > env
> > variable is set *before* starting the interpreter as the
> > initialization
> > is made very early on and the encoding cannot be set or modified
> > after.
> > 
> > On python 3, the default IO encoding is already properly set
> > according
> > to locale.
> > 
> > Fixes: 046419a3bf8f ("pwclient: Fix encoding problems")
> > Signed-off-by: Robin Jarry <robin.jarry@6wind.com>
> 
> I've no idea how I could even go about testing this. However, it
> looks
> sane and you've tested it locally so...
> 
> Acked-by: Stephen Finucane <stephen@that.guru>
> 
> ...and applied.

Actually, I take that back. Let's see if there's a less hacky way to do
this. Seems this is a fairly common problem with a few fixes available.
Would a fix like [1] do the job?

Also, could you provide some guidance as to how I could reproduce the
issue myself? Is the Patchwork instance using Python 2 or 3? What
environment is pwclient running in?

Stephen

[1] http://stackoverflow.com/a/11742574/613428

> _______________________________________________
> Patchwork mailing list
> Patchwork@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/patchwork
Robin Jarry April 5, 2017, 11:35 a.m. UTC | #3
Hi Stephen,

thanks for your reply. It has made me look deeper and I found that my
fix is not completely satisfying. Python 3 is also concerned. I will
submit another version ASAP.

The problem that I was trying to fix with my previous patch only appears
when neither the LANG nor LC_ALL env vars are set. Python (2 and 3)
assumes ASCII encoding by default which causes nasty errors like the
following:

    $ env -u LC_ALL -u LANG python2 -c "print u's\u00e9duisante'"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9'

    $ env -u LC_ALL -u LANG python3 -c "print('s\u00e9duisante')"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character '\xe9'

When either LANG or LC_ALL are set to something else than 'C', python
tries to encode accordingly.

    $ env -u LC_ALL - LANG=C.UTF-8 python3 -c "print(u's\u00e9duisante')"
    séduisante

    $ env -u LC_ALL - LANG=français python3 -c "print(u's\u00e9duisante')"
    s�duisante

My new patch makes sure that the default encoding is never 'ascii' to
avoid the previous problems. But without the nasty hack that replaces
stdout and stderr.

2017-04-05, Stephen Finucane:
> Actually, I take that back. Let's see if there's a less hacky way to do
> this. Seems this is a fairly common problem with a few fixes available.
> Would a fix like [1] do the job?
> 
> [1] http://stackoverflow.com/a/11742574/613428

This approach is risky. sys.defaultencoding is used to decode python
source. Have a look at these links:

http://stackoverflow.com/questions/3828723/why-should-we-not-use-sys-setdefaultencodingutf-8-in-a-py-script
https://anonbadger.wordpress.com/2015/06/16/why-sys-setdefaultencoding-will-break-code/

> Also, could you provide some guidance as to how I could reproduce the
> issue myself? Is the Patchwork instance using Python 2 or 3? What
> environment is pwclient running in?

The issue can be reproduced with pwclient executed with python 2 (server
python version does not matter). Simply put an invalid URL in
~/.pwclientrc (note the missing /):

    $ cat ~/.pwclientrc
    [options]
    default = dpdk
    [dpdk]
    url= http:/dpdk.org/dev/patchwork/xmlrpc/

Then run this:

    $ git describe
    v1.1.0-384-g21e2926743a1
    $ python2 ./patchwork/bin/pwclient projects
    $ echo $?
    1
diff mbox

Patch

diff --git a/patchwork/bin/pwclient b/patchwork/bin/pwclient
index ed0351bf5288..7a5c32fd66eb 100755
--- a/patchwork/bin/pwclient
+++ b/patchwork/bin/pwclient
@@ -43,14 +43,6 @@  import re
 import io
 import locale
 
-if sys.version_info.major == 2:
-    # hack to make writing unicode to standard output/error work on Python 2
-    OUT_ENCODING = (sys.stdout.encoding or locale.getpreferredencoding() or
-                    os.getenv('PYTHONIOENCODING', 'utf-8'))
-    sys.stdout = io.open(sys.stdout.fileno(), mode='w',
-                         encoding=OUT_ENCODING, errors='replace')
-    sys.stderr = io.open(sys.stderr.fileno(), mode='w',
-                         encoding=OUT_ENCODING, errors='replace')
 
 # Default Patchwork remote XML-RPC server URL
 # This script will check the PW_XMLRPC_URL environment variable
@@ -821,5 +813,30 @@  def main():
         sys.exit(1)
 
 
+def force_io_encoding():
+    """
+    Force python IO (stdout/err) encoding to something else than ASCII to
+    support printing unicode strings.
+
+    The only way to do it is via an environment variable *before* starting the
+    interpreter.
+
+    If PYTHONIOENCODING is not set, initialize it from locale or set default
+    value. Then re-execute the program so that IO encoding is properly
+    initialized.
+
+    The ":replace" suffix is to make python not crash on encoding errors but
+    replace them by printable characters.
+    """
+    if 'PYTHONIOENCODING' in os.environ or sys.version_info.major > 2:
+        return
+
+    encoding = locale.getpreferredencoding() or 'utf-8'
+
+    os.environ['PYTHONIOENCODING'] = encoding + ':replace'
+    os.execvp(sys.executable, [sys.executable] + sys.argv)  # no return
+
+
 if __name__ == "__main__":
+    force_io_encoding()
     main()