
gtk: use setlocale() for LC_MESSAGES only

Message ID 1450445013.15674.38.camel@redhat.com
State New

Commit Message

Gerd Hoffmann Dec. 18, 2015, 1:23 p.m. UTC
On Fri, 2015-12-18 at 12:38 +0100, Kevin Wolf wrote:
> On 10.09.2015 at 17:19, Alberto Garcia wrote:
> > The QEMU code is not internationalized and assumes that it runs under
> > the C locale, but if we use the GTK+ UI we'll end up importing the
> > locale settings from the environment. This can break things, such as
> > the JSON generator and iotest 120 in locales that use a decimal comma.
> > 
> > We do however have translations for a few simple strings for the GTK+
> > menu items, so in order to run QEMU using the C locale, and yet have a
> > translated UI let's use setlocale() for LC_MESSAGES only.
> > 
> > Signed-off-by: Alberto Garcia <berto@igalia.com>
> 
> Not sure why I noticed it only now and if it's related to any recent
> package upgrade on my side (using RHEL 7), but I noticed that non-ASCII
> characters in the GTK UI strings are broken for me and git bisect
> pointed to this commit.

I guess we need to set LC_CTYPE too.
Can you try whether the attached patch fixes the issue?

thanks,
  Gerd

Comments

Kevin Wolf Dec. 18, 2015, 3:38 p.m. UTC | #1
On 18.12.2015 at 14:23, Gerd Hoffmann wrote:
> On Fri, 2015-12-18 at 12:38 +0100, Kevin Wolf wrote:
> > On 10.09.2015 at 17:19, Alberto Garcia wrote:
> > > The QEMU code is not internationalized and assumes that it runs under
> > > the C locale, but if we use the GTK+ UI we'll end up importing the
> > > locale settings from the environment. This can break things, such as
> > > the JSON generator and iotest 120 in locales that use a decimal comma.
> > > 
> > > We do however have translations for a few simple strings for the GTK+
> > > menu items, so in order to run QEMU using the C locale, and yet have a
> > > translated UI let's use setlocale() for LC_MESSAGES only.
> > > 
> > > Signed-off-by: Alberto Garcia <berto@igalia.com>
> > 
> > Not sure why I noticed it only now and if it's related to any recent
> > package upgrade on my side (using RHEL 7), but I noticed that non-ASCII
> > characters in the GTK UI strings are broken for me and git bisect
> > pointed to this commit.
> 
> I guess we need to set LC_CTYPE too.
> Can you try whether the attached patch fixes the issue?

Yes, that works for me.

Tested-by: Kevin Wolf <kwolf@redhat.com>

Alberto Garcia Dec. 18, 2015, 6:04 p.m. UTC | #2
>> > We do however have translations for a few simple strings for the GTK+
>> > menu items, so in order to run QEMU using the C locale, and yet have a
>> > translated UI let's use setlocale() for LC_MESSAGES only.
>> > 
>> Not sure why I noticed it only now and if it's related to any recent
>> package upgrade on my side (using RHEL 7), but I noticed that
>> non-ASCII characters in the GTK UI strings are broken for me and git
>> bisect pointed to this commit.
>
> I guess we need to set LC_CTYPE too.

That affects functions in ctype.h (isalpha(), islower(), isupper(), ...)
I guess that's safe?

> @@ -2044,8 +2044,9 @@ void gtk_display_init(DisplayState *ds, bool full_screen, bool grab_on_hover)
>  
>      s->free_scale = FALSE;
>  
> -    /* LC_MESSAGES only. See early_gtk_display_init() for details */
> +    /* LC_MESSAGES+LC_CTYPE only. See early_gtk_display_init() for details */
>      setlocale(LC_MESSAGES, "");
> +    setlocale(LC_CTYPE, "");
>      bindtextdomain("qemu", CONFIG_QEMU_LOCALEDIR);
>      textdomain("qemu");

You can also modify the comment in early_gtk_display_init() to say that
"we support importing LC_MESSAGES and LC_CTYPE from the environment".

Berto

Markus Armbruster Dec. 18, 2015, 7:55 p.m. UTC | #3
Alberto Garcia <berto@igalia.com> writes:

>>> > We do however have translations for a few simple strings for the GTK+
>>> > menu items, so in order to run QEMU using the C locale, and yet have a
>>> > translated UI let's use setlocale() for LC_MESSAGES only.
>>> > 
>>> Not sure why I noticed it only now and if it's related to any recent
>>> package upgrade on my side (using RHEL 7), but I noticed that
>>> non-ASCII characters in the GTK UI strings are broken for me and git
>>> bisect pointed to this commit.
>>
>> I guess we need to set LC_CTYPE too.
>
> That affects functions in ctype.h (isalpha(), islower(), isupper(), ...)
> I guess that's safe?

If we're guessing, then I guess it isn't.  But we shouldn't be guessing.

"LC_CTYPE affects the behavior of the character handling functions and
the multibyte and wide character functions."

I doubt there's much use for the latter in QEMU itself, but in
libraries, all bets are off.  I guess this is what actually screws up
GTK.

We do use the former.  LC_CTYPE set to some sufficiently funky locale is
bound to upset these uses.

In short: nope, we can't just set LC_CTYPE, at least not without further
analysis.

We should've stayed out of the GUI business.

[...]

Eric Blake Dec. 21, 2015, 5:49 p.m. UTC | #4
On 12/18/2015 12:55 PM, Markus Armbruster wrote:
> Alberto Garcia <berto@igalia.com> writes:
> 
>>>>> We do however have translations for a few simple strings for the GTK+
>>>>> menu items, so in order to run QEMU using the C locale, and yet have a
>>>>> translated UI let's use setlocale() for LC_MESSAGES only.
>>>>>
>>>> Not sure why I noticed it only now and if it's related to any recent
>>>> package upgrade on my side (using RHEL 7), but I noticed that
>>>> non-ASCII characters in the GTK UI strings are broken for me and git
>>>> bisect pointed to this commit.
>>>
>>> I guess we need to set LC_CTYPE too.
>>
>> That affects functions in ctype.h (isalpha(), islower(), isupper(), ...)
>> I guess that's safe?

Gnulib introduces functions named c_isalpha(), c_islower(), and so
forth, which behave identically regardless of the current locale,
precisely because locale-dependent definitions of which byte sequences
form a valid character can cause undesirable behavior.  I don't know if
glib does the same, but setting LC_CTYPE does indeed have the potential
to affect us, in at least util/id.c:id_wellformed().  It would be weird
to let the user's choice of locale determine which ids they can create.
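
A minimal sketch of that locale-independent approach, for illustration
only.  This is neither gnulib nor QEMU code: the names ascii_isalpha(),
ascii_isdigit() and ascii_id_wellformed() are hypothetical, and the ID
rule (a letter first, then letters, digits, '-', '.' or '_') is an
assumption rather than a copy of util/id.c.  The point is that nothing
here consults LC_CTYPE, so the result cannot change with the
environment:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true for ASCII letters only, whatever LC_CTYPE is. */
static bool ascii_isalpha(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Hypothetical helper: true for ASCII digits only. */
static bool ascii_isdigit(char c)
{
    return c >= '0' && c <= '9';
}

/*
 * Hypothetical ID check (the rule itself is an assumption for this sketch):
 * the first byte must be an ASCII letter, the rest ASCII letters, digits,
 * '-', '.' or '_'.  Bytes outside ASCII are always rejected.
 */
static bool ascii_id_wellformed(const char *id)
{
    if (id == NULL || !ascii_isalpha(id[0])) {
        return false;
    }
    for (size_t i = 1; id[i] != '\0'; i++) {
        char c = id[i];
        if (!ascii_isalpha(c) && !ascii_isdigit(c) &&
            c != '-' && c != '.' && c != '_') {
            return false;
        }
    }
    return true;
}
```

With helpers like these, an id containing a UTF-8 encoded letter is
rejected under every locale, instead of being accepted in some locales
and refused in others.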

> 
> If we're guessing, then I guess it isn't.  But we shouldn't be guessing.
> 
> "LC_CTYPE affects the behavior of the character handling functions and
> the multibyte and wide character functions."
> 
> I doubt there's much use for the latter in QEMU itself, but in
> libraries, all bets are off.  I guess this is what actually screws up
> GTK.
> 
> We do use the former.  LC_CTYPE set to some sufficiently funky locale is
> bound to upset these uses.
> 
> In short: nope, we can't just set LC_CTYPE, at least not without further
> analysis.

In fact, if LC_CTYPE and LC_COLLATE are incompatible, then strcoll() has
undefined behavior.  GNU coreutils warns:


    Unless otherwise specified, all comparisons use the character
    collating sequence specified by the ‘LC_COLLATE’ locale.(1)
    [...]
    (1) If you use a non-POSIX locale (e.g., by setting ‘LC_ALL’ to
    ‘en_US’), then ‘sort’ may produce output that is sorted differently than
    you’re accustomed to.  In that case, set the ‘LC_ALL’ environment
    variable to ‘C’.  Note that setting only ‘LC_COLLATE’ has two problems.
    First, it is ineffective if ‘LC_ALL’ is also set.  Second, it has
    undefined behavior if ‘LC_CTYPE’ (or ‘LANG’, if ‘LC_CTYPE’ is unset) is
    set to an incompatible value.  For example, you get undefined behavior
    if ‘LC_CTYPE’ is ‘ja_JP.PCK’ but ‘LC_COLLATE’ is ‘en_US.UTF-8’.

Off-hand, we are specifically NOT calling setlocale() for the categories
that we want to leave in the C locale, so we don't have to worry about
LC_ALL throwing us off.  And I'm hard-pressed to think of an example
where LC_COLLATE=C combined with a multibyte LC_CTYPE would cause
unusual sorting artifacts (the case coreutils warns against is two
incompatible multibyte character sets, whereas ours is a multibyte
character set for display but a unibyte set for collation).  But it is
indeed a can of worms that requires special analysis.
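
To make the per-category behavior concrete, here is a small sketch,
assuming a POSIX system where LC_MESSAGES is defined in <locale.h>.  The
helper names import_ui_locale_categories() and format_fraction() are
hypothetical and not part of QEMU.  Because setlocale() is called for
LC_MESSAGES and LC_CTYPE only, LC_NUMERIC stays in the C locale that
every program starts in, so printf-style formatting keeps the decimal
point regardless of the environment:

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: import only the message and character-type
 * categories from the environment.  All other categories, including
 * LC_NUMERIC and LC_COLLATE, keep the "C" locale the program started
 * with.  The return values are ignored in this sketch; setlocale()
 * returns NULL when the requested locale is unavailable.
 */
static void import_ui_locale_categories(void)
{
    setlocale(LC_MESSAGES, "");
    setlocale(LC_CTYPE, "");
}

/* Format a double; the decimal separator is governed by LC_NUMERIC. */
static int format_fraction(char *buf, size_t len, double value)
{
    return snprintf(buf, len, "%.2f", value);
}
```

Calling setlocale(LC_NUMERIC, NULL) after import_ui_locale_categories()
queries the category without changing it, which is a cheap way to check
that it is still "C".  Whether the ctype.h callers are safe under an
imported LC_CTYPE is a separate question that this sketch does not
answer.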

Patch

From 54821a4b405ca31c997485b563ec5c43dd53e4ed Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann <kraxel@redhat.com>
Date: Fri, 18 Dec 2015 14:15:56 +0100
Subject: [PATCH] gtk: fix utf8 strings in the ui

Commit "2cb5d2a gtk: use setlocale() for LC_MESSAGES only" restricts
locale settings to LC_MESSAGES, to avoid bugs caused by locale-specific
number printing (LC_NUMERIC) and possibly others.

We need LC_CTYPE too to make messages with chars outside us-ascii work
correctly.  Add it.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 ui/gtk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/ui/gtk.c b/ui/gtk.c
index 47b37e1..30407a5 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -2044,8 +2044,9 @@  void gtk_display_init(DisplayState *ds, bool full_screen, bool grab_on_hover)
 
     s->free_scale = FALSE;
 
-    /* LC_MESSAGES only. See early_gtk_display_init() for details */
+    /* LC_MESSAGES+LC_CTYPE only. See early_gtk_display_init() for details */
     setlocale(LC_MESSAGES, "");
+    setlocale(LC_CTYPE, "");
     bindtextdomain("qemu", CONFIG_QEMU_LOCALEDIR);
     textdomain("qemu");
 
-- 
1.8.3.1