From patchwork Mon Feb 26 15:11:02 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Mike FABIAN
X-Patchwork-Id: 877940
From: Mike FABIAN
To: libc-alpha@sourceware.org
Cc: "Dmitry V. Levin"
Subject: [Patch v4 11/14] [BZ #14095] update collation data from Unicode / ISO 14651
Date: Mon, 26 Feb 2018 16:11:02 +0100
MIME-Version: 1.0
Content-Disposition: inline; filename=0011-Fix-test-cases-tst-fnmatch-and-tst-regexloc-for-the-.patch
Reviewed-by: Carlos O'Donell

From 19460537f923c9b1ba7668de3b7ac7fa75ce687b Mon Sep 17 00:00:00 2001
From: Mike FABIAN
Date: Tue, 23 Jan 2018 17:29:36 +0100
Subject: [PATCH 11/14] Fix test cases tst-fnmatch and tst-regexloc for the new iso14651_t1_common file.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

See: http://pubs.opengroup.org/onlinepubs/7908799/xbd/re.html

> A range expression represents the set of collating elements that fall
> between two elements in the current collation sequence,
> inclusively. It is expressed as the starting point and the ending
> point separated by a hyphen (-).
>
> Range expressions must not be used in portable applications because
> their behaviour is dependent on the collating sequence. Ranges will be
> treated according to the current collating sequence, and include such
> characters that fall within the range based on that collating
> sequence, regardless of character values. This, however, means that
> the interpretation will differ depending on collating sequence. If,
> for instance, one collating sequence defines ä as a variant of a,
> while another defines it as a letter following z, then the expression
> [ä-z] is valid in the first language and invalid in the second.

Therefore, using [a-z] does not make much sense except in the C/POSIX locale.
The new iso14651_t1_common file orders upper-case and lower-case Latin
characters differently from the old one, which causes surprising results,
for example in the de_DE locale: [a-z] now includes A, because A sorts
after a in iso14651_t1_common, but it does not include Z, because Z sorts
after z.
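The range-expression rule quoted above can be observed directly with fnmatch(3). The following is a minimal sketch, not part of the patch: match_in_locale is a hypothetical helper that temporarily switches LC_ALL to the requested locale, runs fnmatch with no flags, and then restores the previous locale. It assumes a POSIX system that provides <fnmatch.h>.

```c
#include <fnmatch.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: match STRING against the fnmatch PATTERN with
   LC_ALL temporarily set to LOCALE.  Returns 1 on a match, 0 on no match,
   and -1 if LOCALE is not installed or another error occurs.  The
   previously active locale is restored before returning.  */
static int
match_in_locale (const char *locale, const char *pattern, const char *string)
{
  const char *current = setlocale (LC_ALL, NULL);
  char *saved;
  int result;

  if (current == NULL)
    return -1;
  /* setlocale may overwrite its static buffer, so keep a private copy.  */
  saved = malloc (strlen (current) + 1);
  if (saved == NULL)
    return -1;
  strcpy (saved, current);

  if (setlocale (LC_ALL, locale) == NULL)
    result = -1;                /* Locale not available on this system.  */
  else
    {
      int rc = fnmatch (pattern, string, 0);
      result = rc == 0 ? 1 : rc == FNM_NOMATCH ? 0 : -1;
    }

  setlocale (LC_ALL, saved);
  free (saved);
  return result;
}
```

In the C locale, [a-z] follows code-point order, so match_in_locale ("C", "[a-z]", "A") should return 0, while an explicit list such as [abcdef] names its members directly and does not depend on any collation sequence. For a locale such as de_DE.UTF-8, the result depends on whether the locale is installed (the helper returns -1 if it is not) and on the collation data of the glibc release in use, so no particular answer is assumed here.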
* posix/tst-fnmatch.input: Use range expressions only in C locale.
* posix/tst-regexloc.c: Do not use a range expression for de_DE.ISO-8859-1 locale.
---
 posix/tst-fnmatch.input | 58 +++++++++++++++++++++++++++++++++++--------------
 posix/tst-regexloc.c    |  4 ++--
 2 files changed, 44 insertions(+), 18 deletions(-)

diff --git a/posix/tst-fnmatch.input b/posix/tst-fnmatch.input
index 88b3f739a5..589fb2a940 100644
--- a/posix/tst-fnmatch.input
+++ b/posix/tst-fnmatch.input
@@ -418,21 +418,47 @@ C "-" "[Z-\\]]" NOMATCH
 # Following are tests outside the scope of IEEE 2003.2 since they are using
 # locales other than the C locale. The main focus of the tests is on the
 # handling of ranges and the recognition of character (vs bytes).
+#
+# See:
+#
+# http://pubs.opengroup.org/onlinepubs/7908799/xbd/re.html
+#
+# > A range expression represents the set of collating elements that fall
+# > between two elements in the current collation sequence,
+# > inclusively. It is expressed as the starting point and the ending
+# > point separated by a hyphen (-).
+# >
+# > Range expressions must not be used in portable applications because
+# > their behaviour is dependent on the collating sequence. Ranges will be
+# > treated according to the current collating sequence, and include such
+# > characters that fall within the range based on that collating
+# > sequence, regardless of character values. This, however, means that
+# > the interpretation will differ depending on collating sequence. If,
+# > for instance, one collating sequence defines ä as a variant of a,
+# > while another defines it as a letter following z, then the expression
+# > [ä-z] is valid in the first language and invalid in the second.
+#
+# Therefore, using [a-z] does not make much sense except in the C/POSIX locale.
+# The new iso14651_t1_common lists upper case and lower case Latin characters
+# in a different order than the old one which causes surprising results
+# for example in the de_DE locale: [a-z] now includes A because A comes
+# after a in iso14651_t1_common but does not include Z because that comes
+# after z in iso14651_t1_common.
 de_DE.ISO-8859-1 "a" "[a-z]" 0
 de_DE.ISO-8859-1 "z" "[a-z]" 0
 de_DE.ISO-8859-1 "ä" "[a-z]" 0
 de_DE.ISO-8859-1 "ö" "[a-z]" 0
 de_DE.ISO-8859-1 "ü" "[a-z]" 0
-de_DE.ISO-8859-1 "A" "[a-z]" NOMATCH
+de_DE.ISO-8859-1 "A" "[a-z]" 0 # surprising but correct!
 de_DE.ISO-8859-1 "Z" "[a-z]" NOMATCH
-de_DE.ISO-8859-1 "Ä" "[a-z]" NOMATCH
-de_DE.ISO-8859-1 "Ö" "[a-z]" NOMATCH
-de_DE.ISO-8859-1 "Ü" "[a-z]" NOMATCH
+de_DE.ISO-8859-1 "Ä" "[a-z]" 0 # surprising but correct!
+de_DE.ISO-8859-1 "Ö" "[a-z]" 0 # surprising but correct!
+de_DE.ISO-8859-1 "Ü" "[a-z]" 0 # surprising but correct!
 de_DE.ISO-8859-1 "a" "[A-Z]" NOMATCH
-de_DE.ISO-8859-1 "z" "[A-Z]" NOMATCH
-de_DE.ISO-8859-1 "ä" "[A-Z]" NOMATCH
-de_DE.ISO-8859-1 "ö" "[A-Z]" NOMATCH
-de_DE.ISO-8859-1 "ü" "[A-Z]" NOMATCH
+de_DE.ISO-8859-1 "z" "[A-Z]" 0 # surprising but correct!
+de_DE.ISO-8859-1 "ä" "[A-Z]" 0 # surprising but correct!
+de_DE.ISO-8859-1 "ö" "[A-Z]" 0 # surprising but correct!
+de_DE.ISO-8859-1 "ü" "[A-Z]" 0 # surprising but correct!
 de_DE.ISO-8859-1 "A" "[A-Z]" 0
 de_DE.ISO-8859-1 "Z" "[A-Z]" 0
 de_DE.ISO-8859-1 "Ä" "[A-Z]" 0
@@ -515,16 +541,16 @@ de_DE.UTF-8 "z" "[a-z]" 0
 de_DE.UTF-8 "ä" "[a-z]" 0
 de_DE.UTF-8 "ö" "[a-z]" 0
 de_DE.UTF-8 "ü" "[a-z]" 0
-de_DE.UTF-8 "A" "[a-z]" NOMATCH
+de_DE.UTF-8 "A" "[a-z]" 0 # surprising but correct!
 de_DE.UTF-8 "Z" "[a-z]" NOMATCH
-de_DE.UTF-8 "Ä" "[a-z]" NOMATCH
-de_DE.UTF-8 "Ö" "[a-z]" NOMATCH
-de_DE.UTF-8 "Ü" "[a-z]" NOMATCH
+de_DE.UTF-8 "Ä" "[a-z]" 0 # surprising but correct!
+de_DE.UTF-8 "Ö" "[a-z]" 0 # surprising but correct!
+de_DE.UTF-8 "Ü" "[a-z]" 0 # surprising but correct!
 de_DE.UTF-8 "a" "[A-Z]" NOMATCH
-de_DE.UTF-8 "z" "[A-Z]" NOMATCH
-de_DE.UTF-8 "ä" "[A-Z]" NOMATCH
-de_DE.UTF-8 "ö" "[A-Z]" NOMATCH
-de_DE.UTF-8 "ü" "[A-Z]" NOMATCH
+de_DE.UTF-8 "z" "[A-Z]" 0 # surprising but correct!
+de_DE.UTF-8 "ä" "[A-Z]" 0 # surprising but correct!
+de_DE.UTF-8 "ö" "[A-Z]" 0 # surprising but correct!
+de_DE.UTF-8 "ü" "[A-Z]" 0 # surprising but correct!
 de_DE.UTF-8 "A" "[A-Z]" 0
 de_DE.UTF-8 "Z" "[A-Z]" 0
 de_DE.UTF-8 "Ä" "[A-Z]" 0
diff --git a/posix/tst-regexloc.c b/posix/tst-regexloc.c
index 60235b4d3b..7fbc496d0c 100644
--- a/posix/tst-regexloc.c
+++ b/posix/tst-regexloc.c
@@ -29,8 +29,8 @@ do_test (void)
 
   if (setlocale (LC_ALL, "de_DE.ISO-8859-1") == NULL)
     puts ("cannot set locale");
-  else if (regcomp (&re, "[a-f]*", 0) != REG_NOERROR)
-    puts ("cannot compile expression \"[a-f]*\"");
+  else if (regcomp (&re, "[abcdef]*", 0) != REG_NOERROR)
+    puts ("cannot compile expression \"[abcdef]*\"");
   else if (regexec (&re, "abcdefCDEF", 1, mat, 0) == REG_NOMATCH)
     puts ("no match");
   else
-- 
2.14.3
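The tst-regexloc.c hunk replaces the range [a-f] with the explicit bracket list [abcdef], whose members do not depend on the collation order. A minimal sketch of the same regcomp/regexec pattern follows; match_end is a hypothetical helper (not from glibc) that compiles a basic regular expression with cflags 0, as the test does, and reports where the first match ends. It assumes the caller has already selected a locale with setlocale.

```c
#include <regex.h>

/* Hypothetical helper: compile PATTERN as a POSIX basic regular
   expression (cflags 0) in the current locale and return the end offset
   of the first match in SUBJECT, or -1 if compilation fails or nothing
   matches.  */
static int
match_end (const char *pattern, const char *subject)
{
  regex_t re;
  regmatch_t mat[1];
  int end = -1;

  /* regcomp returns 0 on success (glibc names this value REG_NOERROR).  */
  if (regcomp (&re, pattern, 0) != 0)
    return -1;
  if (regexec (&re, subject, 1, mat, 0) == 0)
    end = (int) mat[0].rm_eo;
  regfree (&re);
  return end;
}
```

After setlocale (LC_ALL, "C"), both match_end ("[a-f]*", "abcdefCDEF") and match_end ("[abcdef]*", "abcdefCDEF") should return 6, since under code-point order neither bracket expression includes an upper-case letter. Only the explicit list keeps that meaning regardless of the collation sequence, which is why the test no longer relies on [a-f] under de_DE.ISO-8859-1.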