From patchwork Wed Dec 30 20:15:03 2020
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 1421440
To: libc-alpha@sourceware.org, Paul Eggert
Subject: [PATCH 1/5] posix: Sync regex code with gnulib
Date: Wed, 30 Dec 2020 17:15:03 -0300
Message-Id: <20201230201507.2755086-1-adhemerval.zanella@linaro.org>
List-Id: Libc-alpha mailing list
From: Adhemerval Zanella Netto
Reply-To: Adhemerval Zanella
Cc: bug-gnulib@gnu.org

Sync with gnulib commit 43ee1a6bf.  The main change is 9682f18e9
(which does not have a meaningful description).

Checked on x86_64-linux-gnu.
---
 posix/regcomp.c        |  2 +-
 posix/regex.h          | 17 ++++++++++++-----
 posix/regex_internal.c | 19 ++++++++++---------
 posix/regex_internal.h | 16 ++++++++++++----
 4 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/posix/regcomp.c b/posix/regcomp.c
index 93bb0a0538..692928b0db 100644
--- a/posix/regcomp.c
+++ b/posix/regcomp.c
@@ -558,7 +558,7 @@ weak_alias (__regerror, regerror)
 static const bitset_t utf8_sb_map =
 {
   /* Set the first 128 bits.  */
-# if defined __GNUC__ && !defined __STRICT_ANSI__
+# if (defined __GNUC__ || __clang_major__ >= 4) && !defined __STRICT_ANSI__
   [0 ... 0x80 / BITSET_WORD_BITS - 1] = BITSET_WORD_MAX
 # else
 #  if 4 * BITSET_WORD_BITS < ASCII_CHARS
diff --git a/posix/regex.h b/posix/regex.h
index 5fe41c8685..7418e6c76f 100644
--- a/posix/regex.h
+++ b/posix/regex.h
@@ -612,7 +612,9 @@ extern int re_exec (const char *);
    'configure' might #define 'restrict' to those words, so pick a
    different name.  */
 #ifndef _Restrict_
-# if defined __restrict || 2 < __GNUC__ + (95 <= __GNUC_MINOR__)
+# if defined __restrict \
+     || 2 < __GNUC__ + (95 <= __GNUC_MINOR__) \
+     || __clang_major__ >= 3
 #  define _Restrict_ __restrict
 # elif 199901L <= __STDC_VERSION__ || defined restrict
 #  define _Restrict_ restrict
@@ -620,13 +622,18 @@ extern int re_exec (const char *);
 #  define _Restrict_
 # endif
 #endif
-/* For [restrict], use glibc's __restrict_arr if available.
-   Otherwise, GCC 3.1 (not in C++ mode) and C99 support [restrict].  */
+/* For the ISO C99 syntax
+     array_name[restrict]
+   use glibc's __restrict_arr if available.
+   Otherwise, GCC 3.1 and clang support this syntax (but not in C++ mode).
+   Other ISO C99 compilers support it as well.  */
 #ifndef _Restrict_arr_
 # ifdef __restrict_arr
 #  define _Restrict_arr_ __restrict_arr
-# elif ((199901L <= __STDC_VERSION__ || 3 < __GNUC__ + (1 <= __GNUC_MINOR__)) \
-        && !defined __GNUG__)
+# elif ((199901L <= __STDC_VERSION__ \
+         || 3 < __GNUC__ + (1 <= __GNUC_MINOR__) \
+         || __clang_major__ >= 3) \
+        && !defined __cplusplus)
 #  define _Restrict_arr_ _Restrict_
 # else
 #  define _Restrict_arr_
diff --git a/posix/regex_internal.c b/posix/regex_internal.c
index e1b6b4d5af..ed0a13461b 100644
--- a/posix/regex_internal.c
+++ b/posix/regex_internal.c
@@ -300,18 +300,20 @@ build_wcs_upper_buffer (re_string_t *pstr)
       while (byte_idx < end_idx)
         {
           wchar_t wc;
+          unsigned char ch = pstr->raw_mbs[pstr->raw_mbs_idx + byte_idx];
 
-          if (isascii (pstr->raw_mbs[pstr->raw_mbs_idx + byte_idx])
-              && mbsinit (&pstr->cur_state))
+          if (isascii (ch) && mbsinit (&pstr->cur_state))
             {
-              /* In case of a singlebyte character.  */
-              pstr->mbs[byte_idx]
-                = toupper (pstr->raw_mbs[pstr->raw_mbs_idx + byte_idx]);
               /* The next step uses the assumption that wchar_t is encoded
                  ASCII-safe: all ASCII values can be converted like this.  */
-              pstr->wcs[byte_idx] = (wchar_t) pstr->mbs[byte_idx];
-              ++byte_idx;
-              continue;
+              wchar_t wcu = __towupper (ch);
+              if (isascii (wcu))
+                {
+                  pstr->mbs[byte_idx] = wcu;
+                  pstr->wcs[byte_idx] = wcu;
+                  byte_idx++;
+                  continue;
+                }
             }
 
           remain_len = end_idx - byte_idx;
@@ -348,7 +350,6 @@ build_wcs_upper_buffer (re_string_t *pstr)
             {
               /* It is an invalid character, an incomplete character
                  at the end of the string, or '\0'.  Just use the byte.  */
-              int ch = pstr->raw_mbs[pstr->raw_mbs_idx + byte_idx];
               pstr->mbs[byte_idx] = ch;
               /* And also cast it to wide char.  */
               pstr->wcs[byte_idx++] = (wchar_t) ch;
diff --git a/posix/regex_internal.h b/posix/regex_internal.h
index 8c42586c42..4a3cf779bf 100644
--- a/posix/regex_internal.h
+++ b/posix/regex_internal.h
@@ -77,6 +77,14 @@
 # define isblank(ch) ((ch) == ' ' || (ch) == '\t')
 #endif
 
+/* regex code assumes isascii has its usual numeric meaning,
+   even if the portable character set uses EBCDIC encoding,
+   and even if wint_t is wider than int.  */
+#ifndef _LIBC
+# undef isascii
+# define isascii(c) (((c) & ~0x7f) == 0)
+#endif
+
 #ifdef _LIBC
 # ifndef _RE_DEFINE_LOCALE_FUNCTIONS
 #  define _RE_DEFINE_LOCALE_FUNCTIONS 1
@@ -335,7 +343,7 @@ typedef struct
     Idx idx;                    /* for BACK_REF */
     re_context_type ctx_type;   /* for ANCHOR */
   } opr;
-#if __GNUC__ >= 2 && !defined __STRICT_ANSI__
+#if (__GNUC__ >= 2 || defined __clang__) && !defined __STRICT_ANSI__
   re_token_type_t type : 8;
 #else
   re_token_type_t type;
@@ -841,10 +849,10 @@ re_string_elem_size_at (const re_string_t *pstr, Idx idx)
 #endif /* RE_ENABLE_I18N */
 
 #ifndef FALLTHROUGH
-# if __GNUC__ < 7
-#  define FALLTHROUGH ((void) 0)
-# else
+# if (__GNUC__ >= 7) || (__clang_major__ >= 10)
 #  define FALLTHROUGH __attribute__ ((__fallthrough__))
+# else
+#  define FALLTHROUGH ((void) 0)
 # endif
 #endif
 
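
Reviewer note, not part of the patch: the regex_internal.c hunk changes the ASCII fast path in build_wcs_upper_buffer so that it is taken only when the uppercased character is itself ASCII. The following is a minimal standalone sketch of that check, under stated assumptions: the names re_isascii and ascii_upper_fast_path are hypothetical, the public towupper stands in for glibc's internal __towupper, and the mbsinit test on the conversion state is left out.

```c
#include <wchar.h>
#include <wctype.h>

/* Same expression the patch defines as isascii for non-glibc builds.
   The name re_isascii is hypothetical, chosen so the sketch does not
   clash with <ctype.h>.  */
#define re_isascii(c) (((c) & ~0x7f) == 0)

/* Hypothetical helper mirroring the new fast path.  Returns 1 and
   stores the uppercased character in *OUT when both CH and its
   uppercase form are ASCII.  Returns 0 when the caller would have to
   fall through to the general multibyte conversion.  */
static int
ascii_upper_fast_path (unsigned char ch, wchar_t *out)
{
  if (!re_isascii (ch))
    return 0;

  /* The patch calls __towupper here; towupper is the public name.  */
  wint_t wcu = towupper (ch);
  if (!re_isascii (wcu))
    return 0;

  *out = (wchar_t) wcu;
  return 1;
}
```

In the default C locale towupper maps 'a' to 'A', so the helper reports the fast path; a byte such as 0xC3 is rejected by the first test. The second test only matters in a locale where uppercasing an ASCII letter yields a non-ASCII wide character, in which case the byte is left to the general path instead of being stored as a single byte.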