From patchwork Tue Apr 7 05:40:49 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: develop--- via Libc-alpha
X-Patchwork-Id: 1267199
To: libc-alpha, Carlos O'Donell, Andreas Schwab
Subject: [PATCH] Correct the Big5-HKSCS converter to preserve low order
 state bits (bug 25744)
Date: Tue, 7 Apr 2020 01:40:49 -0400
X-Patchwork-Original-From: Tom Honermann via Libc-alpha
From: develop--- via Libc-alpha
Reply-To: Tom Honermann

Enclosed is a patch for bug 25744 [1] that updates the Big5-HKSCS
converter to properly maintain the lowest 3 bits of the mbstate_t
__count data member.  This change is necessary to ensure that state is
correctly preserved when the converter encounters an incomplete
multibyte character.  More details are available in bug 25744 [1].

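For illustration, the bit layout this patch relies on can be modelled
with a few small helpers.  This is only a sketch and not glibc code: the
helper names (queue_char, clear_queue, queued_char, has_queued) and the
LOW_STATE_MASK macro are hypothetical, and it assumes that the bits of
__count from bit 3 upward hold a queued character while the low 3 bits
are bookkeeping for an incomplete multibyte character that every update
must leave untouched.

```c
#include <stdint.h>

/* Hypothetical model of the __count layout; not glibc code.
   Assumption: bits 3 and up hold a queued character, and the low
   3 bits are bookkeeping that the converter must not disturb.  */
#define LOW_STATE_MASK 7

/* Queue CH in the upper bits, keeping the low 3 bits of COUNT.  */
static inline int
queue_char (int count, uint32_t ch)
{
  return (int) ((ch << 3) | ((uint32_t) count & LOW_STATE_MASK));
}

/* Drop any queued character, keeping the low 3 bits of COUNT.  */
static inline int
clear_queue (int count)
{
  return count & LOW_STATE_MASK;
}

/* Return the queued character, or 0 if none is queued.  */
static inline uint32_t
queued_char (int count)
{
  return (uint32_t) count >> 3;
}

/* Nonzero if a character is queued, regardless of the low bits.  */
static inline int
has_queued (int count)
{
  return queued_char (count) != 0;
}
```

In this model, queueing an arbitrary value such as 0x8862 while the low
bits hold 1, and then clearing the queue, leaves the low bits at 1.  A
plain assignment of 0 would have reset them as well.
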
This patch depends on Carlos O'Donell's patch for bug 25734 [2] as
posted at [3].

The code changes are styled to match how these bits are maintained by
other converters such as iso-2022-jp.c, ibm930.c, and others.  Running
'grep __count' in the 'iconvdata' directory suggests that a number of
other converters, euc-jisx0213.c for example, also fail to preserve
these bits in some cases, though it may be that negative effects are not
observed for those converters.  This patch does not attempt to address
such issues with other converters.

Tested on Linux x86_64.

Carlos, assuming you have no objections to this patch, would you be so
kind as to commit it on my behalf?  Thank you in advance.

ChangeLog:

	Correct Big5-HKSCS to preserve low order state bits (Bug 25744)
	* iconvdata/big5hkscs.c: Modified.
	* iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c: Modified.

Tom.

[1]: https://sourceware.org/bugzilla/show_bug.cgi?id=25744
[2]: https://sourceware.org/bugzilla/show_bug.cgi?id=25734
[3]: https://sourceware.org/pipermail/libc-alpha/2020-April/112551.html

diff --git a/iconvdata/big5hkscs.c b/iconvdata/big5hkscs.c
index ef325119b1..b4e556be98 100644
--- a/iconvdata/big5hkscs.c
+++ b/iconvdata/big5hkscs.c
@@ -17771,7 +17771,7 @@ static struct
    the output state to the initial state.  This has to be done during the
    flushing.  */
 #define EMIT_SHIFT_TO_INIT \
-  if (data->__statep->__count != 0) \
+  if ((data->__statep->__count >> 3) != 0) \
     { \
       if (FROM_DIRECTION) \
         { \
@@ -17780,7 +17780,7 @@ static struct
               /* Write out the last character.  */ \
               *((uint32_t *) outbuf) = data->__statep->__count >> 3; \
               outbuf += sizeof (uint32_t); \
-              data->__statep->__count = 0; \
+              data->__statep->__count &= 7; \
             } \
           else \
             /* We don't have enough room in the output buffer.  */ \
@@ -17794,7 +17794,7 @@ static struct
               uint32_t lasttwo = data->__statep->__count >> 3; \
               *outbuf++ = (lasttwo >> 8) & 0xff; \
               *outbuf++ = lasttwo & 0xff; \
-              data->__statep->__count = 0; \
+              data->__statep->__count &= 7; \
             } \
           else \
             /* We don't have enough room in the output buffer.  */ \
@@ -17880,7 +17880,7 @@ static struct
  \
                 /* Otherwise store only the first character now, and \
                    put the second one into the queue.  */ \
-                *statep = ch2 << 3; \
+                *statep = (ch2 << 3) | (*statep & 7); \
                 /* Tell the caller why we terminate the loop.  */ \
                 result = __GCONV_FULL_OUTPUT; \
                 break; \
@@ -17897,7 +17897,7 @@ static struct
           } \
         else \
           /* Clear the queue and proceed to output the saved character.  */ \
-          *statep = 0; \
+          *statep &= 7; \
  \
         put32 (outptr, ch); \
         outptr += 4; \
@@ -17948,7 +17948,7 @@ static struct
               } \
             *outptr++ = (ch >> 8) & 0xff; \
             *outptr++ = ch & 0xff; \
-            *statep = 0; \
+            *statep &= 7; \
             inptr += 4; \
             continue; \
  \
@@ -17961,7 +17961,7 @@ static struct
           } \
         *outptr++ = (lasttwo >> 8) & 0xff; \
         *outptr++ = lasttwo & 0xff; \
-        *statep = 0; \
+        *statep &= 7; \
         continue; \
       } \
  \
@@ -17998,7 +17998,7 @@ static struct
             /* Check for possible combining character.  */ \
             if (__glibc_unlikely (ch == 0xca || ch == 0xea)) \
               { \
-                *statep = ((cp[0] << 8) | cp[1]) << 3; \
+                *statep = (((cp[0] << 8) | cp[1]) << 3) | (*statep & 7); \
                 inptr += 4; \
                 continue; \
               } \
diff --git a/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c b/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
index 8389adebf2..8141722595 100644
--- a/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
+++ b/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
@@ -128,6 +128,71 @@ check_conversion (struct testdata test)
       printf ("error: Result of third conversion was wrong.\n");
       err++;
     }
+
+  /* Now perform the same test as above consuming one byte at a time.  */
+  mbs = test.input;
+  memset (&st, 0, sizeof (st));
+
+  /* Consume the first byte; expect an incomplete multibyte character.  */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != -2)
+    {
+      printf ("error: First byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Advance past the first consumed byte.  */
+  mbs += 1;
+  /* Consume the second byte; expect the first wchar_t.  */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != 1)
+    {
+      printf ("error: Second byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Advance past the second consumed byte.  */
+  mbs += 1;
+  if (wc != test.expected[0])
+    {
+      printf ("error: Result of first wchar_t conversion was wrong.\n");
+      err++;
+    }
+  /* Consume no bytes; expect the second wchar_t.  */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != 0)
+    {
+      printf ("error: First attempt of third byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Do not advance past the third byte.  */
+  mbs += 0;
+  if (wc != test.expected[1])
+    {
+      printf ("error: Result of second wchar_t conversion was wrong.\n");
+      err++;
+    }
+  /* After the second wchar_t conversion, the converter should be in
+     the initial state since the two input BIG5-HKSCS bytes have been
+     consumed and the two wchar_t's have been output.  */
+  if (mbsinit (&st) == 0)
+    {
+      printf ("error: Converter not in initial state.\n");
+      err++;
+    }
+  /* Consume the third byte; expect the third wchar_t.  */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != 1)
+    {
+      printf ("error: Third byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Advance past the third consumed byte.  */
+  mbs += 1;
+  if (wc != test.expected[2])
+    {
+      printf ("error: Result of third wchar_t conversion was wrong.\n");
+      err++;
+    }
+
   /* Return 0 if we saw no errors.  */
   return err;
 }