From patchwork Sat Feb 27 13:08:38 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Heinrich Schuchardt
X-Patchwork-Id: 1445174
X-Patchwork-Delegate: xypron.glpk@gmx.de
From: Heinrich Schuchardt
To: Alexander Graf, Anatolij Gustschin
Cc: u-boot@lists.denx.de, Heinrich Schuchardt
Subject: [PATCH 4/6] lib/charset: UTF-8 stream conversion
Date: Sat, 27 Feb 2021 14:08:38 +0100
Message-Id: <20210227130840.166193-5-xypron.glpk@gmx.de>
X-Mailer: git-send-email 2.30.0
In-Reply-To: <20210227130840.166193-1-xypron.glpk@gmx.de>
References: <20210227130840.166193-1-xypron.glpk@gmx.de>
List-Id: U-Boot discussion

Provide functions to convert a UTF-8 stream to code page 437 or UTF-32.
Add unit tests.
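
The calling pattern is one byte per call, with the caller keeping a small
NUL-terminated buffer between calls for the bytes of a sequence that is not
yet complete. A minimal standalone sketch of that pattern follows. It is not
the code added by this patch: the names sketch_seq_len(),
sketch_utf8_stream() and sketch_decode() are hypothetical, the sketch does
not use utf8_get(), and it makes no attempt to reject overlong encodings or
surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: length of the sequence a lead byte starts, 0 if none. */
static size_t sketch_seq_len(unsigned char b)
{
	if (b < 0x80)
		return 1;
	if ((b & 0xe0) == 0xc0)
		return 2;
	if ((b & 0xf0) == 0xe0)
		return 3;
	if ((b & 0xf8) == 0xf0)
		return 4;
	return 0;
}

/*
 * Hypothetical stream decoder (an assumption, not U-Boot code).
 * Consumes one byte per call. @pending must hold at least 5 chars and be
 * an empty string before the first call. Returns the completed code point,
 * or 0 if more bytes are needed or the byte was dropped as invalid.
 */
static int32_t sketch_utf8_stream(unsigned char c, char *pending)
{
	static const unsigned char lead_mask[] = { 0, 0x7f, 0x1f, 0x0f, 0x07 };
	size_t have = strlen(pending);
	size_t need, i;
	int32_t cp;

	/* A non-continuation byte cuts an unfinished sequence short: drop it. */
	if (have && (c & 0xc0) != 0x80)
		have = 0;
	/* A byte that cannot start a sequence is dropped as well. */
	if (!have && !sketch_seq_len(c)) {
		pending[0] = 0;
		return 0;
	}
	pending[have++] = (char)c;
	pending[have] = 0;

	need = sketch_seq_len((unsigned char)pending[0]);
	if (have < need)
		return 0;	/* wait for more bytes */

	/* Payload bits of the lead byte, then 6 bits per continuation byte. */
	cp = (unsigned char)pending[0] & lead_mask[need];
	for (i = 1; i < need; i++)
		cp = (cp << 6) | ((unsigned char)pending[i] & 0x3f);
	pending[0] = 0;		/* sequence consumed, buffer is empty again */
	return cp;
}

/* Hypothetical driver: decode a NUL-terminated string byte by byte. */
static size_t sketch_decode(const char *in, int32_t *out)
{
	char pending[5] = "";
	size_t n = 0;
	int32_t cp;

	assert(in && out);
	for (; *in; ++in) {
		cp = sketch_utf8_stream((unsigned char)*in, pending);
		if (cp)
			out[n++] = cp;
	}
	return n;
}
```

Feeding the bytes 0xCE 0x92 0x20 0x42 through sketch_decode() yields the
three code points 0x0392, 0x20 and 0x42; the call for 0xCE returns 0 because
the two-byte sequence is still incomplete at that point.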

Signed-off-by: Heinrich Schuchardt
---
 include/charset.h | 18 +++++++++++
 lib/charset.c     | 55 +++++++++++++++++++++++++++------
 test/unicode_ut.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 142 insertions(+), 9 deletions(-)

--
2.30.0

diff --git a/include/charset.h b/include/charset.h
index 52e7d1474e..a911160f19 100644
--- a/include/charset.h
+++ b/include/charset.h
@@ -286,4 +286,22 @@ uint8_t *utf16_to_utf8(uint8_t *dest, const uint16_t *src, size_t size);
  */
 int utf_to_cp(s32 *c, const u16 *codepage);
 
+/**
+ * utf8_to_cp437_stream() - convert UTF-8 stream to codepage 437
+ *
+ * @c:		next UTF-8 character to convert
+ * @buffer:	buffer, at least 5 characters
+ * Return:	next codepage 437 character or 0
+ */
+int utf8_to_cp437_stream(u8 c, char *buffer);
+
+/**
+ * utf8_to_utf32_stream() - convert UTF-8 stream to UTF-32
+ *
+ * @c:		next UTF-8 character to convert
+ * @buffer:	buffer, at least 5 characters
+ * Return:	next UTF-32 character or 0
+ */
+int utf8_to_utf32_stream(u8 c, char *buffer);
+
 #endif /* __CHARSET_H_ */
diff --git a/lib/charset.c b/lib/charset.c
index 946d5ee23e..f44c58d9d8 100644
--- a/lib/charset.c
+++ b/lib/charset.c
@@ -481,15 +481,6 @@ uint8_t *utf16_to_utf8(uint8_t *dest, const uint16_t *src, size_t size)
 	return dest;
 }
 
-/**
- * utf_to_cp() - translate Unicode code point to 8bit codepage
- *
- * Codepoints that do not exist in the codepage are rendered as question mark.
- *
- * @c:		pointer to Unicode code point to be translated
- * @codepage:	Unicode to codepage translation table
- * Return:	0 on success, -ENOENT if codepoint cannot be translated
- */
 int utf_to_cp(s32 *c, const u16 *codepage)
 {
 	if (*c >= 0x80) {
@@ -507,3 +498,49 @@ int utf_to_cp(s32 *c, const u16 *codepage)
 	}
 	return 0;
 }
+
+int utf8_to_cp437_stream(u8 c, char *buffer)
+{
+	char *end;
+	const char *pos;
+	s32 s;
+	int ret;
+
+	for (;;) {
+		pos = buffer;
+		end = buffer + strlen(buffer);
+		*end++ = c;
+		*end = 0;
+		s = utf8_get(&pos);
+		if (s > 0) {
+			*buffer = 0;
+			ret = utf_to_cp(&s, codepage_437);
+			return s;
+		}
+		if (pos == end)
+			return 0;
+		*buffer = 0;
+	}
+}
+
+int utf8_to_utf32_stream(u8 c, char *buffer)
+{
+	char *end;
+	const char *pos;
+	s32 s;
+
+	for (;;) {
+		pos = buffer;
+		end = buffer + strlen(buffer);
+		*end++ = c;
+		*end = 0;
+		s = utf8_get(&pos);
+		if (s > 0) {
+			*buffer = 0;
+			return s;
+		}
+		if (pos == end)
+			return 0;
+		*buffer = 0;
+	}
+}
diff --git a/test/unicode_ut.c b/test/unicode_ut.c
index 154361aea7..6f6aea5f60 100644
--- a/test/unicode_ut.c
+++ b/test/unicode_ut.c
@@ -47,6 +47,9 @@ static const char d3[] = {0xe6, 0xbd, 0x9c, 0xe6, 0xb0, 0xb4, 0xe8, 0x89,
 /* Three letters translating to two utf-16 word each */
 static const char d4[] = {0xf0, 0x90, 0x92, 0x8d, 0xf0, 0x90, 0x92, 0x96,
 			  0xf0, 0x90, 0x92, 0x87, 0x00};
+/* Letter not in code page 437 */
+static const char d5[] = {0xCE, 0x92, 0x20, 0x69, 0x73, 0x20, 0x6E, 0x6F,
+			  0x74, 0x20, 0x42, 0x00};
 
 /* Illegal utf-8 strings */
 static const char j1[] = {0x6a, 0x31, 0xa1, 0x6c, 0x00};
@@ -631,6 +634,81 @@ static int unicode_test_utf_to_cp(struct unit_test_state *uts)
 }
 UNICODE_TEST(unicode_test_utf_to_cp);
 
+static void utf8_to_cp437_stream_helper(const char *in, char *out)
+{
+	char buffer[5];
+	int ret;
+
+	*buffer = 0;
+	for (; *in; ++in) {
+		ret = utf8_to_cp437_stream(*in, buffer);
+		if (ret)
+			*out++ = ret;
+	}
+	*out = 0;
+}
+
+static int unicode_test_utf8_to_cp437_stream(struct unit_test_state *uts)
+{
+	char buf[16];
+
+	utf8_to_cp437_stream_helper(d1, buf);
+	ut_asserteq_str("U-Boot", buf);
+	utf8_to_cp437_stream_helper(d2, buf);
+	ut_asserteq_str("kafb\xa0tur", buf);
+	utf8_to_cp437_stream_helper(d5, buf);
+	ut_asserteq_str("? is not B", buf);
+	utf8_to_cp437_stream_helper(j2, buf);
+	ut_asserteq_str("j2l", buf);
+
+	return 0;
+}
+UNICODE_TEST(unicode_test_utf8_to_cp437_stream);
+
+static void utf8_to_utf32_stream_helper(const char *in, s32 *out)
+{
+	char buffer[5];
+	int ret;
+
+	*buffer = 0;
+	for (; *in; ++in) {
+		ret = utf8_to_utf32_stream(*in, buffer);
+		if (ret)
+			*out++ = ret;
+	}
+	*out = 0;
+}
+
+static int unicode_test_utf8_to_utf32_stream(struct unit_test_state *uts)
+{
+	s32 buf[16];
+
+	const u32 u1[] = {0x55, 0x2D, 0x42, 0x6F, 0x6F, 0x74, 0x0000};
+	const u32 u2[] = {0x6B, 0x61, 0x66, 0x62, 0xE1, 0x74, 0x75, 0x72, 0x00};
+	const u32 u3[] = {0x0392, 0x20, 0x69, 0x73, 0x20, 0x6E, 0x6F, 0x74,
+			  0x20, 0x42, 0x00};
+	const u32 u4[] = {0x6A, 0x32, 0x6C, 0x00};
+
+	memset(buf, 0, sizeof(buf));
+	utf8_to_utf32_stream_helper(d1, buf);
+	ut_asserteq_mem(u1, buf, sizeof(u1));
+
+	memset(buf, 0, sizeof(buf));
+	utf8_to_utf32_stream_helper(d2, buf);
+	ut_asserteq_mem(u2, buf, sizeof(u2));
+
+	memset(buf, 0, sizeof(buf));
+	utf8_to_utf32_stream_helper(d5, buf);
+	ut_asserteq_mem(u3, buf, sizeof(u3));
+
+	memset(buf, 0, sizeof(buf));
+	utf8_to_utf32_stream_helper(j2, buf);
+	ut_asserteq_mem(u4, buf, sizeof(u4));
+
+	return 0;
+}
+UNICODE_TEST(unicode_test_utf8_to_utf32_stream);
+
 #ifdef CONFIG_EFI_LOADER
 static int unicode_test_efi_create_indexed_name(struct unit_test_state *uts)
 {