From patchwork Sun Oct 26 22:25:47 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Schmerge
X-Patchwork-Id: 403333
Date: Sun, 26 Oct 2014 18:25:47 -0400
Subject: Fwd: g++ off-by-one bug in utf16 conversion
From: John Schmerge
To: gcc-patches@gcc.gnu.org

I believe I sent this yesterday to the incorrect list...

---------- Forwarded message ----------
From: John Schmerge
Date: Sun, Oct 26, 2014 at 1:58 AM
Subject: g++ off-by-one bug in utf16 conversion
To: gcc-bugs@gcc.gnu.org

Hey guys,

I came across this bug earlier today while implementing some unit tests
for UTF-8/UTF-16 conversions...

The following C++ fragment gives the wrong result:

#include <iostream>

int main()
{
  char16_t s[] = u"\uffff";
  std::cout << std::hex << s[0] << " " << s[1] << std::endl;
}

It prints:
d7ff dfff

whereas it should print:
ffff 0

For those unfamiliar with UTF-16, all Unicode code points less than or
equal to 0xffff remain single 16-bit values and no conversion is done on
them; code points greater than 0xffff get converted to a pair of 16-bit
values, where the 1st is in the range 0xd800-0xdbff and the 2nd is in the
range 0xdc00-0xdfff.

Clearly this is an off-by-one issue. I traced it down to the use of a
less-than operator where a less-than-or-equal operator is needed in
libcpp/charset.c.

I have verified this is a bug with versions 4.4.7 (RHEL 6.5), 4.8.2
(linaro/ubuntu/mint) and g++ (GCC) 5.0.0 20141025... I am a bit surprised
that this has gone so many years unnoticed or at least unresolved.

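For illustration, here is a minimal sketch of the UTF-16 encoding rule
described above. It is not the libcpp code: the function name encode_utf16
is hypothetical, and the sketch assumes the input is already a valid code
point (at most 0x10FFFF and not itself a surrogate). It only shows why the
single-unit test has to be "<=" rather than "<".

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not the libcpp implementation): encode one Unicode
// code point as UTF-16 code units. Assumes cp <= 0x10FFFF and that cp is
// not in the surrogate range; no validation is done here.
std::vector<char16_t> encode_utf16(std::uint32_t cp)
{
    // Code points up to and including 0xFFFF fit in one 16-bit unit.
    // Using "<" here would wrongly push 0xFFFF into the surrogate path.
    if (cp <= 0xFFFF)
        return { static_cast<char16_t>(cp) };

    // Supplementary code points: subtract 0x10000, then split the
    // remaining 20 bits into a high and a low surrogate.
    cp -= 0x10000;
    return { static_cast<char16_t>(0xD800 + (cp >> 10)),
             static_cast<char16_t>(0xDC00 + (cp & 0x3FF)) };
}
```

With this rule, 0xFFFF yields the single unit 0xffff, while 0x10000 is the
first code point that yields a pair (0xd800, 0xdc00).
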
Attached is a patch against gcc 4.8.2 (from the gcc website) to
$gcc-root/libcpp/charset.c that fixes the issue according to my tests.

Thanks,
John

--- libcpp/charset.c.old	2014-10-26 01:23:50.103796842 -0400
+++ libcpp/charset.c	2014-10-26 01:24:10.583796875 -0400
@@ -353,7 +353,7 @@
       return EILSEQ;
     }
 
-  if (s < 0xFFFF)
+  if (s <= 0xFFFF)
     {
       if (*outbytesleftp < 2)
 	{