From patchwork Wed Apr 15 20:19:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Lewis Hyatt X-Patchwork-Id: 1271421 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=gcc.gnu.org Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=trmlmnKq; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 492Ybv3XPJz9sQx for ; Thu, 16 Apr 2020 06:19:21 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6E19A384A028; Wed, 15 Apr 2020 20:19:19 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 6E19A384A028 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1586981959; bh=da7OffPhZ/96F705shQbaHYNiE3a+Q7KoHLiyYH56jY=; h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=trmlmnKqE6E31M5jNfIC0HHxsSbOiIlKKt50M/71pPVUFGhKoEvqCIc39NAHRTnzx g4htVHomMeO/5/0iaHKybf1MvrL+xK1wnzLJXrg0HgtOBcFHDoOrwmno4Zce2+34yW rXTJBHSAvMs8QNo5UD5o2D/N5Ac70JiZQlQaNl/c= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-qv1-xf2a.google.com (mail-qv1-xf2a.google.com [IPv6:2607:f8b0:4864:20::f2a]) by sourceware.org (Postfix) with ESMTPS id AD408384A029 for ; Wed, 15 Apr 2020 20:19:16 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org AD408384A029 Received: by mail-qv1-xf2a.google.com with SMTP id bu9so810268qvb.13 for ; Wed, 15 Apr 2020 13:19:16 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:mime-version :content-disposition:content-transfer-encoding:user-agent; bh=da7OffPhZ/96F705shQbaHYNiE3a+Q7KoHLiyYH56jY=; b=YbWV+leIp3rCwi6oOCdyvq2Qs0YTfqQl0QwPt7xSiPshWTHh76NdRBaI1DwmxQGf4D xBxOgZvWZ5/aWOqw4n1r77RyCwcJvtljXSWeYCnkD4hkdrRgxVycqsmMi4+KJMmqNjjq l2addKWNZz9f/SNuVffRLn2mxZHTeXoh32ToxeTOnbbIN6dxy4Qi/s7vT3wOyfMbQLFb 7JCoBDhj5X2/XjimWk1yc+GVN4PzboGZMztQq+4Ftvh5wYBFA+NRbXnODFeOFRQ7AL15 SU+CWc3RcQAitrv09MnNZQZIRI3AQ06gqqkdtLbjtW/lfxLJbSp1cWa6RQloHsppR1Rr /D1w== X-Gm-Message-State: AGi0Puben57pWfL2AcHcv2X5z0BTs1AUjHcAiKA45cvo+iCOb2m8Bgkz pReYtzrNnEO2m83zxIauRAE= X-Google-Smtp-Source: APiQypIa6fLqO/MwLSyD1W9YA1GkyMQlgOG1yiafOUl4aQa9EqPp74hcbCuQX2bQrQVJDtHHblZR/w== X-Received: by 2002:a05:6214:150e:: with SMTP id e14mr6853228qvy.65.1586981956241; Wed, 15 Apr 2020 13:19:16 -0700 (PDT) Received: from ldh-imac.local (pool-173-54-19-91.nwrknj.fios.verizon.net. [173.54.19.91]) by smtp.gmail.com with ESMTPSA id j8sm14255272qtk.85.2020.04.15.13.19.15 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 15 Apr 2020 13:19:15 -0700 (PDT) Date: Wed, 15 Apr 2020 16:19:14 -0400 To: dmalcolm@redhat.com Subject: Thoughts on applying a short diagnostics patch for GCC 10 Message-ID: <20200415201914.GA96820@ldh-imac.local> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.12.1 (2019-06-15) X-Spam-Status: No, score=-23.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_NUMSUBJECT, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Lewis Hyatt via Gcc-patches From: Lewis Hyatt Reply-To: Lewis Hyatt Cc: gcc-patches@gcc.gnu.org Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" Hello David- I would appreciate hearing your thoughts please on the following (relatively minor) issue... There is a little bug with colorization in diagnostics: in case a location only points to the first byte of a multibyte sequence, the colorization produces corrupted output as it interrupts the UTF-8 sequence. This seems to happen with errors that come out of cpplib for instance. A test case is: ------- int ٩x; ------- compiled with -x c -std=c99. The fix for this issue is essentially two lines plus a test + comments, and I had originally submitted it here: https://gcc.gnu.org/legacy-ml/gcc-patches/2019-12/msg00915.html Subsequent to that, I also submitted patches to implement tab expansion in diagnostics. Those patches naturally fixed the colorization issue already in a different way, because they rewrote this whole function, so I indicated we might as well forget the simpler older patch. But then we decided to hold off on the tab expansion feature until GCC 11. That makes sense for sure, but then I think regarding the two-line fix for the colorization bug, there is still potentially a good case to push it now? It is not strictly a regression, but because we added support for UTF-8 identifiers in GCC 10, it is going to be more apparent now than it was before. Given that and the fact that it's a pretty small change, would it make sense to go ahead and get this in? For reference I attached the patch and ChangeLog again here as well. Bootstrap all languages on x86-64 linux looks good with all reg tests the same before + after: FAIL 94 94 PASS 473413 473413 UNSUPPORTED 11503 11503 UNTESTED 195 195 XFAIL 1818 1818 XPASS 36 36 Thanks! -Lewis gcc/ChangeLog: 2020-04-15 Lewis Hyatt * diagnostic-show-locus.c (layout::print_source_line): Do not emit colorization codes in the middle of a UTF-8 sequence. (test_one_liner_colorized_utf8): New test. (test_diagnostic_show_locus_one_liner_utf8): Call the new test. commit 5e2f6ec837fdf808ba20096bd632be9025f7526c Author: Lewis Hyatt Date: Wed Apr 15 11:17:33 2020 -0400 diagnostics: Fix colorization interrupting UTF-8 sequences diff --git a/gcc/diagnostic-show-locus.c b/gcc/diagnostic-show-locus.c index 4618b4edb7d..0aa42e236ee 100644 --- a/gcc/diagnostic-show-locus.c +++ b/gcc/diagnostic-show-locus.c @@ -1494,6 +1494,8 @@ layout::print_source_line (linenum_type row, const char *line, int line_bytes, int last_non_ws = 0; for (int col_byte = 1 + x_offset_bytes; col_byte <= line_bytes; col_byte++) { + char c = *line; + /* Assuming colorization is enabled for the caret and underline characters, we may also colorize the associated characters within the source line. @@ -1505,8 +1507,13 @@ layout::print_source_line (linenum_type row, const char *line, int line_bytes, For frontends that only generate carets, we don't colorize the characters above them, since this would look strange (e.g. - colorizing just the first character in a token). */ - if (m_colorize_source_p) + colorizing just the first character in a token). + + We need to avoid inserting color codes ahead of UTF-8 continuation + bytes to avoid corrupting the output, in case the location range + points only to the first byte of a multibyte sequence. */ + + if (m_colorize_source_p && (((unsigned int) c) & 0xC0) != 0x80) { bool in_range_p; point_state state; @@ -1519,7 +1526,6 @@ layout::print_source_line (linenum_type row, const char *line, int line_bytes, else m_colorizer.set_normal_text (); } - char c = *line; if (c == '\0' || c == '\t' || c == '\r') c = ' '; if (c != ' ') @@ -3854,6 +3860,27 @@ test_one_liner_labels_utf8 () } } +/* Make sure that colorization codes don't interrupt a multibyte + sequence, which would corrupt it. */ +static void +test_one_liner_colorized_utf8 () +{ + test_diagnostic_context dc; + dc.colorize_source_p = true; + diagnostic_color_init (&dc, DIAGNOSTICS_COLOR_YES); + const location_t pi = linemap_position_for_column (line_table, 12); + rich_location richloc (line_table, pi); + diagnostic_show_locus (&dc, &richloc, DK_ERROR); + + /* In order to avoid having the test depend on exactly how the colorization + was effected, just confirm there are two pi characters in the output. */ + const char *result = pp_formatted_text (dc.printer); + const char *null_term = result + strlen (result); + const char *first_pi = strstr (result, "\xcf\x80"); + ASSERT_TRUE (first_pi && first_pi <= null_term - 2); + ASSERT_STR_CONTAINS (first_pi + 2, "\xcf\x80"); +} + /* Run the various one-liner tests. */ static void @@ -3900,6 +3927,7 @@ test_diagnostic_show_locus_one_liner_utf8 (const line_table_case &case_) test_one_liner_many_fixits_1_utf8 (); test_one_liner_many_fixits_2_utf8 (); test_one_liner_labels_utf8 (); + test_one_liner_colorized_utf8 (); } /* Verify that gcc_rich_location::add_location_if_nearby works. */