From patchwork Thu Sep 10 20:28:28 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Malcolm X-Patchwork-Id: 516443 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id DA8F914030C for ; Fri, 11 Sep 2015 06:31:58 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=YcgaxGlQ; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references; q=dns; s= default; b=cqhun4tl869Is3tZXkMfLN6c+T4H6zE4nHS7OFFQdtbCejoauV4dw BXzdrffxle94/yKkT1CFDpDT+0flEucACA4jVEPurIcR7uGExn7NiUXtaPC/hdz3 eFb858XkqY09Rpdv8HOz44v5X4x7J7fCZndXisbHg6CYYVuiHgh5nQ= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references; s= default; bh=5aiY5dh+uXScx+lrsKEjeAkhVUI=; b=YcgaxGlQCY8xWdnamuzS y5zHjsg3y0OFK6CO4o1t9Su9lWJ4YaC18HAocNpOUJg7cGs6gnnu2FhJaD+xqyqX j2WHoU7X+O+IpEpUgMcorrmCSueiQfeFAt6dX+r9vWsvd7cUh2oKV9n1xjNPVfkO nUWQyNqGb9RJVWPa5KDvWOk= Received: (qmail 87841 invoked by alias); 10 Sep 2015 20:31:50 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 87828 invoked by uid 89); 10 Sep 2015 20:31:50 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=0.0 required=5.0 tests=AWL, BAYES_50, KAM_LAZY_DOMAIN_SECURITY autolearn=no version=3.3.2 X-HELO: eggs.gnu.org Received: from eggs.gnu.org (HELO eggs.gnu.org) (208.118.235.92) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-SHA encrypted) ESMTPS; Thu, 10 Sep 2015 20:31:39 +0000 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Za8D6-0003hE-5w for gcc-patches@gcc.gnu.org; Thu, 10 Sep 2015 16:12:47 -0400 Received: from mx1.redhat.com ([209.132.183.28]:48668) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Za8D5-0003h4-Q6 for gcc-patches@gcc.gnu.org; Thu, 10 Sep 2015 16:12:44 -0400 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) by mx1.redhat.com (Postfix) with ESMTPS id 6528B4DB01 for ; Thu, 10 Sep 2015 20:12:43 +0000 (UTC) Received: from c64.redhat.com (vpn-239-137.phx2.redhat.com [10.3.239.137]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id t8AKCWaZ003473; Thu, 10 Sep 2015 16:12:42 -0400 From: David Malcolm To: gcc-patches@gcc.gnu.org Cc: David Malcolm Subject: [PATCH 17/22] libcpp: add location tracking within string literals Date: Thu, 10 Sep 2015 16:28:28 -0400 Message-Id: <1441916913-11547-18-git-send-email-dmalcolm@redhat.com> In-Reply-To: <1441916913-11547-1-git-send-email-dmalcolm@redhat.com> References: <1441916913-11547-1-git-send-email-dmalcolm@redhat.com> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 209.132.183.28 X-IsSubscribed: yes This has not been optimized yet. gcc/c-family/ChangeLog: * c-common.c (fname_as_string): Initialize loc field of "cstr", and call init_raw on strname.loc. * c-lex.c (cb_ident): Initialize loc field of "cstr". libcpp/ChangeLog: * charset.c (struct _cpp_strbuf): Add cpp_string_location field "loc". (conversion_loop): Add "loc_reader" param and, if non-NULL, call its add_char_at method. (convert_utf8_utf16): Add "loc_reader" param and pass it to conversion_loop. (convert_utf8_utf32): Likewise. (convert_utf16_utf8): Likewise. (convert_utf32_utf8): Likewise. (convert_no_conversion): Add "loc_reader" param and, if non-NULL, call its add_n_chars_at method. (convert_using_iconv): Add dummy cpp_string_location_reader * param. (APPLY_CONVERSION): Add LOC_READER param. (cpp_host_to_exec_charset): Call init on tbuf's loc. (_cpp_valid_ucn): Add "char_range" and "loc_reader" params. Write back to "char_range". (convert_ucn): Add "char_range" and "loc_reader" params, passing them to _cpp_valid_ucn call and to APPLY_CONVERSION site. (convert_hex): Add "char_range" and "loc_reader" params; use them to track source range information. (convert_oct): Likewise. (convert_escape): Add loc_reader param and use it to track source range information. (cpp_interpret_string): Initialize tbuf.loc. Create an on-stack cpp_string_location_reader and use it to track source range information. (cpp_interpret_charconst): Initialize str.loc. (_cpp_convert_input): Initialize to.loc. Add NULL when calling APPLY_CONVERSION. (cpp_string_location::init): New method. (cpp_string_location::init_raw): New method. (cpp_string_location::add_char_at): New method. (cpp_string_location::add_n_chars_at): New method. (cpp_string_location::get_loc_at_index): New method. (cpp_string_location::get_range_at_index): New method. (cpp_string_location::trivial_p): New method. (cpp_string_location_reader::cpp_string_location_reader): New ctors. (cpp_string_location_reader::get_next): New method. * directives.c (do_line): Initialize s.loc; (do_linemarker): Likewise. * expr.c (_cpp_parse_expr): Call init_raw on the token's str.loc. * include/cpplib.h (struct cpp_string_fragment_location): New struct. (struct cpp_string_location): New struct. (class cpp_string_location_reader): New class. (struct cpp_string): Add field "loc", a cpp_string_location. * internal.h (convert_f): Add cpp_string_location_reader * param. (_cpp_valid_ucn): Add source_range * param. * lex.c (forms_identifier_p): Add NULL argument to _cpp_valid_ucn. (lex_number): Initialize number->loc. (create_literal): Call init_raw on the token's str.loc. * macro.c (new_string_token): Call init on the token's str.loc. --- gcc/c-family/c-common.c | 3 +- gcc/c-family/c-lex.c | 2 +- libcpp/charset.c | 345 ++++++++++++++++++++++++++++++++++++++++++------ libcpp/directives.c | 4 +- libcpp/expr.c | 2 + libcpp/include/cpplib.h | 134 +++++++++++++++++++ libcpp/internal.h | 7 +- libcpp/lex.c | 12 +- libcpp/macro.c | 1 + 9 files changed, 465 insertions(+), 45 deletions(-) diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index 77962fc..a430bee 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -935,7 +935,7 @@ fname_as_string (int pretty_p) const char *name = "top level"; char *namep; int vrb = 2, len; - cpp_string cstr = { 0, 0 }, strname; + cpp_string cstr = { 0, 0, {NULL, 0, 0} }, strname; if (!pretty_p) { @@ -952,6 +952,7 @@ fname_as_string (int pretty_p) snprintf (namep, len, "\"%s\"", name); strname.text = (unsigned char *) namep; strname.len = len - 1; + strname.loc.init_raw (UNKNOWN_LOCATION, len, 1, line_table); if (cpp_interpret_string (parse_in, &strname, 1, &cstr, CPP_STRING)) { diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c index 1334994..f457199 100644 --- a/gcc/c-family/c-lex.c +++ b/gcc/c-family/c-lex.c @@ -171,7 +171,7 @@ cb_ident (cpp_reader * ARG_UNUSED (pfile), if (!flag_no_ident) { /* Convert escapes in the string. */ - cpp_string cstr = { 0, 0 }; + cpp_string cstr = { 0, 0, { NULL, 0, 0 } }; if (cpp_interpret_string (pfile, str, 1, &cstr, CPP_STRING)) { targetm.asm_out.output_ident ((const char *) cstr.text); diff --git a/libcpp/charset.c b/libcpp/charset.c index 5a1c929..3ae7916 100644 --- a/libcpp/charset.c +++ b/libcpp/charset.c @@ -99,6 +99,7 @@ struct _cpp_strbuf uchar *text; size_t asize; size_t len; + cpp_string_location loc; }; /* This is enough to hold any string that fits on a single 80-column @@ -453,7 +454,8 @@ one_utf16_to_utf8 (iconv_t bigend, const uchar **inbufp, size_t *inbytesleftp, static inline bool conversion_loop (int (*const one_conversion)(iconv_t, const uchar **, size_t *, uchar **, size_t *), - iconv_t cd, const uchar *from, size_t flen, struct _cpp_strbuf *to) + iconv_t cd, const uchar *from, size_t flen, struct _cpp_strbuf *to, + cpp_string_location_reader *loc_reader) { const uchar *inbuf; uchar *outbuf; @@ -468,8 +470,13 @@ conversion_loop (int (*const one_conversion)(iconv_t, const uchar **, size_t *, for (;;) { do - rval = one_conversion (cd, &inbuf, &inbytesleft, - &outbuf, &outbytesleft); + { + rval = one_conversion (cd, &inbuf, &inbytesleft, + &outbuf, &outbytesleft); + if (loc_reader) + to->loc.add_char_at (loc_reader->get_next (), + loc_reader->get_line_maps ()); + } while (inbytesleft && !rval); if (__builtin_expect (inbytesleft == 0, 1)) @@ -503,36 +510,37 @@ conversion_loop (int (*const one_conversion)(iconv_t, const uchar **, size_t *, /* These four use the custom conversion code above. */ static bool convert_utf8_utf16 (iconv_t cd, const uchar *from, size_t flen, - struct _cpp_strbuf *to) + struct _cpp_strbuf *to, cpp_string_location_reader *loc_reader) { - return conversion_loop (one_utf8_to_utf16, cd, from, flen, to); + return conversion_loop (one_utf8_to_utf16, cd, from, flen, to, loc_reader); } static bool convert_utf8_utf32 (iconv_t cd, const uchar *from, size_t flen, - struct _cpp_strbuf *to) + struct _cpp_strbuf *to, cpp_string_location_reader *loc_reader) { - return conversion_loop (one_utf8_to_utf32, cd, from, flen, to); + return conversion_loop (one_utf8_to_utf32, cd, from, flen, to, loc_reader); } static bool convert_utf16_utf8 (iconv_t cd, const uchar *from, size_t flen, - struct _cpp_strbuf *to) + struct _cpp_strbuf *to, cpp_string_location_reader *loc_reader) { - return conversion_loop (one_utf16_to_utf8, cd, from, flen, to); + return conversion_loop (one_utf16_to_utf8, cd, from, flen, to, loc_reader); } static bool convert_utf32_utf8 (iconv_t cd, const uchar *from, size_t flen, - struct _cpp_strbuf *to) + struct _cpp_strbuf *to, cpp_string_location_reader *loc_reader) { - return conversion_loop (one_utf32_to_utf8, cd, from, flen, to); + return conversion_loop (one_utf32_to_utf8, cd, from, flen, to, loc_reader); } /* Identity conversion, used when we have no alternative. */ static bool convert_no_conversion (iconv_t cd ATTRIBUTE_UNUSED, - const uchar *from, size_t flen, struct _cpp_strbuf *to) + const uchar *from, size_t flen, struct _cpp_strbuf *to, + cpp_string_location_reader *loc_reader) { if (to->len + flen > to->asize) { @@ -542,6 +550,7 @@ convert_no_conversion (iconv_t cd ATTRIBUTE_UNUSED, } memcpy (to->text + to->len, from, flen); to->len += flen; + to->loc.add_n_chars_at (flen, loc_reader); return true; } @@ -559,7 +568,8 @@ convert_no_conversion (iconv_t cd ATTRIBUTE_UNUSED, static bool convert_using_iconv (iconv_t cd, const uchar *from, size_t flen, - struct _cpp_strbuf *to) + struct _cpp_strbuf *to, + cpp_string_location_reader */*loc_reader*/) { ICONV_CONST char *inbuf; char *outbuf; @@ -606,8 +616,8 @@ convert_using_iconv (iconv_t cd, const uchar *from, size_t flen, /* Arrange for the above custom conversion logic to be used automatically when conversion between a suitable pair of character sets is requested. */ -#define APPLY_CONVERSION(CONVERTER, FROM, FLEN, TO) \ - CONVERTER.func (CONVERTER.cd, FROM, FLEN, TO) +#define APPLY_CONVERSION(CONVERTER, FROM, FLEN, TO, LOC_READER) \ + CONVERTER.func (CONVERTER.cd, FROM, FLEN, TO, LOC_READER) struct cpp_conversion { @@ -792,8 +802,9 @@ cpp_host_to_exec_charset (cpp_reader *pfile, cppchar_t c) tbuf.asize = 1; tbuf.text = XNEWVEC (uchar, tbuf.asize); tbuf.len = 0; + tbuf.loc.init (); - if (!APPLY_CONVERSION (pfile->narrow_cset_desc, sbuf, 1, &tbuf)) + if (!APPLY_CONVERSION (pfile->narrow_cset_desc, sbuf, 1, &tbuf, NULL)) { cpp_errno (pfile, CPP_DL_ICE, "converting to execution character set"); return 0; @@ -985,7 +996,9 @@ ucn_valid_in_identifier (cpp_reader *pfile, cppchar_t c, bool _cpp_valid_ucn (cpp_reader *pfile, const uchar **pstr, const uchar *limit, int identifier_pos, - struct normalize_state *nst, cppchar_t *cp) + struct normalize_state *nst, cppchar_t *cp, + source_range *char_range, + cpp_string_location_reader *loc_reader) { cppchar_t result, c; unsigned int length; @@ -1021,6 +1034,8 @@ _cpp_valid_ucn (cpp_reader *pfile, const uchar **pstr, if (!ISXDIGIT (c)) break; str++; + if (char_range) + char_range->m_finish = loc_reader->get_next ().m_finish; result = (result << 4) + hex_value (c); } while (--length && str < limit); @@ -1090,7 +1105,9 @@ _cpp_valid_ucn (cpp_reader *pfile, const uchar **pstr, An advanced pointer is returned. Issues all relevant diagnostics. */ static const uchar * convert_ucn (cpp_reader *pfile, const uchar *from, const uchar *limit, - struct _cpp_strbuf *tbuf, struct cset_converter cvt) + struct _cpp_strbuf *tbuf, struct cset_converter cvt, + source_range char_range, + cpp_string_location_reader *loc_reader) { cppchar_t ucn; uchar buf[6]; @@ -1100,7 +1117,12 @@ convert_ucn (cpp_reader *pfile, const uchar *from, const uchar *limit, struct normalize_state nst = INITIAL_NORMALIZE_STATE; from++; /* Skip u/U. */ - _cpp_valid_ucn (pfile, &from, limit, 0, &nst, &ucn); + + /* The u/U is part of the spelling of this character. */ + char_range.m_finish = loc_reader->get_next ().m_finish; + + ucn = _cpp_valid_ucn (pfile, &from, limit, 0, &nst, + &ucn, &char_range, loc_reader); rval = one_cppchar_to_utf8 (ucn, &bufp, &bytesleft); if (rval) @@ -1109,9 +1131,18 @@ convert_ucn (cpp_reader *pfile, const uchar *from, const uchar *limit, cpp_errno (pfile, CPP_DL_ERROR, "converting UCN to source character set"); } - else if (!APPLY_CONVERSION (cvt, buf, 6 - bytesleft, tbuf)) - cpp_errno (pfile, CPP_DL_ERROR, - "converting UCN to execution character set"); + else + { + /* Set up a cpp_string_location_reader to supply a + location for the single character, covering all of + char_range. */ + cpp_string_location_reader buf_loc_reader + (char_range.m_start, char_range.m_finish + 1 - char_range.m_start, + loc_reader->get_line_maps ()); + if (!APPLY_CONVERSION (cvt, buf, 6 - bytesleft, tbuf, &buf_loc_reader)) + cpp_errno (pfile, CPP_DL_ERROR, + "converting UCN to execution character set"); + } return from; } @@ -1174,7 +1205,9 @@ emit_numeric_escape (cpp_reader *pfile, cppchar_t n, number. You can, e.g. generate surrogate pairs this way. */ static const uchar * convert_hex (cpp_reader *pfile, const uchar *from, const uchar *limit, - struct _cpp_strbuf *tbuf, struct cset_converter cvt) + struct _cpp_strbuf *tbuf, struct cset_converter cvt, + source_range char_range, + cpp_string_location_reader *loc_reader) { cppchar_t c, n = 0, overflow = 0; int digits_found = 0; @@ -1185,13 +1218,19 @@ convert_hex (cpp_reader *pfile, const uchar *from, const uchar *limit, cpp_warning (pfile, CPP_W_TRADITIONAL, "the meaning of '\\x' is different in traditional C"); - from++; /* Skip 'x'. */ + /* Skip 'x'. */ + from++; + + /* The 'x' is part of the spelling of this character. */ + char_range.m_finish = loc_reader->get_next ().m_finish; + while (from < limit) { c = *from; if (! hex_p (c)) break; from++; + char_range.m_finish = loc_reader->get_next ().m_finish; overflow |= n ^ (n << 4 >> 4); n = (n << 4) + hex_value (c); digits_found = 1; @@ -1213,6 +1252,9 @@ convert_hex (cpp_reader *pfile, const uchar *from, const uchar *limit, emit_numeric_escape (pfile, n, tbuf, cvt); + tbuf->loc.add_char_at (char_range, + pfile->line_table); + return from; } @@ -1224,7 +1266,9 @@ convert_hex (cpp_reader *pfile, const uchar *from, const uchar *limit, number. */ static const uchar * convert_oct (cpp_reader *pfile, const uchar *from, const uchar *limit, - struct _cpp_strbuf *tbuf, struct cset_converter cvt) + struct _cpp_strbuf *tbuf, struct cset_converter cvt, + source_range char_range, + cpp_string_location_reader *loc_reader) { size_t count = 0; cppchar_t c, n = 0; @@ -1238,6 +1282,7 @@ convert_oct (cpp_reader *pfile, const uchar *from, const uchar *limit, if (c < '0' || c > '7') break; from++; + char_range.m_finish = loc_reader->get_next ().m_finish; overflow |= n ^ (n << 3 >> 3); n = (n << 3) + c - '0'; } @@ -1251,6 +1296,9 @@ convert_oct (cpp_reader *pfile, const uchar *from, const uchar *limit, emit_numeric_escape (pfile, n, tbuf, cvt); + tbuf->loc.add_char_at (char_range, + pfile->line_table); + return from; } @@ -1260,7 +1308,8 @@ convert_oct (cpp_reader *pfile, const uchar *from, const uchar *limit, pointer. Handles all relevant diagnostics. */ static const uchar * convert_escape (cpp_reader *pfile, const uchar *from, const uchar *limit, - struct _cpp_strbuf *tbuf, struct cset_converter cvt) + struct _cpp_strbuf *tbuf, struct cset_converter cvt, + cpp_string_location_reader *loc_reader) { /* Values of \a \b \e \f \n \r \t \v respectively. */ #if HOST_CHARSET == HOST_CHARSET_ASCII @@ -1273,20 +1322,26 @@ convert_escape (cpp_reader *pfile, const uchar *from, const uchar *limit, uchar c; + /* Record the location of the backslash. */ + source_range char_range = loc_reader->get_next (); + c = *from; switch (c) { /* UCNs, hex escapes, and octal escapes are processed separately. */ case 'u': case 'U': - return convert_ucn (pfile, from, limit, tbuf, cvt); + return convert_ucn (pfile, from, limit, tbuf, cvt, + char_range, loc_reader); case 'x': - return convert_hex (pfile, from, limit, tbuf, cvt); + return convert_hex (pfile, from, limit, tbuf, cvt, + char_range, loc_reader); break; case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': - return convert_oct (pfile, from, limit, tbuf, cvt); + return convert_oct (pfile, from, limit, tbuf, cvt, + char_range, loc_reader); /* Various letter escapes. Get the appropriate host-charset value into C. */ @@ -1339,7 +1394,7 @@ convert_escape (cpp_reader *pfile, const uchar *from, const uchar *limit, } /* Now convert what we have to the execution character set. */ - if (!APPLY_CONVERSION (cvt, &c, 1, tbuf)) + if (!APPLY_CONVERSION (cvt, &c, 1, tbuf, loc_reader)) cpp_errno (pfile, CPP_DL_ERROR, "converting escape sequence to execution character set"); @@ -1388,14 +1443,21 @@ cpp_interpret_string (cpp_reader *pfile, const cpp_string *from, size_t count, tbuf.asize = MAX (OUTBUF_BLOCK_SIZE, from->len); tbuf.text = XNEWVEC (uchar, tbuf.asize); tbuf.len = 0; + tbuf.loc.init (); for (i = 0; i < count; i++) { + cpp_string_location_reader loc_reader (&from[i].loc, pfile->line_table); p = from[i].text; if (*p == 'u') { - if (*++p == '8') - p++; + p++; + loc_reader.get_next (); + if (*p == '8') + { + p++; + loc_reader.get_next (); + } } else if (*p == 'L' || *p == 'U') p++; if (*p == 'R') @@ -1414,13 +1476,16 @@ cpp_interpret_string (cpp_reader *pfile, const cpp_string *from, size_t count, /* Raw strings are all normal characters; these can be fed directly to convert_cset. */ - if (!APPLY_CONVERSION (cvt, p, limit - p, &tbuf)) + if (!APPLY_CONVERSION (cvt, p, limit - p, &tbuf, &loc_reader)) goto fail; continue; } - p++; /* Skip leading quote. */ + /* Skip leading quote. */ + p++; + loc_reader.get_next (); + limit = from[i].text + from[i].len - 1; /* Skip trailing quote. */ for (;;) @@ -1432,13 +1497,13 @@ cpp_interpret_string (cpp_reader *pfile, const cpp_string *from, size_t count, { /* We have a run of normal characters; these can be fed directly to convert_cset. */ - if (!APPLY_CONVERSION (cvt, base, p - base, &tbuf)) + if (!APPLY_CONVERSION (cvt, base, p - base, &tbuf, &loc_reader)) goto fail; } if (p == limit) break; - p = convert_escape (pfile, p + 1, limit, &tbuf, cvt); + p = convert_escape (pfile, p + 1, limit, &tbuf, cvt, &loc_reader); } } /* NUL-terminate the 'to' buffer and translate it to a cpp_string @@ -1447,6 +1512,7 @@ cpp_interpret_string (cpp_reader *pfile, const cpp_string *from, size_t count, tbuf.text = XRESIZEVEC (uchar, tbuf.text, tbuf.len); to->text = tbuf.text; to->len = tbuf.len; + to->loc = tbuf.loc; return true; fail: @@ -1611,7 +1677,7 @@ cppchar_t cpp_interpret_charconst (cpp_reader *pfile, const cpp_token *token, unsigned int *pchars_seen, int *unsignedp) { - cpp_string str = { 0, 0 }; + cpp_string str = { 0, 0, {NULL, 0, 0} }; bool wide = (token->type != CPP_CHAR && token->type != CPP_UTF8CHAR); int u8 = 2 * int(token->type == CPP_UTF8CHAR); cppchar_t result; @@ -1719,14 +1785,16 @@ _cpp_convert_input (cpp_reader *pfile, const char *input_charset, to.text = input; to.asize = size; to.len = len; + to.loc.init (); } else { to.asize = MAX (65536, len); to.text = XNEWVEC (uchar, to.asize); to.len = 0; + to.loc.init (); - if (!APPLY_CONVERSION (input_cset, input, len, &to)) + if (!APPLY_CONVERSION (input_cset, input, len, &to, NULL)) cpp_error (pfile, CPP_DL_ERROR, "failure to convert %s to %s", CPP_OPTION (pfile, input_charset), SOURCE_CHARSET); @@ -1811,3 +1879,204 @@ _cpp_default_encoding (void) return current_encoding; } + +/* Implementation of class cpp_string_location and + class cpp_string_location_reader. + We put them in this source file in the hope that they can be + inlined into heavy users such as cpp_interpret_string without + requiring the compiler itself to be built with LTO. */ + +/* FIXME. */ +void +cpp_string_location::init () +{ + m_fragloc_array = NULL; + m_num_fraglocs = 0; + m_alloc_fraglocs = 0; +} + +/* FIXME. */ +void +cpp_string_location::init_raw (source_location loc, int len, int cols_per_char, + line_maps *line_table) +{ + line_map_realloc reallocator = (line_table->reallocator + ? line_table->reallocator + : (line_map_realloc) xrealloc); + m_fragloc_array = (cpp_string_fragment_location *)reallocator + (NULL, + sizeof (cpp_string_fragment_location)); + m_fragloc_array[0].m_len = len; + + /* LOC might be a macro location. It only makes sense to do + column-by-column calculations on ordinary maps, so get the + corresponding location in an ordinary map. */ + source_location ordinary_loc + = linemap_resolve_location (line_table, loc, + LRK_SPELLING_LOCATION, NULL); + m_fragloc_array[0].m_loc = ordinary_loc; + m_fragloc_array[0].m_cols_per_char = cols_per_char; + m_num_fraglocs = 1; + m_alloc_fraglocs = 1; +} + + +/* FIXME. */ +void +cpp_string_location::add_char_at (source_range range, + line_maps *line_table) +{ + if (m_fragloc_array) + { + /* Is this a simple run-on character in the next column + within the current fragment? */ + cpp_string_fragment_location *current_fragment + = get_current_fragment (); + source_range next_range = current_fragment->get_next_range (); + if (range.m_start == next_range.m_start + && range.m_finish == next_range.m_finish) + /* If so, we can simply increase the length of the current + fragment. */ + current_fragment->m_len++; + else + { + /* We need to start a new fragment. This may require growing + the underlying array. */ + if (++m_num_fraglocs > m_alloc_fraglocs) + { + m_alloc_fraglocs *= 2; + line_map_realloc reallocator = (line_table->reallocator + ? line_table->reallocator + : (line_map_realloc) xrealloc); + m_fragloc_array = (cpp_string_fragment_location *)reallocator + (m_fragloc_array, + sizeof (cpp_string_fragment_location) * m_alloc_fraglocs); + } + current_fragment = get_current_fragment (); + current_fragment->m_len = 1; + current_fragment->m_loc = range.m_start; + current_fragment->m_cols_per_char + = range.m_finish + 1 - range.m_start; + } + } + else + { + /* Begin new fragment array. */ + line_map_realloc reallocator = (line_table->reallocator + ? line_table->reallocator + : (line_map_realloc) xrealloc); + m_fragloc_array = (cpp_string_fragment_location *)reallocator + (NULL, sizeof (cpp_string_fragment_location)); + m_fragloc_array[0].m_len = 1; + m_fragloc_array[0].m_loc = range.m_start; + m_fragloc_array[0].m_cols_per_char + = range.m_finish + 1 - range.m_start; + m_num_fraglocs = 1; + m_alloc_fraglocs = 1; + } +} + +/* FIXME. */ +void +cpp_string_location::add_n_chars_at (int flen, + cpp_string_location_reader *loc_reader) +{ + if (loc_reader) + while (flen--) + add_char_at (loc_reader->get_next (), + loc_reader->get_line_maps ()); +} + +/* FIXME. */ +source_location +cpp_string_location::get_loc_at_index (unsigned int char_idx) const +{ + for (unsigned int fragment_idx = 0; + fragment_idx < m_num_fraglocs; + fragment_idx++) + { + cpp_string_fragment_location *fragment = &m_fragloc_array[fragment_idx]; + if (char_idx < fragment->m_len) + return fragment->get_char_range (char_idx).m_start; + else + char_idx -= fragment->m_len; + } + + /* Error: accessing beyond the end of the array. */ + return 0; +} + +/* FIXME. */ +source_range +cpp_string_location::get_range_at_index (unsigned int char_idx) const +{ + for (unsigned int fragment_idx = 0; + fragment_idx < m_num_fraglocs; + fragment_idx++) + { + cpp_string_fragment_location *fragment = &m_fragloc_array[fragment_idx]; + if (char_idx < fragment->m_len) + return fragment->get_char_range (char_idx); + else + char_idx -= fragment->m_len; + } + + /* Error: accessing beyond the end of the array. */ + source_range err; + err.m_start = 0; + err.m_finish = 0; + return err; +} + +/* FIXME. */ +bool +cpp_string_location::trivial_p () const +{ + if (m_num_fraglocs == 1) + if (m_fragloc_array[0].m_cols_per_char == 1) + return true; + return false; +} + +/* Constructor for iterating through the locations in + cpp_string_location. */ + +cpp_string_location_reader:: +cpp_string_location_reader (const cpp_string_location *strloc, + line_maps *line_table) +{ + /* As an optimization, we require that STRLOC must consist of a + single fragment. */ + linemap_assert (strloc->m_num_fraglocs == 1); + m_loc = strloc->m_fragloc_array[0].m_loc; + m_cols_per_char = strloc->m_fragloc_array[0].m_cols_per_char; + m_line_table = line_table; +} + +/* Constructor for iterating through an arbitrary buffer. */ + +cpp_string_location_reader:: +cpp_string_location_reader (source_location src_loc, + int cols_per_char, + line_maps *line_table) +: m_cols_per_char (cols_per_char), + m_line_table (line_table) +{ + /* LOC might be a macro location. It only makes sense to do + column-by-column calculations on ordinary maps, so get the + corresponding location in an ordinary map. */ + m_loc + = linemap_resolve_location (line_table, src_loc, + LRK_SPELLING_LOCATION, NULL); +} + +/* FIXME. */ +source_range +cpp_string_location_reader::get_next () +{ + source_range result; + result.m_start = m_loc; + result.m_finish = m_loc + m_cols_per_char - 1; + m_loc += m_cols_per_char; + return result; +} diff --git a/libcpp/directives.c b/libcpp/directives.c index 1e9bc3d..b783a7e 100644 --- a/libcpp/directives.c +++ b/libcpp/directives.c @@ -949,7 +949,7 @@ do_line (cpp_reader *pfile) token = cpp_get_token (pfile); if (token->type == CPP_STRING) { - cpp_string s = { 0, 0 }; + cpp_string s = { 0, 0, { NULL, 0, 0 } }; if (cpp_interpret_string_notranslate (pfile, &token->val.str, 1, &s, CPP_STRING)) new_file = (const char *)s.text; @@ -1006,7 +1006,7 @@ do_linemarker (cpp_reader *pfile) token = cpp_get_token (pfile); if (token->type == CPP_STRING) { - cpp_string s = { 0, 0 }; + cpp_string s = { 0, 0, { NULL, 0, 0 } }; if (cpp_interpret_string_notranslate (pfile, &token->val.str, 1, &s, CPP_STRING)) new_file = (const char *)s.text; diff --git a/libcpp/expr.c b/libcpp/expr.c index 3dc5c0b..f355646 100644 --- a/libcpp/expr.c +++ b/libcpp/expr.c @@ -1228,6 +1228,8 @@ _cpp_parse_expr (cpp_reader *pfile, bool is_if) "missing binary operator before token \"%s\"", cpp_token_as_text (pfile, op.token)); want_value = false; + ((cpp_token *)op.token)->val.str.loc.init_raw (op.loc, 1, 1, /* FIXME */ + pfile->line_table); top->value = eval_token (pfile, op.token, op.loc); continue; diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h index 0b1a403..a5e5df5 100644 --- a/libcpp/include/cpplib.h +++ b/libcpp/include/cpplib.h @@ -173,10 +173,144 @@ enum c_lang {CLK_GNUC89 = 0, CLK_GNUC99, CLK_GNUC11, CLK_GNUCXX, CLK_CXX98, CLK_GNUCXX11, CLK_CXX11, CLK_GNUCXX14, CLK_CXX14, CLK_GNUCXX1Z, CLK_CXX1Z, CLK_ASM}; +/* Location of the individual chars in a cpp_string. + Specifically, this stores a run of characters of len, starting at loc, + with a consistent number of columns per char. + See the description below for cpp_string_location. */ +struct GTY(()) cpp_string_fragment_location { + source_location m_loc; + unsigned int m_len : 12; + unsigned int m_cols_per_char : 4; + + source_range get_char_range (int idx) const + { + source_range result; + result.m_start = m_loc + (idx * m_cols_per_char); + result.m_finish = result.m_start + m_cols_per_char - 1; + return result; + } + source_range get_next_range () const + { + return get_char_range (m_len); + } + source_range get_covered_range () const + { + source_range result; + result.m_start = m_loc; + result.m_finish = m_loc + (m_len * m_cols_per_char) - 1; + return result; + } + void debug (const char *msg) const; +}; + +class cpp_string_location_reader; + +/* Location of the individual chars in a cpp_string. + This is stored as a dynamically-allocated array of fragments. + For example, consider this call to printf: + + printf ("foo \x25\151 bar" "baz", + "not an int"); + + The string constant for the first parameter is composed of + the concatenation of two string literals, with hexadecimal + encoding of a '%' and octal encoding of a 'i', giving a + resulting STRING_CST of: + + "foo %i barbaz" + + We want to efficiently record the range of locations in the + source file of each character so that we can emit warnings about + the type mismatch between format specifier "%i" and the non-int + second argument. + + We record the locations as a series of fragments, where within + each fragment we have a contiguous run of input characters with + a consistent number of columns per character. In the example + above the fragments are: + + printf ("foo \x25\151 bar" "baz", + .........^^^^....................: fragment 0: 4 chars at 1 col per char + .............^^^^^^^^............: fragment 1: 2 chars at 4 cols per char + .....................^^^^........: fragment 2: 4 chars at 1 col per char + .............................^^^.: fragment 3: 3 chars at 1 col per char + + Note that the hex and octal chars both happen to be 4 cols per char + and are contiguous, hence both end up being in fragment 1, whereas the + "bar" and "baz" aren't contiguous and hence have to be in separate + fragments. + + Note also that having a constant cols-per-char within each fragment + means that given an index into the fragment we can directly compute + the corresponding source_range. */ + +struct GTY(()) cpp_string_location { + + void init (); + void init_raw (source_location loc, int len, int cols_per_char, + line_maps *line_table); + + void add_char_at (source_range range, + line_maps *line_table); + void add_n_chars_at (int flen, cpp_string_location_reader *loc_reader); + + source_location get_loc_at_index (unsigned int idx) const; + source_range get_range_at_index (unsigned int idx) const; + + void debug () const; + + bool trivial_p () const; + + private: + cpp_string_fragment_location *get_current_fragment () const + { + return &m_fragloc_array[m_num_fraglocs - 1]; + } + + /* Fields. + Ideally we would make these fields private, but this isn't easily + doable since gengtype generates functions in gtype-desc.c that + access them. */ + public: + + /* We seemingly can't use vec<> from libcpp, so do it "by hand" + here. */ + cpp_string_fragment_location *m_fragloc_array; + unsigned int m_num_fraglocs; + unsigned int m_alloc_fraglocs; +}; + +/* A class for iterating through the source-locations within a + string, either from a cpp_string_location, or a temporary buffer. */ +class cpp_string_location_reader { + public: + /* Constructor for iterating through the locations in + cpp_string_location. + As an optimization, we require that STRLOC must consist of a + single fragment. */ + cpp_string_location_reader (const cpp_string_location *strloc, + line_maps *line_table); + + /* Constructor for iterating through an arbitrary buffer. */ + cpp_string_location_reader (source_location src_loc, + int cols_per_char, + line_maps *line_table); + + source_range get_next (); + + line_maps *get_line_maps () const { return m_line_table; } + + private: + source_location m_loc; + int m_cols_per_char; + line_maps *m_line_table; +}; + /* Payload of a NUMBER, STRING, CHAR or COMMENT token. */ struct GTY(()) cpp_string { unsigned int len; const unsigned char *text; + cpp_string_location loc; }; /* Flags for the cpp_token structure. */ diff --git a/libcpp/internal.h b/libcpp/internal.h index abd464f..5be45f3 100644 --- a/libcpp/internal.h +++ b/libcpp/internal.h @@ -42,7 +42,8 @@ struct op; struct _cpp_strbuf; typedef bool (*convert_f) (iconv_t, const unsigned char *, size_t, - struct _cpp_strbuf *); + struct _cpp_strbuf *, + cpp_string_location_reader *loc_reader); struct cset_converter { convert_f func; @@ -747,7 +748,9 @@ struct normalize_state extern bool _cpp_valid_ucn (cpp_reader *, const unsigned char **, const unsigned char *, int, struct normalize_state *state, - cppchar_t *); + cppchar_t *, + source_range *char_range, + cpp_string_location_reader *loc_reader); extern void _cpp_destroy_iconv (cpp_reader *); extern unsigned char *_cpp_convert_input (cpp_reader *, const char *, unsigned char *, size_t, size_t, diff --git a/libcpp/lex.c b/libcpp/lex.c index a84a8c0..0a6bc1c 100644 --- a/libcpp/lex.c +++ b/libcpp/lex.c @@ -1247,7 +1247,7 @@ forms_identifier_p (cpp_reader *pfile, int first, cppchar_t s; buffer->cur += 2; if (_cpp_valid_ucn (pfile, &buffer->cur, buffer->rlimit, 1 + !first, - state, &s)) + state, &s, NULL, NULL)) return true; buffer->cur -= 2; } @@ -1407,6 +1407,15 @@ lex_number (cpp_reader *pfile, cpp_string *number, const uchar *base; uchar *dest; + /* FIXME: should it really use a new "cpp_number", rather than cpp_string? + We need to init this, or we get a crash accessing uninited data + during GC, since, + struct GTY(()) cpp_token + has union cpp_token_u with + desc ("cpp_token_val_index (&%1)"))) + and this gives CPP_TOKEN_FLD_STR for numbers (and strings). */ + number->loc.init (); + base = pfile->buffer->cur - 1; do { @@ -1446,6 +1455,7 @@ create_literal (cpp_reader *pfile, cpp_token *token, const uchar *base, token->type = type; token->val.str.len = len; token->val.str.text = dest; + token->val.str.loc.init_raw (token->src_loc, len, 1, pfile->line_table); } /* Subroutine of lex_raw_string: Append LEN chars from BASE to the buffer diff --git a/libcpp/macro.c b/libcpp/macro.c index 786c21b..b21e218 100644 --- a/libcpp/macro.c +++ b/libcpp/macro.c @@ -216,6 +216,7 @@ new_string_token (cpp_reader *pfile, unsigned char *text, unsigned int len) token->type = CPP_STRING; token->val.str.len = len; token->val.str.text = text; + token->val.str.loc.init (); token->flags = 0; return token; }