From patchwork Thu Oct 8 19:16:12 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nathan Sidwell
X-Patchwork-Id: 1378856
To: GCC Patches
From: Nathan Sidwell
Subject: libcpp: Directly peek for initial line marker
Message-ID: <1a000e0b-b05e-f352-e68f-8aee53d10d2b@acm.org>
Date: Thu, 8 Oct 2020 15:16:12 -0400
List-Id: Gcc-patches mailing list

Using the tokenizer to sniff for an initial line marker for
preprocessed input is a little brittle, particularly with
-fdirectives-only. If there is no marker we'll happily munch initial
comments. This patch directly sniffs the buffer. This is safe
because the initial line marker was machine generated and must be
right at the beginning of the file. Anything else is not such a line
marker.

The same is true for the initial directory marker. For that
tokenizing the string is simplest, but at that point it's either a
regular line marker or a directory marker. If it's a regular marker,
unwinding tokens is fine.

	libcpp/
	* internal.h (enum include_type): Rename IT_MAIN_INJECT to
	IT_PRE_MAIN.
	* init.c (cpp_read_main_file): If there is no line marker, adjust
	the initial line marker.
	(read_original_filename): Return bool, peek the buffer directly
	before trying to tokenize.
	(read_original_directory): Likewise. Directly prod the string
	literal.
	* files.c (_cpp_stack_file): Adjust for IT_PRE_MAIN change.

pushing to trunk,

nathan

diff --git i/libcpp/files.c w/libcpp/files.c
index b890b8ebf1e..5af41364d0a 100644
--- i/libcpp/files.c
+++ w/libcpp/files.c
@@ -948,10 +948,12 @@ _cpp_stack_file (cpp_reader *pfile, _cpp_file *file, include_type type,
 
   /* Add line map and do callbacks. */
   _cpp_do_file_change (pfile, LC_ENTER, file->path,
-                       /* With preamble injection, start on line zero, so
-                          the preamble doesn't appear to have been
-                          included from line 1. */
-                       type == IT_MAIN_INJECT ? 0 : 1, sysp);
+                       /* With preamble injection, start on line zero,
+                          so the preamble doesn't appear to have been
+                          included from line 1. Likewise when
+                          starting preprocessed, we expect an initial
+                          locating line. */
+                       type == IT_PRE_MAIN ? 0 : 1, sysp);
 
   return true;
 }
diff --git i/libcpp/init.c w/libcpp/init.c
index aba5854d357..84c0a9efa74 100644
--- i/libcpp/init.c
+++ w/libcpp/init.c
@@ -36,7 +36,7 @@ along with this program; see the file COPYING3. If not see
 
 static void init_library (void);
 static void mark_named_operators (cpp_reader *, int);
-static void read_original_filename (cpp_reader *);
+static bool read_original_filename (cpp_reader *);
 static void read_original_directory (cpp_reader *);
 static void post_options (cpp_reader *);
 
@@ -681,94 +681,114 @@ cpp_read_main_file (cpp_reader *pfile, const char *fname, bool injecting)
     return NULL;
 
   _cpp_stack_file (pfile, pfile->main_file,
-                   injecting ? IT_MAIN_INJECT : IT_MAIN, 0);
+                   injecting || CPP_OPTION (pfile, preprocessed)
+                   ? IT_PRE_MAIN : IT_MAIN, 0);
 
   /* For foo.i, read the original filename foo.c now, for the benefit
      of the front ends. */
   if (CPP_OPTION (pfile, preprocessed))
-    read_original_filename (pfile);
+    if (!read_original_filename (pfile))
+      {
+        /* We're on line 1 after all. */
+        auto *last = linemap_check_ordinary
+          (LINEMAPS_LAST_MAP (pfile->line_table, false));
+        last->to_line = 1;
+        /* Inform of as-if a file change. */
+        _cpp_do_file_change (pfile, LC_RENAME_VERBATIM, LINEMAP_FILE (last),
+                             LINEMAP_LINE (last), LINEMAP_SYSP (last));
+      }
 
   return ORDINARY_MAP_FILE_NAME (LINEMAPS_LAST_ORDINARY_MAP (pfile->line_table));
 }
 
-/* For preprocessed files, if the first tokens are of the form # NUM.
-   handle the directive so we know the original file name. This will
-   generate file_change callbacks, which the front ends must handle
-   appropriately given their state of initialization. */
-static void
+/* For preprocessed files, if the very first characters are
+   '#<SPACE>[01]<SPACE>', then handle a line directive so we know the
+   original file name. This will generate file_change callbacks,
+   which the front ends must handle appropriately given their state of
+   initialization. We peek directly into the character buffer, so
+   that we're not confused by otherwise-skipped white space &
+   comments. We can be very picky, because this should have been
+   machine-generated text (by us, no less). This way we do not
+   interfere with the module directive state machine. */
+
+static bool
 read_original_filename (cpp_reader *pfile)
 {
-  const cpp_token *token, *token1;
-
-  /* Lex ahead; if the first tokens are of the form # NUM, then
-     process the directive, otherwise back up. */
-  token = _cpp_lex_direct (pfile);
-  if (token->type == CPP_HASH)
+  auto *buf = pfile->buffer->next_line;
+
+  if (pfile->buffer->rlimit - buf > 4
+      && buf[0] == '#'
+      && buf[1] == ' '
+      // Also permit '1', as that's what used to be here
+      && (buf[2] == '0' || buf[2] == '1')
+      && buf[3] == ' ')
     {
-      pfile->state.in_directive = 1;
-      token1 = _cpp_lex_direct (pfile);
-      _cpp_backup_tokens (pfile, 1);
-      pfile->state.in_directive = 0;
-
-      /* If it's a #line directive, handle it. */
-      if (token1->type == CPP_NUMBER
-          && _cpp_handle_directive (pfile, token->flags & PREV_WHITE))
+      const cpp_token *token = _cpp_lex_direct (pfile);
+      gcc_checking_assert (token->type == CPP_HASH);
+      if (_cpp_handle_directive (pfile, token->flags & PREV_WHITE))
         {
           read_original_directory (pfile);
-          return;
+          return true;
         }
     }
 
-  /* Backup as if nothing happened. */
-  _cpp_backup_tokens (pfile, 1);
+  return false;
 }
 
 /* For preprocessed files, if the tokens following the first filename
    line is of the form # <number> "/path/name//", handle the
-   directive so we know the original current directory. */
+   directive so we know the original current directory.
+
+   As with the first line peeking, we can do this without lexing by
+   being picky. */
 static void
 read_original_directory (cpp_reader *pfile)
 {
-  const cpp_token *hash, *token;
-
-  /* Lex ahead; if the first tokens are of the form # NUM, then
-     process the directive, otherwise back up. */
-  hash = _cpp_lex_direct (pfile);
-  if (hash->type != CPP_HASH)
+  auto *buf = pfile->buffer->next_line;
+
+  if (pfile->buffer->rlimit - buf > 4
+      && buf[0] == '#'
+      && buf[1] == ' '
+      // Also permit '1', as that's what used to be here
+      && (buf[2] == '0' || buf[2] == '1')
+      && buf[3] == ' ')
     {
-      _cpp_backup_tokens (pfile, 1);
-      return;
-    }
-
-  token = _cpp_lex_direct (pfile);
+      const cpp_token *hash = _cpp_lex_direct (pfile);
+      gcc_checking_assert (hash->type == CPP_HASH);
+      pfile->state.in_directive = 1;
+      const cpp_token *number = _cpp_lex_direct (pfile);
+      gcc_checking_assert (number->type == CPP_NUMBER);
+      const cpp_token *string = _cpp_lex_direct (pfile);
+      pfile->state.in_directive = 0;
 
-  if (token->type != CPP_NUMBER)
-    {
-      _cpp_backup_tokens (pfile, 2);
-      return;
-    }
+      const unsigned char *text = nullptr;
+      size_t len = 0;
+      if (string->type == CPP_STRING)
+        {
+          /* The string value includes the quotes. */
+          text = string->val.str.text;
+          len = string->val.str.len;
+        }
+      if (len < 5
+          || !IS_DIR_SEPARATOR (text[len - 2])
+          || !IS_DIR_SEPARATOR (text[len - 3]))
+        {
+          /* That didn't work out, back out. */
+          _cpp_backup_tokens (pfile, 3);
+          return;
+        }
 
-  token = _cpp_lex_direct (pfile);
+      if (pfile->cb.dir_change)
+        {
+          /* Smash the string directly, it's dead at this point */
+          char *smashy = (char *)text;
+          smashy[len - 3] = 0;
+
+          pfile->cb.dir_change (pfile, smashy + 1);
+        }
 
-  if (token->type != CPP_STRING
-      || ! (token->val.str.len >= 5
-            && IS_DIR_SEPARATOR (token->val.str.text[token->val.str.len-2])
-            && IS_DIR_SEPARATOR (token->val.str.text[token->val.str.len-3])))
-    {
-      _cpp_backup_tokens (pfile, 3);
-      return;
+      /* We should be at EOL. */
     }
-
-  if (pfile->cb.dir_change)
-    {
-      char *debugdir = (char *) alloca (token->val.str.len - 3);
-
-      memcpy (debugdir, (const char *) token->val.str.text + 1,
-              token->val.str.len - 4);
-      debugdir[token->val.str.len - 4] = '\0';
-
-      pfile->cb.dir_change (pfile, debugdir);
-    }
 }
 
 /* This is called at the end of preprocessing. It pops the last
diff --git i/libcpp/internal.h w/libcpp/internal.h
index 4bafe1cf353..b728df74562 100644
--- i/libcpp/internal.h
+++ w/libcpp/internal.h
@@ -124,8 +124,8 @@ enum include_type
   IT_CMDLINE, /* -include */
   IT_DEFAULT, /* forced header */
   IT_MAIN, /* main, start on line 1 */
-  IT_MAIN_INJECT, /* main, but there will be an injected preamble
-                     before line 1 */
+  IT_PRE_MAIN, /* main, but there will be a preamble before line
+                  1 */
 
   IT_DIRECTIVE_HWM = IT_IMPORT + 1, /* Directives below this. */
   IT_HEADER_HWM = IT_DEFAULT + 1 /* Header files below this. */
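
For anyone wanting to poke at the two "picky" tests outside libcpp, here is a minimal standalone sketch of them. It is an illustration under stated assumptions, not code from the patch: the helper names looks_like_initial_line_marker and looks_like_directory_marker are hypothetical, the buffer is passed as a plain pointer pair rather than read from pfile->buffer, and the directory test hard-codes '/' where the patch uses IS_DIR_SEPARATOR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not in libcpp): mirrors the character test the
// patch applies at pfile->buffer->next_line.  The input must start
// with "# 0 " or "# 1 " and have more than four characters remaining.
static bool
looks_like_initial_line_marker (const unsigned char *buf,
                                const unsigned char *limit)
{
  assert (buf <= limit);
  return (limit - buf > 4
          && buf[0] == '#'
          && buf[1] == ' '
          && (buf[2] == '0' || buf[2] == '1')
          && buf[3] == ' ');
}

// Hypothetical helper (not in libcpp): mirrors the string-literal test
// for the directory marker.  TEXT includes the surrounding quotes, so
// text[len - 1] is the closing quote and the two characters before it
// must both be directory separators.  Assumption: only '/' counts as a
// separator here.
static bool
looks_like_directory_marker (const unsigned char *text, std::size_t len)
{
  return (len >= 5
          && text[len - 2] == '/'
          && text[len - 3] == '/');
}
```

With these, a buffer beginning `# 1 "foo.c"` passes the first test, while one beginning with a comment does not, which is the case the tokenizer-based sniffing used to get wrong by consuming the comment. A quoted string such as `"/tmp/build//"` passes the second test, and an ordinary file name such as `"foo.c"` does not.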