From patchwork Fri Jun 30 22:59:14 2023
X-Patchwork-Submitter: Lewis Hyatt
X-Patchwork-Id: 1802164
To: gcc-patches@gcc.gnu.org
Cc: Jason Merrill, Lewis Hyatt
Subject: [PATCH] c-family: Implement pragma_lex () for preprocess-only mode
Date: Fri, 30 Jun 2023 18:59:14 -0400
Message-Id: <20230630225914.620150-1-lhyatt@gmail.com>
From: Lewis Hyatt
Reply-To: Lewis Hyatt

In order to support processing #pragma in preprocess-only mode (-E or
-save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
libcpp. In full compilation modes, this is accomplished by calling
pragma_lex (), which is a symbol that must be exported by the frontend, and
which is currently implemented for C and C++. Neither of those frontends
initializes its parser machinery in preprocess-only mode, and consequently
pragma_lex () does not work in this case. Address that by adding a new
function c_init_preprocess () for the frontends to implement, which arranges
for pragma_lex () to work in preprocess-only mode, and adjusting pragma_lex ()
accordingly.

In preprocess-only mode, the preprocessor is accustomed to controlling the
interaction with libcpp, and it only knows about tokens that it has called
into libcpp itself to obtain. Since it still needs to see the tokens
obtained by pragma_lex () so that they can be streamed to the output, also
add a new libcpp callback, on_token_lex (), that ensures the preprocessor
sees these tokens too.

Currently, there is one place where we are already supporting #pragma in
preprocess-only mode, namely the handling of `#pragma GCC diagnostic'. That
was done by directly interfacing with libcpp, rather than making use of
pragma_lex (). Now that pragma_lex () works, that code is no longer
necessary; remove it.

gcc/c-family/ChangeLog:

	* c-common.h (c_init_preprocess): Declare new function.
	* c-opts.cc (c_common_init): Call it.
	* c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
	(pragma_diagnostic_lex): ...this.
	(pragma_diagnostic_lex_pp): Remove.
	(handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
	all modes.
	(c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
	usage.
	* c-pragma.h (pragma_lex_discard_to_eol): Declare new function.

gcc/c/ChangeLog:

	* c-parser.cc (pragma_lex): Support preprocess-only mode.
	(pragma_lex_discard_to_eol): New function.
	(c_init_preprocess): New function.

gcc/cp/ChangeLog:

	* parser.cc (c_init_preprocess): New function.
	(maybe_read_tokens_for_pragma_lex): New function.
	(pragma_lex): Support preprocess-only mode.
	(pragma_lex_discard_to_eol): New function.

libcpp/ChangeLog:

	* include/cpplib.h (struct cpp_callbacks): Add new callback
	on_token_lex.
	* macro.cc (cpp_get_token_1): Support new callback.
---

Notes:
    Hello-

    In r13-1544, I added support for processing `#pragma GCC diagnostic' in
    preprocess-only mode. Because pragma_lex () doesn't work in that mode, in
    that patch I called into libcpp directly to obtain the tokens needed to
    process the pragma. As part of the review, Jason noted that it would
    probably be better to make pragma_lex () usable in preprocess-only mode,
    and we decided just to add a comment about that for the time being, and to
    go ahead and implement that in the future, if it became necessary to
    support other pragmas during preprocessing.

    I think now is a good time to proceed with that plan, because I would like
    to fix PR87299, which is about another pragma (#pragma GCC target) not
    working in preprocess-only mode. This patch makes the necessary changes for
    pragma_lex () to work in preprocess-only mode.

    I have also added a new callback, on_token_lex (), to libcpp. This is so
    the preprocessor can see and stream out all the tokens that pragma_lex ()
    gets from libcpp, since it won't otherwise see them. This seemed the
    simplest approach to me. Another possibility would be to add a wrapper
    function in c-family/c-lex.cc, which would call
    cpp_get_token_with_location(), and then also stream the token in
    preprocess-only mode, and then change all calls into libcpp in that file
    to use the wrapper function. The libcpp callback seemed cleaner to me
    FWIW.

    There are no new tests added here, since it's just a change of
    implementation covered by existing tests. Bootstrap + regtest all
    languages looks good on x86-64 Linux.

    Please let me know what you think? Thanks!

    -Lewis

 gcc/c-family/c-common.h  |  3 +++
 gcc/c-family/c-opts.cc   |  1 +
 gcc/c-family/c-pragma.cc | 56 ++++++----------------------------------
 gcc/c-family/c-pragma.h  |  2 ++
 gcc/c/c-parser.cc        | 34 ++++++++++++++++++++++++
 gcc/cp/parser.cc         | 50 +++++++++++++++++++++++++++++++++++
 libcpp/include/cpplib.h  |  4 +++
 libcpp/macro.cc          |  3 +++
 8 files changed, 105 insertions(+), 48 deletions(-)

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index b5ef5ff6b2c..78fc5248ba6 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -990,6 +990,9 @@ extern void c_parse_file (void);
 
 extern void c_parse_final_cleanups (void);
 
+/* This initializes for preprocess-only mode.  */
+extern void c_init_preprocess (void);
+
 /* These macros provide convenient access to the various _STMT nodes.  */
 
 /* Nonzero if a given STATEMENT_LIST represents the outermost binding
diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index af19140e382..4961af63de8 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -1232,6 +1232,7 @@ c_common_init (void)
   if (flag_preprocess_only)
     {
       c_finish_options ();
+      c_init_preprocess ();
       preprocess_file (parse_in);
       return false;
     }
diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 0d2b333cebb..73d59df3bf4 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -840,11 +840,11 @@ public:
 };
 
 
-/* When compiling normally, use pragma_lex () to obtain the needed tokens.
-   This will call into either the C or C++ frontends as appropriate.  */
+/* This will call into either the C or C++ frontends as appropriate to get
+   tokens from libcpp for the pragma.  */
 
 static void
-pragma_diagnostic_lex_normal (pragma_diagnostic_data *result)
+pragma_diagnostic_lex (pragma_diagnostic_data *result)
 {
   result->clear ();
   tree x;
@@ -866,46 +866,6 @@ pragma_diagnostic_lex_normal (pragma_diagnostic_data *result)
   result->valid = true;
 }
 
-/* When preprocessing only, pragma_lex () is not available, so obtain the
-   tokens directly from libcpp.  We also need to inform the token streamer
-   about all tokens we lex ourselves here, so it outputs them too; this is
-   done by calling c_pp_stream_token () for each.
-
-   ???  If we need to support more pragmas in the future, maybe initialize
-   this_parser with the pragma tokens and call pragma_lex () instead?  */
-
-static void
-pragma_diagnostic_lex_pp (pragma_diagnostic_data *result)
-{
-  result->clear ();
-
-  auto tok = cpp_get_token_with_location (parse_in, &result->loc_kind);
-  c_pp_stream_token (parse_in, tok, result->loc_kind);
-  if (!(tok->type == CPP_NAME || tok->type == CPP_KEYWORD))
-    return;
-  const unsigned char *const kind_u = cpp_token_as_text (parse_in, tok);
-  result->set_kind ((const char *)kind_u);
-  if (result->pd_kind == pragma_diagnostic_data::PK_INVALID)
-    return;
-
-  if (result->needs_option ())
-    {
-      tok = cpp_get_token_with_location (parse_in, &result->loc_option);
-      c_pp_stream_token (parse_in, tok, result->loc_option);
-      if (tok->type != CPP_STRING)
-        return;
-      cpp_string str;
-      if (!cpp_interpret_string_notranslate (parse_in, &tok->val.str, 1, &str,
-                                             CPP_STRING)
-          || !str.len)
-        return;
-      result->option_str = (const char *)str.text;
-      result->own_option_str = true;
-    }
-
-  result->valid = true;
-}
-
 /* Handle #pragma GCC diagnostic.  Early mode is used by frontends (such as C++)
    that do not process the deferred pragma while they are consuming tokens; they
    can use early mode to make sure diagnostics affecting the preprocessor itself
@@ -916,10 +876,7 @@ handle_pragma_diagnostic_impl ()
   static const bool want_diagnostics = (is_pp || !early);
 
   pragma_diagnostic_data data;
-  if (is_pp)
-    pragma_diagnostic_lex_pp (&data);
-  else
-    pragma_diagnostic_lex_normal (&data);
+  pragma_diagnostic_lex (&data);
 
   if (!data.kind_str)
     {
@@ -1808,7 +1765,10 @@ c_pp_invoke_early_pragma_handler (unsigned int id)
 {
   const auto data = &registered_pp_pragmas[id - PRAGMA_FIRST_EXTERNAL];
   if (data->early_handler)
-    data->early_handler (parse_in);
+    {
+      data->early_handler (parse_in);
+      pragma_lex_discard_to_eol ();
+    }
 }
 
 /* Set up front-end pragmas.  */
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 9cc95ab3ee3..198fa7723e5 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -263,7 +263,9 @@ extern tree maybe_apply_renaming_pragma (tree, tree);
 extern void maybe_apply_pragma_scalar_storage_order (tree);
 extern void add_to_renaming_pragma_list (tree, tree);
 
+/* These are to be implemented in each frontend that needs them.  */
 extern enum cpp_ttype pragma_lex (tree *, location_t *loc = NULL);
+extern void pragma_lex_discard_to_eol ();
 
 /* Flags for use with c_lex_with_flags.  The values here were picked
    so that 0 means to translate and join strings.  */
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 24a6eb6e459..aaf6d704fe6 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -13355,6 +13355,11 @@ c_parser_pragma (c_parser *parser, enum pragma_context context, bool *if_p)
 enum cpp_ttype
 pragma_lex (tree *value, location_t *loc)
 {
+  if (flag_preprocess_only)
+    /* Arrange for the preprocessor to see the tokens we're about to read,
+       since it won't see them later.  */
+    cpp_get_callbacks (parse_in)->on_token_lex = c_pp_stream_token;
+
   c_token *tok = c_parser_peek_token (the_parser);
   enum cpp_ttype ret = tok->type;
 
@@ -13373,9 +13378,29 @@ pragma_lex (tree *value, location_t *loc)
       c_parser_consume_token (the_parser);
     }
 
+  cpp_get_callbacks (parse_in)->on_token_lex = nullptr;
   return ret;
 }
 
+void
+pragma_lex_discard_to_eol ()
+{
+  if (flag_preprocess_only)
+    /* Arrange for the preprocessor to see the tokens we're about to read,
+       since it won't see them later.  */
+    cpp_get_callbacks (parse_in)->on_token_lex = c_pp_stream_token;
+
+  cpp_ttype type;
+  do
+    {
+      type = c_parser_peek_token (the_parser)->type;
+      gcc_assert (type != CPP_EOF);
+      c_parser_consume_token (the_parser);
+    } while (type != CPP_PRAGMA_EOL);
+
+  cpp_get_callbacks (parse_in)->on_token_lex = nullptr;
+}
+
 static void
 c_parser_pragma_pch_preprocess (c_parser *parser)
 {
@@ -24756,6 +24781,15 @@ c_parse_file (void)
   the_parser = NULL;
 }
 
+void
+c_init_preprocess (void)
+{
+  /* Create a parser for use by pragma_lex during preprocessing.  */
+  the_parser = ggc_alloc<c_parser> ();
+  memset (the_parser, 0, sizeof (c_parser));
+  the_parser->tokens = &the_parser->tokens_buf[0];
+}
+
 /* Parse the body of a function declaration marked with "__RTL".
 
    The RTL parser works on the level of characters read from a
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 5e2b5cba57e..b2f2e222d81 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -765,6 +765,15 @@ cp_lexer_new_main (void)
   return lexer;
 }
 
+/* Create a lexer and parser to be used during preprocess-only mode.
+   This will be filled with tokens to parse when needed by pragma_lex ().  */
+void
+c_init_preprocess ()
+{
+  gcc_assert (!the_parser);
+  the_parser = cp_parser_new (cp_lexer_alloc ());
+}
+
 /* Create a new lexer whose token stream is primed with the tokens in
    CACHE.  When these tokens are exhausted, no new tokens will be
    read.  */
@@ -49683,11 +49692,42 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p)
   return ret;
 }
 
+/* Helper for pragma_lex in preprocess-only mode; in this mode, we have not
+   populated the lexer with any tokens (the tokens rather being read by
+   c-ppoutput.c's machinery), so we need to read enough tokens now to handle
+   a pragma.  */
+static void
+maybe_read_tokens_for_pragma_lex ()
+{
+  const auto lexer = the_parser->lexer;
+  if (!lexer->buffer->is_empty ())
+    return;
+
+  /* Arrange for the preprocessor to see the tokens we're about to read,
+     since it won't see them later.  */
+  cpp_get_callbacks (parse_in)->on_token_lex = c_pp_stream_token;
+
+  /* Read the rest of the tokens comprising the pragma line.  */
+  cp_token *tok;
+  do
+    {
+      tok = vec_safe_push (lexer->buffer, cp_token ());
+      cp_lexer_get_preprocessor_token (C_LEX_STRING_NO_JOIN, tok);
+      gcc_assert (tok->type != CPP_EOF);
+    } while (tok->type != CPP_PRAGMA_EOL);
+  lexer->next_token = lexer->buffer->address ();
+  lexer->last_token = lexer->next_token + lexer->buffer->length () - 1;
+  cpp_get_callbacks (parse_in)->on_token_lex = nullptr;
+}
+
 /* The interface the pragma parsers have to the lexer.  */
 
 enum cpp_ttype
 pragma_lex (tree *value, location_t *loc)
 {
+  if (flag_preprocess_only)
+    maybe_read_tokens_for_pragma_lex ();
+
   cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
   enum cpp_ttype ret = tok->type;
 
@@ -49710,6 +49750,16 @@ pragma_lex (tree *value, location_t *loc)
   return ret;
 }
 
+void
+pragma_lex_discard_to_eol ()
+{
+  /* We have already read all the tokens, so we just need to discard
+     them here.  */
+  const auto lexer = the_parser->lexer;
+  lexer->next_token = lexer->last_token;
+  lexer->buffer->truncate (0);
+}
+
 
 /* External interface.  */
 
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index aef703f8111..8b63204df0e 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -784,6 +784,10 @@ struct cpp_callbacks
      cpp_buffer containing the translation if translating.  */
   char *(*translate_include) (cpp_reader *, line_maps *, location_t,
                               const char *path);
+
+  /* Called when cpp_get_token() / cpp_get_token_with_location()
+     have produced a token.  */
+  void (*on_token_lex) (cpp_reader *, const cpp_token *, location_t);
 };
 
 #ifdef VMS
diff --git a/libcpp/macro.cc b/libcpp/macro.cc
index dada8fea835..ebbc1618a71 100644
--- a/libcpp/macro.cc
+++ b/libcpp/macro.cc
@@ -3135,6 +3135,9 @@ cpp_get_token_1 (cpp_reader *pfile, location_t *location)
         }
     }
 
+  if (pfile->cb.on_token_lex)
+    pfile->cb.on_token_lex (pfile, result,
+                            location ? *location : result->src_loc);
   return result;
 }
 
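
Illustration only, not part of the patch: a minimal standalone sketch of the callback pattern the patch relies on, assuming a toy reader in place of libcpp. The names Token, Reader, stream_token, discard_to_eol and demo are all hypothetical stand-ins (loosely mirroring cpp_token, cpp_reader, c_pp_stream_token and pragma_lex_discard_to_eol); none of this is GCC code. It shows an observer callback that is installed only while the pragma line is being consumed, so the streamer sees exactly those tokens and nothing afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a lexed token; not libcpp's cpp_token.
struct Token
{
  std::string text;
  bool is_pragma_eol;  // plays the role of CPP_PRAGMA_EOL
};

// Hypothetical stand-in for the reader plus its callback table.
struct Reader
{
  std::vector<Token> input;
  std::size_t pos = 0;

  // Analogue of the new on_token_lex callback; null means no observer.
  void (*on_token_lex) (Reader *, const Token *) = nullptr;

  // Analogue of cpp_get_token_1 (): produce a token, then notify the
  // observer if one is installed.
  const Token *get_token ()
  {
    assert (pos < input.size ());
    const Token *result = &input[pos++];
    if (on_token_lex)
      on_token_lex (this, result);
    return result;
  }
};

// Tokens the toy "preprocessor output" has been shown.
static std::vector<std::string> streamed;

// Analogue of c_pp_stream_token (): record each token it is told about.
static void
stream_token (Reader *, const Token *tok)
{
  streamed.push_back (tok->text);
}

// Analogue of pragma_lex_discard_to_eol (): install the observer, consume
// the rest of the pragma line, then remove the observer again.
static void
discard_to_eol (Reader *r)
{
  r->on_token_lex = stream_token;
  const Token *tok;
  do
    tok = r->get_token ();
  while (!tok->is_pragma_eol);
  r->on_token_lex = nullptr;
}

// Consume one pragma line and then one ordinary token; return what the
// streamer saw.
static std::vector<std::string>
demo ()
{
  streamed.clear ();
  Reader r;
  r.input = { { "GCC", false }, { "diagnostic", false }, { "push", false },
              { "\n", true }, { "int", false } };
  discard_to_eol (&r);
  r.get_token ();  // "int": read with no observer installed
  return streamed;
}
```

In this sketch the streamer records the four tokens of the pragma line (including the end-of-line marker) but not the following "int", since the callback has been reset by then; that mirrors how the patch sets on_token_lex before reading and clears it to nullptr afterwards.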