From patchwork Wed Jun 22 22:34:36 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Malcolm
X-Patchwork-Id: 1646806
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 01/12] diagnostics: add ability to associate diagnostics with rules from coding standards
Date: Wed, 22 Jun 2022 18:34:36 -0400
Message-Id: <20220622223447.2462880-2-dmalcolm@redhat.com>
In-Reply-To: <20220622223447.2462880-1-dmalcolm@redhat.com>
References: <20220622223447.2462880-1-dmalcolm@redhat.com>
List-Id: Gcc-patches mailing list
From: David Malcolm
Reply-To: David Malcolm
Errors-To:
 gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org
Sender: "Gcc-patches"

gcc/ChangeLog:
	* common.opt (fdiagnostics-show-rules): New option.
	* diagnostic-format-json.cc (diagnostic_output_format_init_json):
	Fix up context->show_rules.
	* diagnostic-format-sarif.cc (diagnostic_output_format_init_sarif):
	Likewise.
	* diagnostic-metadata.h (diagnostic_metadata::rule): New class.
	(diagnostic_metadata::precanned_rule): New class.
	(diagnostic_metadata::add_rule): New.
	(diagnostic_metadata::get_num_rules): New.
	(diagnostic_metadata::get_rule): New.
	(diagnostic_metadata::m_rules): New field.
	* diagnostic.cc (diagnostic_initialize): Initialize show_rules.
	(print_any_rules): New.
	(diagnostic_report_diagnostic): Call it.
	* diagnostic.h (diagnostic_context::show_rules): New field.
	* doc/invoke.texi (-fno-diagnostics-show-rules): New option.
	* opts.cc (common_handle_option): Handle
	OPT_fdiagnostics_show_rules.
	* toplev.cc (general_init): Set up global_dc->show_rules.

gcc/testsuite/ChangeLog:
	* gcc.dg/plugin/diagnostic-test-metadata.c: Expect " [STR34-C]" to
	be emitted at the "gets" call.
	* gcc.dg/plugin/diagnostic_plugin_test_metadata.c
	(pass_test_metadata::execute): Associate the "gets" diagnostic
	with a rule named "STR34-C".

Signed-off-by: David Malcolm
---
 gcc/common.opt                                |  4 ++
 gcc/diagnostic-format-json.cc                 |  1 +
 gcc/diagnostic-format-sarif.cc                |  1 +
 gcc/diagnostic-metadata.h                     | 47 +++++++++++++++++-
 gcc/diagnostic.cc                             | 48 +++++++++++++++++++
 gcc/diagnostic.h                              |  3 ++
 gcc/doc/invoke.texi                           | 10 ++++
 gcc/opts.cc                                   |  4 ++
 .../gcc.dg/plugin/diagnostic-test-metadata.c  |  2 +-
 .../plugin/diagnostic_plugin_test_metadata.c  |  9 +++-
 gcc/toplev.cc                                 |  2 +
 11 files changed, 127 insertions(+), 4 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 32917aafcae..3a842847a74 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1466,6 +1466,10 @@ fdiagnostics-show-cwe
 Common Var(flag_diagnostics_show_cwe) Init(1)
 Print CWE identifiers for diagnostic messages, where available.
 
+fdiagnostics-show-rules
+Common Var(flag_diagnostics_show_rules) Init(1)
+Print any rules associated with diagnostic messages.
+
 fdiagnostics-path-format=
 Common Joined RejectNegative Var(flag_diagnostics_path_format) Enum(diagnostic_path_format) Init(DPF_INLINE_EVENTS)
 Specify how to print any control-flow path associated with a diagnostic.
diff --git a/gcc/diagnostic-format-json.cc b/gcc/diagnostic-format-json.cc
index 051fa6c2e48..d1d8d3f2081 100644
--- a/gcc/diagnostic-format-json.cc
+++ b/gcc/diagnostic-format-json.cc
@@ -345,6 +345,7 @@ diagnostic_output_format_init_json (diagnostic_context *context)
 
   /* The metadata is handled in JSON format, rather than as text.  */
   context->show_cwe = false;
+  context->show_rules = false;
 
   /* The option is handled in JSON format, rather than as text.  */
   context->show_option_requested = false;
diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
index 0c33179e8cf..a7bb9fb639d 100644
--- a/gcc/diagnostic-format-sarif.cc
+++ b/gcc/diagnostic-format-sarif.cc
@@ -1556,6 +1556,7 @@ diagnostic_output_format_init_sarif (diagnostic_context *context)
 
   /* The metadata is handled in SARIF format, rather than as text.  */
   context->show_cwe = false;
+  context->show_rules = false;
 
   /* The option is handled in SARIF format, rather than as text.  */
   context->show_option_requested = false;
diff --git a/gcc/diagnostic-metadata.h b/gcc/diagnostic-metadata.h
index ae59942c65e..80017d35fa9 100644
--- a/gcc/diagnostic-metadata.h
+++ b/gcc/diagnostic-metadata.h
@@ -24,19 +24,62 @@ along with GCC; see the file COPYING3.  If not see
 /* A bundle of additional metadata that can be associated with a
    diagnostic.
 
-   Currently this only supports associating a CWE identifier with a
-   diagnostic.  */
+   This supports an optional CWE identifier, and zero or more
+   "rules".  */
 
 class diagnostic_metadata
 {
  public:
+  /* Abstract base class for referencing a rule that has been violated,
+     such as within a coding standard, or within a specification.  */
+  class rule
+  {
+  public:
+    virtual char *make_description () const = 0;
+    virtual char *make_url () const = 0;
+  };
+
+  /* Concrete subclass.  */
+  class precanned_rule : public rule
+  {
+  public:
+    precanned_rule (const char *desc, const char *url)
+    : m_desc (desc), m_url (url)
+    {}
+
+    char *make_description () const final override
+    {
+      return m_desc ? xstrdup (m_desc) : NULL;
+    }
+
+    char *make_url () const final override
+    {
+      return m_url ? xstrdup (m_url) : NULL;
+    }
+
+  private:
+    const char *m_desc;
+    const char *m_url;
+  };
+
   diagnostic_metadata () : m_cwe (0) {}
 
   void add_cwe (int cwe) { m_cwe = cwe; }
   int get_cwe () const { return m_cwe; }
 
+  /* Associate R with the diagnostic.  R must outlive
+     the metadata.  */
+  void add_rule (const rule &r)
+  {
+    m_rules.safe_push (&r);
+  }
+
+  unsigned get_num_rules () const { return m_rules.length (); }
+  const rule &get_rule (unsigned idx) const { return *(m_rules[idx]); }
+
  private:
   int m_cwe;
+  auto_vec<const rule *> m_rules;
 };
 
 #endif /* ! GCC_DIAGNOSTIC_METADATA_H */
diff --git a/gcc/diagnostic.cc b/gcc/diagnostic.cc
index f2a82fff462..22f7b0b6d6e 100644
--- a/gcc/diagnostic.cc
+++ b/gcc/diagnostic.cc
@@ -190,6 +190,7 @@ diagnostic_initialize (diagnostic_context *context, int n_opts)
   for (i = 0; i < rich_location::STATICALLY_ALLOCATED_RANGES; i++)
     context->caret_chars[i] = '^';
   context->show_cwe = false;
+  context->show_rules = false;
   context->path_format = DPF_NONE;
   context->show_path_depths = false;
   context->show_option_requested = false;
@@ -1291,6 +1292,51 @@ print_any_cwe (diagnostic_context *context,
     }
 }
 
+/* If DIAGNOSTIC has any rules associated with it, print them.
+
+   For example, if the diagnostic metadata associates it with a rule
+   named "STR34-C", then " [STR34-C]" will be printed, suitably colorized,
+   with any URL provided by the rule.  */
+
+static void
+print_any_rules (diagnostic_context *context,
+                 const diagnostic_info *diagnostic)
+{
+  if (diagnostic->metadata == NULL)
+    return;
+
+  for (unsigned idx = 0; idx < diagnostic->metadata->get_num_rules (); idx++)
+    {
+      const diagnostic_metadata::rule &rule
+        = diagnostic->metadata->get_rule (idx);
+      if (char *desc = rule.make_description ())
+        {
+          pretty_printer *pp = context->printer;
+          char *saved_prefix = pp_take_prefix (context->printer);
+          pp_string (pp, " [");
+          pp_string (pp,
+                     colorize_start (pp_show_color (pp),
+                                     diagnostic_kind_color[diagnostic->kind]));
+          char *url = NULL;
+          if (pp->url_format != URL_FORMAT_NONE)
+            {
+              url = rule.make_url ();
+              if (url)
+                pp_begin_url (pp, url);
+            }
+          pp_string (pp, desc);
+          pp_set_prefix (context->printer, saved_prefix);
+          if (pp->url_format != URL_FORMAT_NONE)
+            if (url)
+              pp_end_url (pp);
+          free (url);
+          pp_string (pp, colorize_stop (pp_show_color (pp)));
+          pp_character (pp, ']');
+          free (desc);
+        }
+    }
+}
+
 /* Print any metadata about the option used to control DIAGNOSTIC to CONTEXT's
    printer, e.g. " [-Werror=uninitialized]".  Subroutine of
    diagnostic_report_diagnostic.  */
@@ -1504,6 +1550,8 @@ diagnostic_report_diagnostic (diagnostic_context *context,
   pp_output_formatted_text (context->printer);
   if (context->show_cwe)
     print_any_cwe (context, diagnostic);
+  if (context->show_rules)
+    print_any_rules (context, diagnostic);
   if (context->show_option_requested)
     print_option_information (context, diagnostic, orig_diag_kind);
   (*diagnostic_finalizer (context)) (context, diagnostic, orig_diag_kind);
diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h
index 96c9a7202f9..ae6f2dfb7f4 100644
--- a/gcc/diagnostic.h
+++ b/gcc/diagnostic.h
@@ -227,6 +227,9 @@ struct diagnostic_context
      diagnostics.  */
   bool show_cwe;
 
+  /* True if we should print any rules associated with diagnostics.  */
+  bool show_rules;
+
   /* How should diagnostic_path objects be printed.  */
   enum diagnostic_path_format path_format;
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 50f57877477..4bd73c197e7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -305,6 +305,7 @@ Objective-C and Objective-C++ Dialects}.
 -fno-diagnostics-show-option  -fno-diagnostics-show-caret @gol
 -fno-diagnostics-show-labels  -fno-diagnostics-show-line-numbers @gol
 -fno-diagnostics-show-cwe @gol
+-fno-diagnostics-show-rules @gol
 -fdiagnostics-minimum-margin-width=@var{width} @gol
 -fdiagnostics-parseable-fixits  -fdiagnostics-generate-patch @gol
 -fdiagnostics-show-template-tree  -fno-elide-type @gol
@@ -5028,6 +5029,15 @@ diagnostics.  GCC plugins may also provide diagnostics with such metadata.
 By default, if this information is present, it will be printed with
 the diagnostic.  This option suppresses the printing of this metadata.
 
+@item -fno-diagnostics-show-rules
+@opindex fno-diagnostics-show-rules
+@opindex fdiagnostics-show-rules
+Diagnostic messages can optionally have rules associated with them, such
+as from a coding standard, or a specification.
+GCC itself does not do this for any of its diagnostics, but plugins may do so.
+By default, if this information is present, it will be printed with
+the diagnostic.  This option suppresses the printing of this metadata.
+
 @item -fno-diagnostics-show-line-numbers
 @opindex fno-diagnostics-show-line-numbers
 @opindex fdiagnostics-show-line-numbers
diff --git a/gcc/opts.cc b/gcc/opts.cc
index 959d48d173f..ef485455093 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -2872,6 +2872,10 @@ common_handle_option (struct gcc_options *opts,
       dc->show_cwe = value;
       break;
 
+    case OPT_fdiagnostics_show_rules:
+      dc->show_rules = value;
+      break;
+
     case OPT_fdiagnostics_path_format_:
       dc->path_format = (enum diagnostic_path_format)value;
       break;
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-metadata.c b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-metadata.c
index d2babd35753..38ecf0a6d95 100644
--- a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-metadata.c
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-metadata.c
@@ -5,5 +5,5 @@ extern char *gets (char *s);
 void test_cwe (void)
 {
   char buf[1024];
-  gets (buf); /* { dg-warning "never use 'gets' \\\[CWE-242\\\]" } */
+  gets (buf); /* { dg-warning "never use 'gets' \\\[CWE-242\\\] \\\[STR34-C\\\]" } */
 }
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_metadata.c b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_metadata.c
index 4b13afc093d..b86a8b3650e 100644
--- a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_metadata.c
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_metadata.c
@@ -106,9 +106,16 @@ pass_test_metadata::execute (function *fun)
         if (gcall *call = check_for_named_call (stmt, "gets", 1))
           {
             gcc_rich_location richloc (gimple_location (call));
-            /* CWE-242: Use of Inherently Dangerous Function.  */
             diagnostic_metadata m;
+
+            /* CWE-242: Use of Inherently Dangerous Function.  */
             m.add_cwe (242);
+
+            /* Example of a diagnostic_metadata::rule.  */
+            diagnostic_metadata::precanned_rule
+              test_rule ("STR34-C", "https://example.com/");
+            m.add_rule (test_rule);
+
             warning_meta (&richloc, m, 0,
                           "never use %qs", "gets");
           }
diff --git a/gcc/toplev.cc b/gcc/toplev.cc
index 055e0642f77..a24ad5db438 100644
--- a/gcc/toplev.cc
+++ b/gcc/toplev.cc
@@ -1038,6 +1038,8 @@ general_init (const char *argv0, bool init_signals)
     = global_options_init.x_flag_diagnostics_show_line_numbers;
   global_dc->show_cwe
     = global_options_init.x_flag_diagnostics_show_cwe;
+  global_dc->show_rules
+    = global_options_init.x_flag_diagnostics_show_rules;
   global_dc->path_format
     = (enum diagnostic_path_format)global_options_init.x_flag_diagnostics_path_format;
   global_dc->show_path_depths

From patchwork Wed Jun 22 22:34:37 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Malcolm
X-Patchwork-Id: 1646808
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 02/12] diagnostics: associate rules with plugins in SARIF output
Date: Wed, 22 Jun 2022 18:34:37 -0400
Message-Id: <20220622223447.2462880-3-dmalcolm@redhat.com>
In-Reply-To: <20220622223447.2462880-1-dmalcolm@redhat.com>
References: <20220622223447.2462880-1-dmalcolm@redhat.com>
List-Id: Gcc-patches mailing list
From: David Malcolm
Reply-To: David Malcolm
Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org
Sender: "Gcc-patches"

gcc/ChangeLog:
	* diagnostic-client-data-hooks.h (class
	diagnostic_client_plugin_info): Move to...
	* diagnostic-client-plugin.h: ...this new file.
	* diagnostic-format-sarif.cc: Include
	"diagnostic-client-plugin.h".
	(class sarif_tool_component): New class.
	(sarif_builder::m_rule_id_set): Move field to
	sarif_tool_component.
	(sarif_builder::m_rules_arr): Likewise.
	(sarif_builder::m_driver_obj): New field.
	(sarif_builder::m_extensions_arr): New field.
	(sarif_builder::m_plugin_objs): New field.
	(sarif_tool_component::sarif_tool_component): New.
	(sarif_tool_component::lazily_add_rule): New.
	(sarif_builder::sarif_builder): Update for changes to fields.
	(sarif_builder::get_plugin_tool_component_object): New, adapted
	from code within sarif_builder::make_tool_object.
	(maybe_get_first_rule): New.
	(sarif_builder::set_result_rule_id): New, adapted from code within
	sarif_builder::make_result_object.
	(sarif_builder::make_result_object): Move rule_id logic to
	sarif_builder::set_result_rule_id.
	(sarif_builder::make_reporting_descriptor_object_for_rule): New.
	(sarif_builder::make_tool_object): Drop "const" qualifier.  Use
	the m_driver_obj and m_extensions_arr created in the ctor.
	Update the visitor to call get_plugin_tool_component_object,
	rather than duplicate the toolComponent creation logic.
	(sarif_builder::make_driver_tool_component_object): Convert return
	type to sarif_tool_component *.  Move setting of "rules" property
	to sarif_tool_component ctor.
	* diagnostic-metadata.h (class diagnostic_client_plugin_info): New
	forward decl.
	(diagnostic_metadata::diagnostic_metadata): Add overloaded ctor.
	Initialize m_plugin.
	(diagnostic_metadata::get_plugin): New.
	(diagnostic_metadata::m_plugin): New field.
	* doc/plugins.texi (Plugin initialization): Update.
	* plugin.cc: Include "diagnostic-client-plugin.h".
	(plugin_name_args::plugin_name_args): New ctor.
	(plugin_name_args::get_short_name): New.
	(plugin_name_args::get_full_name): New.
	(plugin_name_args::get_version): New.
	(add_new_plugin): Rewrite creation of "plugin" to use new with a
	ctor.
	* plugin.h: Include "diagnostic-client-plugin.h".
	(struct plugin_name_args): Convert to...
	(class plugin_name_args): ...this, deriving it from
	diagnostic_client_plugin_info.
	(plugin_name_args::plugin_name_args): New ctor decl.
	(plugin_name_args::get_short_name): New decl.
	(plugin_name_args::get_full_name): New decl.
	(plugin_name_args::get_version): New decl.
	* tree-diagnostic-client-data-hooks.cc: Include
	"diagnostic-client-plugin.h".
	(class compiler_diagnostic_client_plugin_info): Delete.
	(compiler_version_info::on_plugin_cb): Pass plugin_name_args to
	the visitor, now that the former is a
	diagnostic_client_plugin_info.

gcc/testsuite/ChangeLog:
	* gcc.dg/plugin/diagnostic-test-metadata-sarif.c: New test.
	* gcc.dg/plugin/diagnostic_plugin_test_metadata.c
	(diag_plugin_info): New global.
	(pass_test_metadata::execute): Pass it to metadata.
	(plugin_init): Initialize it.
	* gcc.dg/plugin/plugin.exp (plugin_test_list): Add
	diagnostic-test-metadata-sarif.c to
	diagnostic_plugin_test_metadata.c.

Signed-off-by: David Malcolm
---
 gcc/diagnostic-client-data-hooks.h            |  18 --
 gcc/diagnostic-client-plugin.h                |  43 +++
 gcc/diagnostic-format-sarif.cc                | 283 +++++++++++++-----
 gcc/diagnostic-metadata.h                     |  22 +-
 gcc/doc/plugins.texi                          |  17 +-
 gcc/plugin.cc                                 |  36 ++-
 gcc/plugin.h                                  |  12 +-
 .../plugin/diagnostic-test-metadata-sarif.c   |  39 +++
 .../plugin/diagnostic_plugin_test_metadata.c  |   6 +-
 gcc/testsuite/gcc.dg/plugin/plugin.exp        |   4 +-
 gcc/tree-diagnostic-client-data-hooks.cc      |  36 +--
 11 files changed, 382 insertions(+), 134 deletions(-)
 create mode 100644 gcc/diagnostic-client-plugin.h
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-metadata-sarif.c

diff --git a/gcc/diagnostic-client-data-hooks.h b/gcc/diagnostic-client-data-hooks.h
index ba78546abeb..03464202fe1 100644
--- a/gcc/diagnostic-client-data-hooks.h
+++ b/gcc/diagnostic-client-data-hooks.h
@@ -84,22 +84,4 @@ public:
   virtual void for_each_plugin (plugin_visitor &v) const = 0;
 };
 
-/* Abstract base class for a diagnostic_context to get at
-   information about a specific plugin within a client.  */
-
-class diagnostic_client_plugin_info
-{
-public:
-  /* For use e.g. by SARIF "name" property (SARIF v2.1.0 section 3.19.8).  */
-  virtual const char *get_short_name () const = 0;
-
-  /* For use e.g. by SARIF "fullName" property
-     (SARIF v2.1.0 section 3.19.9).  */
-  virtual const char *get_full_name () const = 0;
-
-  /* For use e.g. by SARIF "version" property
-     (SARIF v2.1.0 section 3.19.13).  */
-  virtual const char *get_version () const = 0;
-};
-
 #endif /* ! GCC_DIAGNOSTIC_CLIENT_DATA_HOOKS_H */
diff --git a/gcc/diagnostic-client-plugin.h b/gcc/diagnostic-client-plugin.h
new file mode 100644
index 00000000000..b0e266fac1b
--- /dev/null
+++ b/gcc/diagnostic-client-plugin.h
@@ -0,0 +1,43 @@
+/* Metadata about plugins within a diagnostic client.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by David Malcolm
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#ifndef GCC_DIAGNOSTIC_CLIENT_PLUGIN_H
+#define GCC_DIAGNOSTIC_CLIENT_PLUGIN_H
+
+/* Abstract base class for a diagnostic_context to get at
+   information about a specific plugin within a client,
+   and for associating plugins with diagnostic metadata.  */
+
+class diagnostic_client_plugin_info
+{
+public:
+  /* For use e.g. by SARIF "name" property (SARIF v2.1.0 section 3.19.8).  */
+  virtual const char *get_short_name () const = 0;
+
+  /* For use e.g. by SARIF "fullName" property
+     (SARIF v2.1.0 section 3.19.9).  */
+  virtual const char *get_full_name () const = 0;
+
+  /* For use e.g. by SARIF "version" property
+     (SARIF v2.1.0 section 3.19.13).  */
+  virtual const char *get_version () const = 0;
+};
+
+#endif /* ! GCC_DIAGNOSTIC_CLIENT_PLUGIN_H */
diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
index a7bb9fb639d..a409abf648b 100644
--- a/gcc/diagnostic-format-sarif.cc
+++ b/gcc/diagnostic-format-sarif.cc
@@ -29,9 +29,26 @@ along with GCC; see the file COPYING3.  If not see
 #include "cpplib.h"
 #include "logical-location.h"
 #include "diagnostic-client-data-hooks.h"
+#include "diagnostic-client-plugin.h"
 
 class sarif_builder;
 
+/* Subclass of json::object for SARIF toolComponent objects
+   (SARIF v2.1.0 section 3.19), used for the driver
+   and for extensions.  */
+
+class sarif_tool_component : public json::object
+{
+public:
+  sarif_tool_component ();
+
+  void lazily_add_rule (char *id, json::object *reporting_desc_obj);
+
+private:
+  hash_set m_rule_id_set;
+  json::array *m_rules_arr;
+};
+
 /* Subclass of json::object for SARIF result objects
    (SARIF v2.1.0 section 3.27.  */
 
@@ -110,6 +127,13 @@ public:
   json::object *make_message_object (const char *msg) const;
 
 private:
+  sarif_tool_component *
+  get_plugin_tool_component_object
+    (const diagnostic_client_plugin_info *plugin);
+  void set_result_rule_id (diagnostic_context *context,
+                           diagnostic_info *diagnostic,
+                           diagnostic_t orig_diag_kind,
+                           sarif_result *result_obj);
   sarif_result *make_result_object (diagnostic_context *context,
                                     diagnostic_info *diagnostic,
                                     diagnostic_t orig_diag_kind);
@@ -133,12 +157,16 @@ private:
   json::object *make_multiformat_message_string (const char *msg) const;
   json::object *make_top_level_object (json::array *results);
   json::object *make_run_object (json::array *results);
-  json::object *make_tool_object () const;
-  json::object *make_driver_tool_component_object () const;
+  json::object *make_tool_object ();
+  sarif_tool_component *make_driver_tool_component_object () const;
   json::array *maybe_make_taxonomies_array () const;
   json::object *maybe_make_cwe_taxonomy_object () const;
   json::object *make_tool_component_reference_object_for_cwe () const;
   json::object *
+  make_reporting_descriptor_object_for_rule
+    (const diagnostic_metadata::rule &rule,
+     const char *desc);
+  json::object *
   make_reporting_descriptor_object_for_warning (diagnostic_context *context,
                                                 diagnostic_info *diagnostic,
                                                 diagnostic_t orig_diag_kind,
@@ -168,8 +196,11 @@ private:
 
   hash_set m_filenames;
   bool m_seen_any_relative_paths;
-  hash_set m_rule_id_set;
-  json::array *m_rules_arr;
+
+  sarif_tool_component *m_driver_obj;
+  json::array *m_extensions_arr;
+  hash_map<const diagnostic_client_plugin_info *,
+           sarif_tool_component *> m_plugin_objs;
 
   /* The set of all CWE IDs we've seen, if any.  */
   hash_set > m_cwe_id_set;
@@ -179,6 +210,39 @@ private:
 
 static sarif_builder *the_builder;
 
+/* class sarif_tool_component : public json::object.  */
+
+sarif_tool_component::sarif_tool_component ()
+: m_rule_id_set (),
+  m_rules_arr (new json::array ())
+{
+  /* "rules" property (SARIF v2.1.0 section 3.19.23).  */
+  set ("rules", m_rules_arr);
+}
+
+/* Take ownership of ID and REPORTING_DESC_OBJ.
+   Ensure that ID is represented within the rules array,
+   using REPORTING_DESC_OBJ if it isn't.  */
+
+void
+sarif_tool_component::lazily_add_rule (char *id,
+                                       json::object *reporting_desc_obj)
+{
+  if (m_rule_id_set.contains (id))
+    {
+      /* Already seen; clean up redundant entries.  */
+      free (id);
+      delete reporting_desc_obj;
+    }
+  else
+    {
+      /* This is the first time we've seen this ruleId.  */
+      /* Add to set, taking ownership.  */
+      m_rule_id_set.add (id);
+      m_rules_arr->append (reporting_desc_obj);
+    }
+}
+
 /* class sarif_result : public json::object.  */
 
 /* Handle secondary diagnostics that occur within a diagnostic group.
@@ -221,10 +285,12 @@ sarif_builder::sarif_builder (diagnostic_context *context)
   m_results_array (new json::array ()),
   m_cur_group_result (NULL),
   m_seen_any_relative_paths (false),
-  m_rule_id_set (),
-  m_rules_arr (new json::array ()),
+  m_driver_obj (NULL),
+  m_extensions_arr (NULL),
   m_tabstop (context->tabstop)
 {
+  m_driver_obj = make_driver_tool_component_object ();
+  m_extensions_arr = new json::array ();
 }
 
 /* Implementation of "end_diagnostic" for SARIF output.  */
@@ -320,51 +386,120 @@ make_rule_id_for_diagnostic_kind (diagnostic_t diag_kind)
   return rstrip;
 }
 
-/* Make a result object (SARIF v2.1.0 section 3.27) for DIAGNOSTIC.  */
+/* Get or create a toolComponent object (SARIF v2.1.0 section 3.19)
+   for PLUGIN, adding to m_extensions_arr and using m_plugin_objs to
+   reuse any existing object for PLUGIN.  */
 
-sarif_result *
-sarif_builder::make_result_object (diagnostic_context *context,
+sarif_tool_component *
+sarif_builder::
+get_plugin_tool_component_object (const diagnostic_client_plugin_info *plugin)
+{
+  if (sarif_tool_component **slot = m_plugin_objs.get (plugin))
+    return *slot;
+
+  sarif_tool_component *plugin_obj = new sarif_tool_component ();
+  m_plugin_objs.put (plugin, plugin_obj);
+  m_extensions_arr->append (plugin_obj);
+
+  /* "name" property (SARIF v2.1.0 section 3.19.8).  */
+  if (const char *short_name = plugin->get_short_name ())
+    plugin_obj->set ("name", new json::string (short_name));
+
+  /* "fullName" property (SARIF v2.1.0 section 3.19.9).  */
+  if (const char *full_name = plugin->get_full_name ())
+    plugin_obj->set ("fullName", new json::string (full_name));
+
+  /* "version" property (SARIF v2.1.0 section 3.19.13).  */
+  if (const char *version = plugin->get_version ())
+    plugin_obj->set ("version", new json::string (version));
+
+  return plugin_obj;
+}
+
+/* If DIAGNOSTIC has any associated rules, get the first one.  */
+
+static const diagnostic_metadata::rule *
+maybe_get_first_rule (diagnostic_info *diagnostic)
+{
+  if (!diagnostic->metadata)
+    return NULL;
+  if (diagnostic->metadata->get_num_rules () == 0)
+    return NULL;
+  return &diagnostic->metadata->get_rule (0);
+}
+
+/* Set "ruleId" property of RESULT_OBJ (SARIF v2.1.0 section 3.27.5).
+   Ensure that there is such a rule within the component, and that such
+   a component exists (either for the driver, or any plugin).  */
+
+void
+sarif_builder::set_result_rule_id (diagnostic_context *context,
                                    diagnostic_info *diagnostic,
-                                   diagnostic_t orig_diag_kind)
+                                   diagnostic_t orig_diag_kind,
+                                   sarif_result *result_obj)
 {
-  sarif_result *result_obj = new sarif_result ();
+  /* Determine which component we should search for within/add the
+     rule to.  */
+  sarif_tool_component *component_obj = m_driver_obj;
+  if (diagnostic->metadata)
+    if (const diagnostic_client_plugin_info *plugin
+        = diagnostic->metadata->get_plugin ())
+      component_obj = get_plugin_tool_component_object (plugin);
 
   /* "ruleId" property (SARIF v2.1.0 section 3.27.5).  */
-  /* Ideally we'd have an option_name for these.  */
+  if (const diagnostic_metadata::rule *rule
+      = maybe_get_first_rule (diagnostic))
+    if (char *desc = rule->make_description ())
+      {
+        /* Lazily create reportingDescriptor objects for the rule
+           and add to the component.
+           Set ruleId referencing them.  */
+        result_obj->set ("ruleId", new json::string (desc));
+        component_obj->lazily_add_rule
+          (desc,
+           make_reporting_descriptor_object_for_rule (*rule, desc));
+        return;
+      }
+
+  /* Otherwise, try to use the option_name for these.  */
   if (char *option_text = context->option_name (context,
                                                 diagnostic->option_index,
                                                 orig_diag_kind,
                                                 diagnostic->kind))
     {
-      /* Lazily create reportingDescriptor objects for and add to m_rules_arr.
+      /* Lazily create reportingDescriptor objects for the warning
+         and add to the component.
          Set ruleId referencing them.  */
       result_obj->set ("ruleId", new json::string (option_text));
-      if (m_rule_id_set.contains (option_text))
-        free (option_text);
-      else
-        {
-          /* This is the first time we've seen this ruleId.  */
-          /* Add to set, taking ownership.  */
-          m_rule_id_set.add (option_text);
-
-          json::object *reporting_desc_obj
-            = make_reporting_descriptor_object_for_warning (context,
-                                                            diagnostic,
-                                                            orig_diag_kind,
-                                                            option_text);
-          m_rules_arr->append (reporting_desc_obj);
-        }
-    }
-  else
-    {
-      /* Otherwise, we have an "error" or a stray "note"; use the
-         diagnostic kind as the ruleId, so that the result object at least
-         has a ruleId.
-         We don't bother creating reportingDescriptor objects for these.  */
-      char *rule_id = make_rule_id_for_diagnostic_kind (orig_diag_kind);
-      result_obj->set ("ruleId", new json::string (rule_id));
-      free (rule_id);
+      component_obj->lazily_add_rule
+        (option_text,
+         make_reporting_descriptor_object_for_warning (context,
+                                                       diagnostic,
+                                                       orig_diag_kind,
+                                                       option_text));
+      return;
     }
 
+  /* Otherwise, we have a "warning" without an option, an "error" or a stray
+     "note"; use the diagnostic kind as the ruleId, so that the result object
+     at least has a ruleId.
+     We don't bother creating reportingDescriptor objects for these.  */
+  char *rule_id = make_rule_id_for_diagnostic_kind (orig_diag_kind);
+  result_obj->set ("ruleId", new json::string (rule_id));
+  free (rule_id);
+}
+
+/* Make a result object (SARIF v2.1.0 section 3.27) for DIAGNOSTIC.  */
+
+sarif_result *
+sarif_builder::make_result_object (diagnostic_context *context,
+                                   diagnostic_info *diagnostic,
+                                   diagnostic_t orig_diag_kind)
+{
+  sarif_result *result_obj = new sarif_result ();
+
+  /* "ruleId" property (SARIF v2.1.0 section 3.27.5).  */
+  set_result_rule_id (context, diagnostic, orig_diag_kind, result_obj);
+
   /* "taxa" property (SARIF v2.1.0 section 3.27.8).  */
   if (diagnostic->metadata)
     if (int cwe_id = diagnostic->metadata->get_cwe ())
@@ -424,6 +559,32 @@ sarif_builder::make_result_object (diagnostic_context *context,
   return result_obj;
 }
 
+/* Make a reportingDescriptor object (SARIF v2.1.0 section 3.49)
+   for RULE, with DESC.  */
+
+json::object *
+sarif_builder::
+make_reporting_descriptor_object_for_rule (const diagnostic_metadata::rule &rule,
+                                           const char *desc)
+{
+  json::object *reporting_desc = new json::object ();
+
+  /* "id" property (SARIF v2.1.0 section 3.49.3).  */
+  reporting_desc->set ("id", new json::string (desc));
+
+  /* We don't implement "name" property (SARIF v2.1.0 section 3.49.7), since
+     it seems redundant compared to "id".  */
+
+  /* "helpUri" property (SARIF v2.1.0 section 3.49.12).
*/ + if (char *url = rule.make_url ()) + { + reporting_desc->set ("helpUri", new json::string (url)); + free (url); + } + + return reporting_desc; +} + /* Make a reportingDescriptor object (SARIF v2.1.0 section 3.49) for a GCC warning. */ @@ -1089,16 +1250,16 @@ sarif_builder::make_run_object (json::array *results) /* Make a tool object (SARIF v2.1.0 section 3.18). */ json::object * -sarif_builder::make_tool_object () const +sarif_builder::make_tool_object () { json::object *tool_obj = new json::object (); /* "driver" property (SARIF v2.1.0 section 3.18.2). */ - json::object *driver_obj = make_driver_tool_component_object (); - tool_obj->set ("driver", driver_obj); + tool_obj->set ("driver", m_driver_obj); /* Report plugins via the "extensions" property (SARIF v2.1.0 section 3.18.3). */ + tool_obj->set ("extensions", m_extensions_arr); if (m_context->m_client_data_hooks) if (const client_version_info *vinfo = m_context->m_client_data_hooks->get_any_version_info ()) @@ -1106,36 +1267,19 @@ sarif_builder::make_tool_object () const class my_plugin_visitor : public client_version_info :: plugin_visitor { public: + my_plugin_visitor (sarif_builder *builder) + : m_builder (builder) + {} + void on_plugin (const diagnostic_client_plugin_info &p) final override { - /* Create a toolComponent object (SARIF v2.1.0 section 3.19) - for the plugin. */ - json::object *plugin_obj = new json::object (); - m_plugin_objs.safe_push (plugin_obj); - - /* "name" property (SARIF v2.1.0 section 3.19.8). */ - if (const char *short_name = p.get_short_name ()) - plugin_obj->set ("name", new json::string (short_name)); - - /* "fullName" property (SARIF v2.1.0 section 3.19.9). */ - if (const char *full_name = p.get_full_name ()) - plugin_obj->set ("fullName", new json::string (full_name)); - - /* "version" property (SARIF v2.1.0 section 3.19.13). 
*/ - if (const char *version = p.get_version ()) - plugin_obj->set ("version", new json::string (version)); + m_builder->get_plugin_tool_component_object (&p); } - auto_vec <json::object *> m_plugin_objs; + private: + sarif_builder *m_builder; }; - my_plugin_visitor v; + my_plugin_visitor v (this); vinfo->for_each_plugin (v); - if (v.m_plugin_objs.length () > 0) - { - json::array *extensions_arr = new json::array (); - tool_obj->set ("extensions", extensions_arr); - for (auto iter : v.m_plugin_objs) - extensions_arr->append (iter); - } } /* Perhaps we could also show GMP, MPFR, MPC, isl versions as other @@ -1147,10 +1291,10 @@ sarif_builder::make_tool_object () const /* Make a toolComponent object (SARIF v2.1.0 section 3.19) for what SARIF calls the "driver" (see SARIF v2.1.0 section 3.18.1). */ -json::object * +sarif_tool_component * sarif_builder::make_driver_tool_component_object () const { - json::object *driver_obj = new json::object (); + sarif_tool_component *driver_obj = new sarif_tool_component (); if (m_context->m_client_data_hooks) if (const client_version_info *vinfo @@ -1179,9 +1323,6 @@ sarif_builder::make_driver_tool_component_object () const } } - /* "rules" property (SARIF v2.1.0 section 3.19.23). */ - driver_obj->set ("rules", m_rules_arr); - return driver_obj; } diff --git a/gcc/diagnostic-metadata.h b/gcc/diagnostic-metadata.h index 80017d35fa9..dce6763a1d9 100644 --- a/gcc/diagnostic-metadata.h +++ b/gcc/diagnostic-metadata.h @@ -21,11 +21,16 @@ along with GCC; see the file COPYING3. If not see #ifndef GCC_DIAGNOSTIC_METADATA_H #define GCC_DIAGNOSTIC_METADATA_H +class diagnostic_client_plugin_info; + /* A bundle of additional metadata that can be associated with a diagnostic. - This supports an optional CWE identifier, and zero or more - "rules". */ + This supports: + - an optional plugin associated with this diagnostic + - an optional CWE identifier + - zero or more "rules" (such as rules within a coding standard, + or within a specification). 
*/ class diagnostic_metadata { @@ -62,7 +67,15 @@ class diagnostic_metadata const char *m_url; }; - diagnostic_metadata () : m_cwe (0) {} + /* Ctors. */ + diagnostic_metadata () + : m_plugin (NULL), + m_cwe (0) + {} + diagnostic_metadata (const diagnostic_client_plugin_info *plugin) + : m_plugin (plugin), + m_cwe (0) + {} void add_cwe (int cwe) { m_cwe = cwe; } int get_cwe () const { return m_cwe; } @@ -74,10 +87,13 @@ class diagnostic_metadata m_rules.safe_push (&r); } + const diagnostic_client_plugin_info *get_plugin () const { return m_plugin; } + unsigned get_num_rules () const { return m_rules.length (); } const rule &get_rule (unsigned idx) const { return *(m_rules[idx]); } private: + const diagnostic_client_plugin_info *m_plugin; int m_cwe; auto_vec<const rule *> m_rules; }; diff --git a/gcc/doc/plugins.texi b/gcc/doc/plugins.texi index 6d1a5fa7607..a1713901314 100644 --- a/gcc/doc/plugins.texi +++ b/gcc/doc/plugins.texi @@ -102,11 +102,20 @@ the parser. The arguments to @code{plugin_init} are: @item @code{version}: GCC version. @end itemize -The @code{plugin_info} struct is defined as follows: +The @code{plugin_info} class is defined as follows: @smallexample -struct plugin_name_args +class plugin_name_args : public diagnostic_client_plugin_info @{ + public: + plugin_name_args (char *base_name_, + const char *full_name_); + + /* Implementation of diagnostic_client_plugin_info. */ + const char *get_short_name () const final override; + const char *get_full_name () const final override; + const char *get_version () const final override; + char *base_name; /* Short name of the plugin (filename without .so suffix). */ const char *full_name; /* Path to the plugin as specified with @@ -149,7 +158,7 @@ recommended version check to perform looks like ... 
int -plugin_init (struct plugin_name_args *plugin_info, +plugin_init (plugin_name_args *plugin_info, struct plugin_gcc_version *version) @{ if (!plugin_default_version_check (version, &gcc_version)) @@ -295,7 +304,7 @@ struct register_pass_info /* Sample plugin code that registers a new pass. */ int -plugin_init (struct plugin_name_args *plugin_info, +plugin_init (plugin_name_args *plugin_info, struct plugin_gcc_version *version) @{ struct register_pass_info pass_info; diff --git a/gcc/plugin.cc b/gcc/plugin.cc index 6c42e057cbc..19cc74867d3 100644 --- a/gcc/plugin.cc +++ b/gcc/plugin.cc @@ -26,6 +26,7 @@ along with GCC; see the file COPYING3. If not see #include "options.h" #include "tree-pass.h" #include "diagnostic-core.h" +#include "diagnostic-client-plugin.h" #include "flags.h" #include "intl.h" #include "plugin.h" @@ -124,6 +125,39 @@ static const char *str_plugin_init_func_name = "plugin_init"; static const char *str_license = "plugin_is_GPL_compatible"; #endif +/* class plugin_name_args : public diagnostic_client_plugin_info. */ + +/* plugin_name_args's ctor. */ + +plugin_name_args::plugin_name_args (char *base_name_, + const char *full_name_) +: base_name (base_name_), + full_name (full_name_), + argc (0), + argv (NULL), + version (NULL), + help (NULL) +{ +} + +const char * +plugin_name_args::get_short_name () const +{ + return base_name; +} + +const char * +plugin_name_args::get_full_name () const +{ + return full_name; +} + +const char * +plugin_name_args::get_version () const +{ + return version; +} + /* Helper function for hashing the base_name of the plugin_name_args structure to be inserted into the hash table. 
*/ @@ -236,7 +270,7 @@ add_new_plugin (const char* plugin_name) return; } - plugin = XCNEW (struct plugin_name_args); + plugin = new plugin_name_args (base_name, plugin_name); plugin->base_name = base_name; plugin->full_name = plugin_name; diff --git a/gcc/plugin.h b/gcc/plugin.h index e7e8b51d15a..f5d22a8f53b 100644 --- a/gcc/plugin.h +++ b/gcc/plugin.h @@ -21,6 +21,7 @@ along with GCC; see the file COPYING3. If not see #define PLUGIN_H #include "highlev-plugin-common.h" +#include "diagnostic-client-plugin.h" /* Event names. */ enum plugin_event @@ -65,8 +66,17 @@ struct plugin_gcc_version }; /* Object that keeps track of the plugin name and its arguments. */ -struct plugin_name_args +class plugin_name_args : public diagnostic_client_plugin_info { + public: + plugin_name_args (char *base_name_, + const char *full_name_); + + /* Implementation of diagnostic_client_plugin_info. */ + const char *get_short_name () const final override; + const char *get_full_name () const final override; + const char *get_version () const final override; + char *base_name; /* Short name of the plugin (filename without .so suffix). */ const char *full_name; /* Path to the plugin as specified with diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-metadata-sarif.c b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-metadata-sarif.c new file mode 100644 index 00000000000..ac8f4ba2d83 --- /dev/null +++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-metadata-sarif.c @@ -0,0 +1,39 @@ +/* { dg-do compile } */ +/* { dg-options "-fdiagnostics-format=sarif-file" } */ + +extern char *gets (char *s); + +void test_cwe (void) +{ + char buf[1024]; + gets (buf); +} + +/* Verify that some JSON was written to a file with the expected name. */ + +/* We expect various properties. + The indentation here reflects the expected hierarchy, though these tests + don't check for that, merely the string fragments we expect. 
+ { dg-final { scan-sarif-file "\"version\": \"2.1.0\"" } } + { dg-final { scan-sarif-file "\"runs\": \\\[" } } + { dg-final { scan-sarif-file "\"tool\": " } } + { dg-final { scan-sarif-file "\"driver\": " } } + { dg-final { scan-sarif-file "\"name\": \"GNU C" } } + + { dg-final { scan-sarif-file "\"rules\": \\\[" } } + { dg-final { scan-sarif-file "\"id\": \"STR34-C\"" } } + { dg-final { scan-sarif-file "\"helpUri\": \"https://example.com/\"" } } + + Ideally we would verify that the above rule is within the extension, + rather than the driver. Unfortunately we don't have a way to + do this at present. + + { dg-final { scan-sarif-file "\"extensions\": \\\[" } } + { dg-final { scan-sarif-file "\"name\": \"diagnostic_plugin_test_metadata\"" } } + { dg-final { scan-sarif-file "\"results\": \\\[" } } + { dg-final { scan-sarif-file "\"level\": \"warning\"" } } + { dg-final { scan-sarif-file "\"message\": " } } + { dg-final { scan-sarif-file "\"text\": \"never use 'gets'\"" } } + { dg-final { scan-sarif-file "\"taxa\": \\\[" } } + { dg-final { scan-sarif-file "\"id\": \"242\"" } } +*/ diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_metadata.c b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_metadata.c index b86a8b3650e..b2a86e1bc68 100644 --- a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_metadata.c +++ b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_metadata.c @@ -32,6 +32,8 @@ int plugin_is_GPL_compatible; +static diagnostic_client_plugin_info *diag_plugin_info; + const pass_data pass_data_test_metadata = { GIMPLE_PASS, /* type */ @@ -106,7 +108,7 @@ pass_test_metadata::execute (function *fun) if (gcall *call = check_for_named_call (stmt, "gets", 1)) { gcc_rich_location richloc (gimple_location (call)); - diagnostic_metadata m; + diagnostic_metadata m (diag_plugin_info); /* CWE-242: Use of Inherently Dangerous Function. 
*/ m.add_cwe (242); @@ -136,6 +138,8 @@ plugin_init (struct plugin_name_args *plugin_info, if (!plugin_default_version_check (version, &gcc_version)) return 1; + diag_plugin_info = plugin_info; + pass_info.pass = new pass_test_metadata (g); pass_info.reference_pass_name = "ssa"; pass_info.ref_pass_instance_number = 1; diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp index 63b117d3cde..2244f52211d 100644 --- a/gcc/testsuite/gcc.dg/plugin/plugin.exp +++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp @@ -96,7 +96,9 @@ set plugin_test_list [list \ diagnostic-test-inlining-2.c \ diagnostic-test-inlining-3.c \ diagnostic-test-inlining-4.c } \ - { diagnostic_plugin_test_metadata.c diagnostic-test-metadata.c } \ + { diagnostic_plugin_test_metadata.c \ + diagnostic-test-metadata.c \ + diagnostic-test-metadata-sarif.c } \ { diagnostic_plugin_test_paths.c \ diagnostic-test-paths-1.c \ diagnostic-test-paths-2.c \ diff --git a/gcc/tree-diagnostic-client-data-hooks.cc b/gcc/tree-diagnostic-client-data-hooks.cc index f8ff271d2f5..c58df1fe70c 100644 --- a/gcc/tree-diagnostic-client-data-hooks.cc +++ b/gcc/tree-diagnostic-client-data-hooks.cc @@ -27,41 +27,10 @@ along with GCC; see the file COPYING3. If not see #include "diagnostic.h" #include "tree-logical-location.h" #include "diagnostic-client-data-hooks.h" +#include "diagnostic-client-plugin.h" #include "langhooks.h" #include "plugin.h" -/* Concrete class for supplying a diagnostic_context with information - about a specific plugin within the client, when the client is the - compiler (i.e. a GCC plugin). 
*/ - -class compiler_diagnostic_client_plugin_info - : public diagnostic_client_plugin_info -{ -public: - compiler_diagnostic_client_plugin_info (const plugin_name_args *args) - : m_args (args) - { - } - - const char *get_short_name () const final override - { - return m_args->base_name; - } - - const char *get_full_name () const final override - { - return m_args->full_name; - } - - const char *get_version () const final override - { - return m_args->version; - } - -private: - const plugin_name_args *m_args; -}; - /* Concrete subclass of client_version_info for use by compilers proper, (i.e. using lang_hooks, and with knowledge of GCC plugins). */ @@ -103,10 +72,9 @@ private: on_plugin_cb (const plugin_name_args *args, void *user_data) { - compiler_diagnostic_client_plugin_info cpi (args); client_version_info::plugin_visitor *visitor = (client_version_info::plugin_visitor *)user_data; - visitor->on_plugin (cpi); + visitor->on_plugin (*args); } }; From patchwork Wed Jun 22 22:34:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Malcolm X-Patchwork-Id: 1646804 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=nN1eu5nT; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Received: from sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4LSyrw4PqBz9sGp 
for ; Thu, 23 Jun 2022 08:35:43 +1000 (AEST) To: gcc-patches@gcc.gnu.org Subject: [PATCH 03/12] Add more emit_diagnostic overloads Date: Wed, 22 Jun 2022 18:34:38 -0400 Message-Id: 
<20220622223447.2462880-4-dmalcolm@redhat.com> In-Reply-To: <20220622223447.2462880-1-dmalcolm@redhat.com> References: <20220622223447.2462880-1-dmalcolm@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: David Malcolm via Gcc-patches From: David Malcolm Reply-To: David Malcolm Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" gcc/ChangeLog: * diagnostic-core.h (emit_diagnostic): New overload. (emit_diagnostic_valist): New overload. * diagnostic.cc (emit_diagnostic): New overload. (emit_diagnostic_valist): New overload. Signed-off-by: David Malcolm --- gcc/diagnostic-core.h | 7 +++++++ gcc/diagnostic.cc | 25 +++++++++++++++++++++++++ 2 files changed, 32 insertions(+) diff --git a/gcc/diagnostic-core.h b/gcc/diagnostic-core.h index 286954ac2f8..53f39a110e4 100644 --- a/gcc/diagnostic-core.h +++ b/gcc/diagnostic-core.h @@ -114,8 +114,15 @@ extern bool emit_diagnostic (diagnostic_t, location_t, int, const char *, ...) ATTRIBUTE_GCC_DIAG(4,5); extern bool emit_diagnostic (diagnostic_t, rich_location *, int, const char *, ...) ATTRIBUTE_GCC_DIAG(4,5); +extern bool emit_diagnostic (diagnostic_t, rich_location *, + const diagnostic_metadata *, int, + const char *, ...) 
ATTRIBUTE_GCC_DIAG(5,6); extern bool emit_diagnostic_valist (diagnostic_t, location_t, int, const char *, va_list *) ATTRIBUTE_GCC_DIAG (4,0); +extern bool emit_diagnostic_valist (diagnostic_t, rich_location *, + const diagnostic_metadata *, int, + const char *, va_list *) + ATTRIBUTE_GCC_DIAG (5,0); extern bool seen_error (void); #ifdef BUFSIZ diff --git a/gcc/diagnostic.cc b/gcc/diagnostic.cc index 22f7b0b6d6e..8099f585bfb 100644 --- a/gcc/diagnostic.cc +++ b/gcc/diagnostic.cc @@ -1769,6 +1769,21 @@ emit_diagnostic (diagnostic_t kind, rich_location *richloc, int opt, return ret; } +/* As above, but with optional metadata. */ + +bool +emit_diagnostic (diagnostic_t kind, rich_location *richloc, + const diagnostic_metadata *metadata, int opt, + const char *gmsgid, ...) +{ + auto_diagnostic_group d; + va_list ap; + va_start (ap, gmsgid); + bool ret = diagnostic_impl (richloc, metadata, opt, gmsgid, &ap, kind); + va_end (ap); + return ret; +} + /* Wrapper around diagnostic_impl taking a va_list parameter. */ bool @@ -1779,6 +1794,16 @@ emit_diagnostic_valist (diagnostic_t kind, location_t location, int opt, return diagnostic_impl (&richloc, NULL, opt, gmsgid, ap, kind); } +/* As above, but with optional metadata. */ + +bool +emit_diagnostic_valist (diagnostic_t kind, rich_location *richloc, + const diagnostic_metadata *metadata, int opt, + const char *gmsgid, va_list *ap) +{ + return diagnostic_impl (richloc, metadata, opt, gmsgid, ap, kind); +} + /* An informative note at LOCATION. Use this for additional details on an error message. 
*/ void From patchwork Wed Jun 22 22:34:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Malcolm X-Patchwork-Id: 1646814 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=odJ+FPPw; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4LSz4D5sqFz9sGq for ; Thu, 23 Jun 2022 08:45:32 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 45805388457F for ; Wed, 22 Jun 2022 22:45:30 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 45805388457F DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1655937930; bh=aXBPrSP32TN4C0ViRcpHLBiWPbR1dF5mWLp6vd/DLZU=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=odJ+FPPwYovj79DOfkhNgtph/OQXHb3N02N09wlWb+hy7J8r7QJRbF4aAOLldiDe0 vjplCj/yrLZppdb+bEw7/EmfgyDZH3cLfN33l+6UISPj7CA0OPLp45pxACNosimCZ7 k8TuzSxXz7q+dfrMspIV9ZPp3PqUG3xgRKZ/IkmM= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 
23B1B3830677 for ; Wed, 22 Jun 2022 22:34:51 +0000 (GMT) To: gcc-patches@gcc.gnu.org Subject: [PATCH 04/12] json: add json parsing support Date: Wed, 22 Jun 2022 18:34:39 -0400 Message-Id: <20220622223447.2462880-5-dmalcolm@redhat.com> In-Reply-To: <20220622223447.2462880-1-dmalcolm@redhat.com> References: <20220622223447.2462880-1-dmalcolm@redhat.com> MIME-Version: 1.0 From: David Malcolm Reply-To: David Malcolm Errors-To: 
gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch implements JSON parsing support. It's based on the parsing parts of the patch I posted here: https://gcc.gnu.org/legacy-ml/gcc-patches/2017-08/msg00417.html with the parsing moved to a separate source file and header, and heavily rewritten to capture source location information for JSON values. I also added optional support for C and C++ style comments, which is extremely useful in DejaGnu tests. gcc/ChangeLog: * Makefile.in (OBJS-libcommon): Add json-parsing.o. * json-parsing.cc: New file. * json-parsing.h: New file. * json.cc (selftest::assert_print_eq): Remove "static". * json.h (json::array::begin): New. (json::array::end): New. (json::array::length): New. (json::array::get): New. (is_a_helper ::test): New. (is_a_helper ::test): New. (is_a_helper ::test): New. (is_a_helper ::test): New. (is_a_helper ::test): New. (selftest::assert_print_eq): New. * selftest-run-tests.cc (selftest::run_tests): Call selftest::json_parser_cc_tests. * selftest.h (selftest::json_parser_cc_tests): New decl. 
Signed-off-by: David Malcolm <dmalcolm@redhat.com> --- gcc/Makefile.in | 2 +- gcc/json-parsing.cc | 2391 +++++++++++++++++++++++++++++++++++++ gcc/json-parsing.h | 94 ++ gcc/json.cc | 2 +- gcc/json.h | 59 +- gcc/selftest-run-tests.cc | 1 + gcc/selftest.h | 1 + 7 files changed, 2546 insertions(+), 4 deletions(-) create mode 100644 gcc/json-parsing.cc create mode 100644 gcc/json-parsing.h diff --git a/gcc/Makefile.in b/gcc/Makefile.in index b6dcc45a58a..acdb608f393 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1735,7 +1735,7 @@ OBJS-libcommon = diagnostic-spec.o diagnostic.o diagnostic-color.o \ diagnostic-show-locus.o \ edit-context.o \ pretty-print.o intl.o \ - json.o \ + json.o json-parsing.o \ sbitmap.o \ vec.o input.o hash-table.o ggc-none.o memory-block.o \ selftest.o selftest-diagnostic.o sort.o diff --git a/gcc/json-parsing.cc b/gcc/json-parsing.cc new file mode 100644 index 00000000000..a8b45bdca33 --- /dev/null +++ b/gcc/json-parsing.cc @@ -0,0 +1,2391 @@ +/* JSON parsing + Copyright (C) 2017-2022 Free Software Foundation, Inc. + Contributed by David Malcolm <dmalcolm@redhat.com>. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "json-parsing.h" +#include "pretty-print.h" +#include "math.h" +#include "selftest.h" + +using namespace json; + +/* Declarations relating to parsing JSON, all within an + anonymous namespace. 
*/ + +namespace { + +/* A typedef representing a single unicode character. */ + +typedef unsigned unichar; + +/* An enum for discriminating different kinds of JSON token. */ + +enum token_id +{ + TOK_ERROR, + + TOK_EOF, + + /* Punctuation. */ + TOK_OPEN_SQUARE, + TOK_OPEN_CURLY, + TOK_CLOSE_SQUARE, + TOK_CLOSE_CURLY, + TOK_COLON, + TOK_COMMA, + + /* Literal names. */ + TOK_TRUE, + TOK_FALSE, + TOK_NULL, + + TOK_STRING, + TOK_FLOAT_NUMBER, + TOK_INTEGER_NUMBER +}; + +/* Human-readable descriptions of enum token_id. */ + +static const char *token_id_name[] = { + "error", + "EOF", + "'['", + "'{'", + "']'", + "'}'", + "':'", + "','", + "'true'", + "'false'", + "'null'", + "string", + "number", + "number" +}; + +/* Tokens within the JSON lexer. */ + +struct token +{ + /* The kind of token. */ + enum token_id id; + + /* The location of this token within the unicode + character stream. */ + location_map::range range; + + union + { + /* Value for TOK_ERROR and TOK_STRING. */ + char *string; + + /* Value for TOK_FLOAT_NUMBER. */ + double float_number; + + /* Value for TOK_INTEGER_NUMBER. */ + long integer_number; + } u; +}; + +/* A class for lexing JSON. 
*/ + +class lexer +{ + public: + lexer (bool support_comments); + ~lexer (); + bool add_utf8 (size_t length, const char *utf8_buf, error **err_out); + + const token *peek (); + void consume (); + + private: + bool get_char (unichar &out_char, location_map::point *out_point); + void unget_char (); + location_map::point get_next_point () const; + static void dump_token (FILE *outf, const token *tok); + void lex_token (token *out); + void lex_string (token *out); + void lex_number (token *out, unichar first_char); + bool rest_of_literal (token *out, const char *suffix); + error *make_error (const char *msg); + bool consume_single_line_comment (token *out); + bool consume_multiline_comment (token *out); + + private: + auto_vec<unichar> m_buffer; + int m_next_char_idx; + int m_next_char_line; + int m_next_char_column; + int m_prev_line_final_column; /* for handling unget_char after a '\n'. */ + + static const int MAX_TOKENS = 1; + token m_next_tokens[MAX_TOKENS]; + int m_num_next_tokens; + + bool m_support_comments; +}; + +/* A class for parsing JSON. */ + +class parser +{ + public: + parser (error **err_out, location_map *out_loc_map, + bool support_comments); + ~parser (); + bool add_utf8 (size_t length, const char *utf8_buf, error **err_out); + value *parse_value (int depth); + object *parse_object (int depth); + array *parse_array (int depth); + + bool seen_error_p () const { return *m_err_out; } + void require_eof (); + + private: + location_map::point get_next_token_start (); + location_map::point get_next_token_end (); + void require (enum token_id tok_id); + enum token_id require_one_of (enum token_id tok_id_a, enum token_id tok_id_b); + void error_at (const location_map::range &r, + const char *fmt, ...) 
ATTRIBUTE_PRINTF_3; + void maybe_record_range (json::value *jv, const location_map::range &r); + void maybe_record_range (json::value *jv, + const location_map::point &start, + const location_map::point &end); + + private: + lexer m_lexer; + error **m_err_out; + location_map *m_loc_map; +}; + +} // anonymous namespace for parsing implementation + +/* Parser implementation. */ + +/* lexer's ctor. */ + +lexer::lexer (bool support_comments) +: m_buffer (), m_next_char_idx (0), + m_next_char_line (1), m_next_char_column (0), + m_prev_line_final_column (-1), + m_num_next_tokens (0), + m_support_comments (support_comments) +{ +} + +/* lexer's dtor. */ + +lexer::~lexer () +{ + while (m_num_next_tokens > 0) + consume (); +} + +/* Peek the next token. */ + +const token * +lexer::peek () +{ + if (m_num_next_tokens == 0) + { + lex_token (&m_next_tokens[0]); + m_num_next_tokens++; + } + return &m_next_tokens[0]; +} + +/* Consume the next token. */ + +void +lexer::consume () +{ + if (m_num_next_tokens == 0) + peek (); + + gcc_assert (m_num_next_tokens > 0); + gcc_assert (m_num_next_tokens <= MAX_TOKENS); + + if (0) + { + fprintf (stderr, "consuming token: "); + dump_token (stderr, &m_next_tokens[0]); + fprintf (stderr, "\n"); + } + + if (m_next_tokens[0].id == TOK_ERROR + || m_next_tokens[0].id == TOK_STRING) + free (m_next_tokens[0].u.string); + + m_num_next_tokens--; + memmove (&m_next_tokens[0], &m_next_tokens[1], + sizeof (token) * m_num_next_tokens); +} + +/* Add LENGTH bytes of UTF-8 encoded text from UTF8_BUF to this lexer's + buffer. */ + +bool +lexer::add_utf8 (size_t length, const char *utf8_buf, error **err_out) +{ + /* Adapted from charset.c:one_utf8_to_cppchar. 
*/ + static const uchar masks[6] = { 0x7F, 0x1F, 0x0F, 0x07, 0x03, 0x01 }; + static const uchar patns[6] = { 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC }; + + const uchar *inbuf = (const unsigned char *) (utf8_buf); + const uchar **inbufp = &inbuf; + size_t *inbytesleftp = &length; + + while (length > 0) + { + unichar c; + const uchar *inbuf = *inbufp; + size_t nbytes, i; + + c = *inbuf; + if (c < 0x80) + { + m_buffer.safe_push (c); + *inbytesleftp -= 1; + *inbufp += 1; + continue; + } + + /* The number of leading 1-bits in the first byte indicates how many + bytes follow. */ + for (nbytes = 2; nbytes < 7; nbytes++) + if ((c & ~masks[nbytes-1]) == patns[nbytes-1]) + goto found; + *err_out = make_error ("ill-formed UTF-8 sequence"); + return false; + found: + + if (*inbytesleftp < nbytes) + { + *err_out = make_error ("ill-formed UTF-8 sequence"); + return false; + } + + c = (c & masks[nbytes-1]); + inbuf++; + for (i = 1; i < nbytes; i++) + { + unichar n = *inbuf++; + if ((n & 0xC0) != 0x80) + { + *err_out = make_error ("ill-formed UTF-8 sequence"); + return false; + } + c = ((c << 6) + (n & 0x3F)); + } + + /* Make sure the shortest possible encoding was used. */ + if (( c <= 0x7F && nbytes > 1) + || (c <= 0x7FF && nbytes > 2) + || (c <= 0xFFFF && nbytes > 3) + || (c <= 0x1FFFFF && nbytes > 4) + || (c <= 0x3FFFFFF && nbytes > 5)) + { + *err_out = make_error ("ill-formed UTF-8:" + " shortest possible encoding not used"); + return false; + } + + /* Make sure the character is valid. */ + if (c > 0x7FFFFFFF || (c >= 0xD800 && c <= 0xDFFF)) + { + *err_out = make_error ("ill-formed UTF-8: invalid character"); + return false; + } + + m_buffer.safe_push (c); + *inbufp = inbuf; + *inbytesleftp -= nbytes; + } + return true; +} + +/* Attempt to get the next unicode character from this lexer's buffer. + If successful, write it to OUT_CHAR, and its location to *OUT_POINT, + and return true. + Otherwise, return false. 
*/ + +bool +lexer::get_char (unichar &out_char, location_map::point *out_point) +{ + if (m_next_char_idx >= (int)m_buffer.length ()) + return false; + + if (out_point) + *out_point = get_next_point (); + out_char = m_buffer[m_next_char_idx++]; + + if (out_char == '\n') + { + m_next_char_line++; + m_prev_line_final_column = m_next_char_column; + m_next_char_column = 0; + } + else + m_next_char_column++; + + return true; +} + +/* Undo the last successful get_char. */ + +void +lexer::unget_char () +{ + --m_next_char_idx; + if (m_next_char_column > 0) + --m_next_char_column; + else + { + m_next_char_line--; + m_next_char_column = m_prev_line_final_column; + /* We don't support more than one unget_char in a row. */ + gcc_assert (m_prev_line_final_column != -1); + m_prev_line_final_column = -1; + } +} + +/* Get the location of the next char. */ + +location_map::point +lexer::get_next_point () const +{ + location_map::point result; + result.m_unichar_idx = m_next_char_idx; + result.m_line = m_next_char_line; + result.m_column = m_next_char_column; + return result; +} + +/* Print a textual representation of TOK to OUTF. + This is intended for debugging the lexer and parser, + rather than for user-facing output. 
*/ + +void +lexer::dump_token (FILE *outf, const token *tok) +{ + switch (tok->id) + { + case TOK_ERROR: + fprintf (outf, "TOK_ERROR (\"%s\")", tok->u.string); + break; + + case TOK_EOF: + fprintf (outf, "TOK_EOF"); + break; + + case TOK_OPEN_SQUARE: + fprintf (outf, "TOK_OPEN_SQUARE"); + break; + + case TOK_OPEN_CURLY: + fprintf (outf, "TOK_OPEN_CURLY"); + break; + + case TOK_CLOSE_SQUARE: + fprintf (outf, "TOK_CLOSE_SQUARE"); + break; + + case TOK_CLOSE_CURLY: + fprintf (outf, "TOK_CLOSE_CURLY"); + break; + + case TOK_COLON: + fprintf (outf, "TOK_COLON"); + break; + + case TOK_COMMA: + fprintf (outf, "TOK_COMMA"); + break; + + case TOK_TRUE: + fprintf (outf, "TOK_TRUE"); + break; + + case TOK_FALSE: + fprintf (outf, "TOK_FALSE"); + break; + + case TOK_NULL: + fprintf (outf, "TOK_NULL"); + break; + + case TOK_STRING: + fprintf (outf, "TOK_STRING (\"%s\")", tok->u.string); + break; + + case TOK_FLOAT_NUMBER: + fprintf (outf, "TOK_FLOAT_NUMBER (%f)", tok->u.float_number); + break; + + case TOK_INTEGER_NUMBER: + fprintf (outf, "TOK_INTEGER_NUMBER (%ld)", tok->u.integer_number); + break; + + default: + gcc_unreachable (); + break; + } +} + +/* Treat "//" as a comment to the end of the line. + + This isn't compliant with the JSON spec, + but is very handy for writing DejaGnu tests. + + Return true if EOF and populate *OUT, false otherwise. */ + +bool +lexer::consume_single_line_comment (token *out) +{ + while (1) + { + unichar next_char; + if (!get_char (next_char, NULL)) + { + out->id = TOK_EOF; + location_map::point p = get_next_point (); + out->range.m_start = p; + out->range.m_end = p; + return true; + } + if (next_char == '\n') + return false; + } +} + +/* Treat '/' '*' as a multiline comment until the next closing '*' '/'. + + This isn't compliant with the JSON spec, + but is very handy for writing DejaGnu tests. + + Return true if EOF and populate *OUT, false otherwise. 
*/ + +bool +lexer::consume_multiline_comment (token *out) +{ + while (1) + { + unichar next_char; + if (!get_char (next_char, NULL)) + { + out->id = TOK_ERROR; + gcc_unreachable (); // TODO + location_map::point p = get_next_point (); + out->range.m_start = p; + out->range.m_end = p; + return true; + } + if (next_char != '*') + continue; + if (!get_char (next_char, NULL)) + { + out->id = TOK_ERROR; + gcc_unreachable (); // TODO + location_map::point p = get_next_point (); + out->range.m_start = p; + out->range.m_end = p; + return true; + } + if (next_char == '/') + return false; + } +} + +/* Attempt to lex the input buffer, writing the next token to OUT. + On errors, TOK_ERROR (or TOK_EOF) is written to OUT. */ + +void +lexer::lex_token (token *out) +{ + /* Skip to next non-whitespace char. */ + unichar next_char; + location_map::point start_point; + while (1) + { + if (!get_char (next_char, &start_point)) + { + out->id = TOK_EOF; + location_map::point p = get_next_point (); + out->range.m_start = p; + out->range.m_end = p; + return; + } + if (m_support_comments) + if (next_char == '/') + { + location_map::point point; + unichar next_next_char; + if (get_char (next_next_char, &point)) + { + switch (next_next_char) + { + case '/': + if (consume_single_line_comment (out)) + return; + continue; + case '*': + if (consume_multiline_comment (out)) + return; + continue; + default: + /* A stray single '/'. Break out of loop, so that we + handle it below as an unexpected character. 
*/ + goto non_whitespace; + } + } + } + if (next_char != ' ' + && next_char != '\t' + && next_char != '\n' + && next_char != '\r') + break; + } + + non_whitespace: + + out->range.m_start = start_point; + out->range.m_end = start_point; + + switch (next_char) + { + case '[': + out->id = TOK_OPEN_SQUARE; + break; + + case '{': + out->id = TOK_OPEN_CURLY; + break; + + case ']': + out->id = TOK_CLOSE_SQUARE; + break; + + case '}': + out->id = TOK_CLOSE_CURLY; + break; + + case ':': + out->id = TOK_COLON; + break; + + case ',': + out->id = TOK_COMMA; + break; + + case '"': + lex_string (out); + break; + + case '-': + case '0': + case '1': + case '2': + case '3': + case '4': + case '5': + case '6': + case '7': + case '8': + case '9': + lex_number (out, next_char); + break; + + case 't': + /* Handle literal "true". */ + if (rest_of_literal (out, "rue")) + { + out->id = TOK_TRUE; + break; + } + else + goto err; + + case 'f': + /* Handle literal "false". */ + if (rest_of_literal (out, "alse")) + { + out->id = TOK_FALSE; + break; + } + else + goto err; + + case 'n': + /* Handle literal "null". */ + if (rest_of_literal (out, "ull")) + { + out->id = TOK_NULL; + break; + } + else + goto err; + + err: + default: + out->id = TOK_ERROR; + out->u.string = xasprintf ("unexpected character: '%c'", next_char); + break; + } +} + +/* Having consumed an open-quote character from the lexer's buffer, attempt + to lex the rest of a JSON string, writing the result to OUT (or TOK_ERROR + if an error occurred). + (ECMA-404 section 9; RFC 7159 section 7). 
*/ + +void +lexer::lex_string (token *out) +{ + auto_vec<unichar> content; + bool still_going = true; + while (still_going) + { + unichar uc; + if (!get_char (uc, &out->range.m_end)) + { + out->id = TOK_ERROR; + out->range.m_end = get_next_point (); + out->u.string = xstrdup ("EOF within string"); + return; + } + switch (uc) + { + case '"': + still_going = false; + break; + case '\\': + { + unichar next_char; + if (!get_char (next_char, &out->range.m_end)) + { + out->id = TOK_ERROR; + out->range.m_end = get_next_point (); + out->u.string = xstrdup ("EOF within string"); + return; + } + switch (next_char) + { + case '"': + case '\\': + case '/': + content.safe_push (next_char); + break; + + case 'b': + content.safe_push ('\b'); + break; + + case 'f': + content.safe_push ('\f'); + break; + + case 'n': + content.safe_push ('\n'); + break; + + case 'r': + content.safe_push ('\r'); + break; + + case 't': + content.safe_push ('\t'); + break; + + case 'u': + { + unichar result = 0; + for (int i = 0; i < 4; i++) + { + unichar hexdigit; + if (!get_char (hexdigit, &out->range.m_end)) + { + out->id = TOK_ERROR; + out->range.m_end = get_next_point (); + out->u.string = xstrdup ("EOF within string"); + return; + } + result <<= 4; + if (hexdigit >= '0' && hexdigit <= '9') + result += hexdigit - '0'; + else if (hexdigit >= 'a' && hexdigit <= 'f') + result += (hexdigit - 'a') + 10; + else if (hexdigit >= 'A' && hexdigit <= 'F') + result += (hexdigit - 'A') + 10; + else + { + out->id = TOK_ERROR; + out->range.m_start = out->range.m_end; + out->u.string = xstrdup ("bogus hex char"); + return; + } + } + content.safe_push (result); + } + break; + + default: + out->id = TOK_ERROR; + out->u.string = xstrdup ("unrecognized escape char"); + return; + } + } + break; + + default: + /* Reject unescaped control characters U+0000 through U+001F + (ECMA-404 section 9 para 1; RFC 7159 section 7 para 1). 
*/ + if (uc <= 0x1f) + { + out->id = TOK_ERROR; + out->range.m_start = out->range.m_end; + out->u.string = xstrdup ("unescaped control char"); + return; + } + + /* Otherwise, add regular unicode code point. */ + content.safe_push (uc); + break; + } + } + + out->id = TOK_STRING; + + auto_vec<char> utf8_buf; + // Adapted from libcpp/charset.c:one_cppchar_to_utf8 + for (unsigned i = 0; i < content.length (); i++) + { + static const uchar masks[6] = { 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC }; + static const uchar limits[6] = { 0x80, 0xE0, 0xF0, 0xF8, 0xFC, 0xFE }; + size_t nbytes; + uchar buf[6], *p = &buf[6]; + unichar c = content[i]; + + nbytes = 1; + if (c < 0x80) + *--p = c; + else + { + do + { + *--p = ((c & 0x3F) | 0x80); + c >>= 6; + nbytes++; + } + while (c >= 0x3F || (c & limits[nbytes-1])); + *--p = (c | masks[nbytes-1]); + } + + while (p < &buf[6]) + utf8_buf.safe_push (*p++); + } + + out->u.string = XNEWVEC (char, utf8_buf.length () + 1); + for (unsigned i = 0; i < utf8_buf.length (); i++) + out->u.string[i] = utf8_buf[i]; + out->u.string[utf8_buf.length ()] = '\0'; +} + +/* Having consumed FIRST_CHAR, an initial digit or '-' character from + the lexer's buffer, attempt to lex the rest of a JSON number, writing + the result to OUT (or TOK_ERROR if an error occurred). + (ECMA-404 section 8; RFC 7159 section 6). */ + +void +lexer::lex_number (token *out, unichar first_char) +{ + bool negate = false; + double value = 0.0; + if (first_char == '-') + { + negate = true; + if (!get_char (first_char, &out->range.m_end)) + { + out->id = TOK_ERROR; + out->range.m_start = out->range.m_end; + out->u.string = xstrdup ("expected digit"); + return; + } + } + + if (first_char == '0') + value = 0.0; + else if (!ISDIGIT (first_char)) + { + out->id = TOK_ERROR; + out->range.m_start = out->range.m_end; + out->u.string = xstrdup ("expected digit"); + return; + } + else + { + /* Got a nonzero digit; expect zero or more digits. 
*/ + value = first_char - '0'; + while (1) + { + unichar uc; + location_map::point point; + if (!get_char (uc, &point)) + break; + if (ISDIGIT (uc)) + { + value *= 10; + value += uc -'0'; + out->range.m_end = point; + continue; + } + else + { + unget_char (); + break; + } + } + } + + /* Optional '.', followed by one or more decimals. */ + unichar next_char; + location_map::point point; + if (get_char (next_char, &point)) + { + if (next_char == '.') + { + /* Parse decimal digits. */ + bool had_digit = false; + double digit_factor = 0.1; + while (get_char (next_char, &point)) + { + if (!ISDIGIT (next_char)) + { + unget_char (); + break; + } + value += (next_char - '0') * digit_factor; + digit_factor *= 0.1; + had_digit = true; + out->range.m_end = point; + } + if (!had_digit) + { + out->id = TOK_ERROR; + out->range.m_start = point; + out->range.m_end = point; + out->u.string = xstrdup ("expected digit"); + return; + } + } + else + unget_char (); + } + + /* Parse 'e' and 'E'. */ + unichar exponent_char; + if (get_char (exponent_char, &point)) + { + if (exponent_char == 'e' || exponent_char == 'E') + { + /* Optional +/-. */ + unichar sign_char; + int exponent = 0; + bool negate_exponent = false; + bool had_exponent_digit = false; + if (!get_char (sign_char, &point)) + { + out->id = TOK_ERROR; + out->range.m_start = point; + out->range.m_end = point; + out->u.string = xstrdup ("EOF within exponent"); + return; + } + if (sign_char == '-') + negate_exponent = true; + else if (sign_char == '+') + ; + else if (ISDIGIT (sign_char)) + { + exponent = sign_char - '0'; + had_exponent_digit = true; + } + else + { + out->id = TOK_ERROR; + out->range.m_start = point; + out->range.m_end = point; + out->u.string + = xstrdup ("expected '-','+' or digit within exponent"); + return; + } + out->range.m_end = point; + + /* One or more digits (we might have seen the digit above, + though). 
*/ + while (1) + { + unichar uc; + location_map::point point; + if (!get_char (uc, &point)) + break; + if (ISDIGIT (uc)) + { + exponent *= 10; + exponent += uc -'0'; + had_exponent_digit = true; + out->range.m_end = point; + continue; + } + else + { + unget_char (); + break; + } + } + if (!had_exponent_digit) + { + out->id = TOK_ERROR; + out->range.m_start = point; + out->range.m_end = point; + out->u.string = xstrdup ("expected digit within exponent"); + return; + } + if (negate_exponent) + exponent = -exponent; + value = value * pow (10, exponent); + } + else + unget_char (); + } + + if (negate) + value = -value; + + if (value == (long)value) + { + out->id = TOK_INTEGER_NUMBER; + out->u.integer_number = value; + } + else + { + out->id = TOK_FLOAT_NUMBER; + out->u.float_number = value; + } +} + +/* Determine if the next characters to be lexed match SUFFIX. + SUFFIX must be pure ASCII and not contain newlines. + If so, consume the characters and return true. + Otherwise, return false. */ + +bool +lexer::rest_of_literal (token *out, const char *suffix) +{ + int suffix_idx = 0; + int buf_idx = m_next_char_idx; + while (1) + { + if (suffix[suffix_idx] == '\0') + { + m_next_char_idx += suffix_idx; + m_next_char_column += suffix_idx; + out->range.m_end.m_unichar_idx += suffix_idx; + out->range.m_end.m_column += suffix_idx; + return true; + } + if (buf_idx >= (int)m_buffer.length ()) + return false; + /* This assumes that suffix is ASCII. */ + if (m_buffer[buf_idx] != (unichar)suffix[suffix_idx]) + return false; + buf_idx++; + suffix_idx++; + } +} + +/* Create a new error instance for MSG, using the location of the next + character for the location of the error. */ + +error * +lexer::make_error (const char *msg) +{ + location_map::point p; + p.m_unichar_idx = m_next_char_idx; + p.m_line = m_next_char_line; + p.m_column = m_next_char_column; + location_map::range r; + r.m_start = p; + r.m_end = p; + return new error (r, xstrdup (msg)); +} + +/* parser's ctor. 
*/ + +parser::parser (error **err_out, location_map *out_loc_map, + bool support_comments) +: m_lexer (support_comments), m_err_out (err_out), m_loc_map (out_loc_map) +{ + gcc_assert (err_out); + gcc_assert (*err_out == NULL); + *err_out = NULL; +} + +/* parser's dtor. */ + +parser::~parser () +{ + if (m_loc_map) + m_loc_map->on_finished_parsing (); +} + +/* Add LENGTH bytes of UTF-8 encoded text from UTF8_BUF to this parser's + lexer's buffer. */ + +bool +parser::add_utf8 (size_t length, const char *utf8_buf, error **err_out) +{ + return m_lexer.add_utf8 (length, utf8_buf, err_out); +} + +/* Parse a JSON value (object, array, number, string, or literal). + (ECMA-404 section 5; RFC 7159 section 3). */ + +value * +parser::parse_value (int depth) +{ + const token *tok = m_lexer.peek (); + + /* Avoid stack overflow with deeply-nested inputs; RFC 7159 section 9 + states: "An implementation may set limits on the maximum depth + of nesting.". + + Ideally we'd avoid this limit (e.g. by rewriting parse_value, + parse_object, and parse_array into a single function with a vec of + state). 
*/ + const int MAX_DEPTH = 100; + if (depth >= MAX_DEPTH) + { + error_at (tok->range, "maximum nesting depth exceeded: %i", + MAX_DEPTH); + return NULL; + } + + switch (tok->id) + { + case TOK_OPEN_CURLY: + return parse_object (depth); + + case TOK_STRING: + { + string *result = new string (tok->u.string); + m_lexer.consume (); + maybe_record_range (result, tok->range); + return result; + } + + case TOK_OPEN_SQUARE: + return parse_array (depth); + + case TOK_FLOAT_NUMBER: + { + float_number *result = new float_number (tok->u.float_number); + m_lexer.consume (); + maybe_record_range (result, tok->range); + return result; + } + + case TOK_INTEGER_NUMBER: + { + integer_number *result = new integer_number (tok->u.integer_number); + m_lexer.consume (); + maybe_record_range (result, tok->range); + return result; + } + + case TOK_TRUE: + { + literal *result = new literal (JSON_TRUE); + m_lexer.consume (); + maybe_record_range (result, tok->range); + return result; + } + + case TOK_FALSE: + { + literal *result = new literal (JSON_FALSE); + m_lexer.consume (); + maybe_record_range (result, tok->range); + return result; + } + + case TOK_NULL: + { + literal *result = new literal (JSON_NULL); + m_lexer.consume (); + maybe_record_range (result, tok->range); + return result; + } + + case TOK_ERROR: + error_at (tok->range, "invalid JSON token: %s", tok->u.string); + return NULL; + + default: + error_at (tok->range, "expected a JSON value but got %s", + token_id_name[tok->id]); + return NULL; + } +} + +/* Parse a JSON object. + (ECMA-404 section 6; RFC 7159 section 4). 
*/ + +object * +parser::parse_object (int depth) +{ + location_map::point start = get_next_token_start (); + + require (TOK_OPEN_CURLY); + + object *result = new object (); + + const token *tok = m_lexer.peek (); + if (tok->id == TOK_CLOSE_CURLY) + { + location_map::point end = get_next_token_end (); + maybe_record_range (result, start, end); + require (TOK_CLOSE_CURLY); + return result; + } + if (tok->id != TOK_STRING) + { + error_at (tok->range, + "expected string for object key after '{'; got %s", + token_id_name[tok->id]); + return result; + } + while (!seen_error_p ()) + { + tok = m_lexer.peek (); + if (tok->id != TOK_STRING) + { + error_at (tok->range, + "expected string for object key after ','; got %s", + token_id_name[tok->id]); + return result; + } + char *key = xstrdup (tok->u.string); + m_lexer.consume (); + + require (TOK_COLON); + + value *v = parse_value (depth + 1); + if (!v) + { + free (key); + return result; + } + /* We don't enforce uniqueness for keys. */ + result->set (key, v); + free (key); + + location_map::point end = get_next_token_end (); + if (require_one_of (TOK_COMMA, TOK_CLOSE_CURLY) == TOK_COMMA) + continue; + else + { + /* TOK_CLOSE_CURLY. */ + maybe_record_range (result, start, end); + break; + } + } + return result; +} + +/* Parse a JSON array. + (ECMA-404 section 7; RFC 7159 section 5). 
*/ + +array * +parser::parse_array (int depth) +{ + location_map::point start = get_next_token_start (); + require (TOK_OPEN_SQUARE); + + array *result = new array (); + + const token *tok = m_lexer.peek (); + if (tok->id == TOK_CLOSE_SQUARE) + { + location_map::point end = get_next_token_end (); + maybe_record_range (result, start, end); + m_lexer.consume (); + return result; + } + + while (!seen_error_p ()) + { + value *v = parse_value (depth + 1); + if (!v) + return result; + + result->append (v); + + location_map::point end = get_next_token_end (); + if (require_one_of (TOK_COMMA, TOK_CLOSE_SQUARE) == TOK_COMMA) + continue; + else + { + /* TOK_CLOSE_SQUARE. */ + maybe_record_range (result, start, end); + break; + } + } + + return result; +} + +/* Get the start point of the next token. */ + +location_map::point +parser::get_next_token_start () +{ + const token *tok = m_lexer.peek (); + return tok->range.m_start; +} + +/* Get the end point of the next token. */ + +location_map::point +parser::get_next_token_end () +{ + const token *tok = m_lexer.peek (); + return tok->range.m_end; +} + +/* Require an EOF, or fail if there is surplus input. */ + +void +parser::require_eof () +{ + require (TOK_EOF); +} + +/* Consume the next token, issuing an error if it is not of kind TOK_ID. */ + +void +parser::require (enum token_id tok_id) +{ + const token *tok = m_lexer.peek (); + if (tok->id != tok_id) + { + if (tok->id == TOK_ERROR) + error_at (tok->range, "expected %s; got bad token: %s", + token_id_name[tok_id], tok->u.string); + else + error_at (tok->range, "expected %s; got %s", token_id_name[tok_id], + token_id_name[tok->id]); + } + m_lexer.consume (); +} + +/* Consume the next token, issuing an error if it is not of + kind TOK_ID_A or TOK_ID_B. + Return which kind it was. 
*/ + +enum token_id +parser::require_one_of (enum token_id tok_id_a, enum token_id tok_id_b) +{ + const token *tok = m_lexer.peek (); + if ((tok->id != tok_id_a) + && (tok->id != tok_id_b)) + { + if (tok->id == TOK_ERROR) + error_at (tok->range, "expected %s or %s; got bad token: %s", + token_id_name[tok_id_a], token_id_name[tok_id_b], + tok->u.string); + else + error_at (tok->range, "expected %s or %s; got %s", + token_id_name[tok_id_a], token_id_name[tok_id_b], + token_id_name[tok->id]); + } + enum token_id result = tok->id; + m_lexer.consume (); + return result; +} + +/* Generate a parsing error. If this is the first error that has occurred on + the parser, store it within the parser's *m_err_out. + Otherwise do nothing. */ + +void +parser::error_at (const location_map::range &r, const char *fmt, ...) +{ + if (m_err_out == NULL) + return; + /* Only record the first error. */ + if (*m_err_out) + return; + + va_list ap; + va_start (ap, fmt); + char *formatted_msg = xvasprintf (fmt, ap); + va_end (ap); + + *m_err_out = new error (r, formatted_msg); +} + +/* Record that JV has range R within the input file. */ + +void +parser::maybe_record_range (json::value *jv, const location_map::range &r) +{ + if (m_loc_map) + m_loc_map->record_range_for_value (jv, r); +} + +/* Record that JV has range START to END within the input file. */ + +void +parser::maybe_record_range (json::value *jv, + const location_map::point &start, + const location_map::point &end) +{ + if (m_loc_map) + { + location_map::range r; + r.m_start = start; + r.m_end = end; + m_loc_map->record_range_for_value (jv, r); + } +} + +/* Attempt to parse the UTF-8 encoded buffer at UTF8_BUF + of the given LENGTH. + If ALLOW_COMMENTS is true, then allow C and C++ style-comments in the + buffer, as an extension to JSON, otherwise forbid them. + If successful, return a non-NULL json::value *. + If there was a problem, return NULL and write an error + message to err_out, which must be deleted by the caller. 
+ If OUT_LOC_MAP is non-NULL, notify *OUT_LOC_MAP about + source locations of nodes seen during parsing. */ + +value * +json::parse_utf8_string (size_t length, + const char *utf8_buf, + bool allow_comments, + error **err_out, + location_map *out_loc_map) +{ + gcc_assert (err_out); + gcc_assert (*err_out == NULL); + + parser p (err_out, out_loc_map, allow_comments); + if (!p.add_utf8 (length, utf8_buf, err_out)) + return NULL; + value *result = p.parse_value (0); + if (!p.seen_error_p ()) + p.require_eof (); + if (p.seen_error_p ()) + { + gcc_assert (*err_out); + delete result; + return NULL; + } + return result; +} + +/* Attempt to parse the NUL-terminated UTF-8 encoded buffer at + UTF8_BUF. + If ALLOW_COMMENTS is true, then allow C and C++ style-comments in the + buffer, as an extension to JSON, otherwise forbid them. + If successful, return a non-NULL json::value *. + If there was a problem, return NULL and write an error + message to err_out, which must be deleted by the caller. + If OUT_LOC_MAP is non-NULL, notify *OUT_LOC_MAP about + source locations of nodes seen during parsing. */ + +value * +json::parse_utf8_string (const char *utf8, + bool allow_comments, + error **err_out, + location_map *out_loc_map) +{ + return parse_utf8_string (strlen (utf8), utf8, allow_comments, + err_out, out_loc_map); +} + + +#if CHECKING_P + +namespace selftest { + +/* Selftests. */ + +/* Implementation detail of ASSERT_RANGE_EQ. */ + +static void +assert_point_eq (const location &loc, + const location_map::point &actual_point, + size_t exp_unichar_idx, int exp_line, int exp_column) +{ + ASSERT_EQ_AT (loc, actual_point.m_unichar_idx, exp_unichar_idx); + ASSERT_EQ_AT (loc, actual_point.m_line, exp_line); + ASSERT_EQ_AT (loc, actual_point.m_column, exp_column); +} + +/* Implementation detail of ASSERT_RANGE_EQ. */ + +static void +assert_range_eq (const location &loc, + const location_map::range &actual_range, + /* Expected location. 
*/ + size_t start_unichar_idx, int start_line, int start_column, + size_t end_unichar_idx, int end_line, int end_column) +{ + assert_point_eq (loc, actual_range.m_start, + start_unichar_idx, start_line, start_column); + assert_point_eq (loc, actual_range.m_end, + end_unichar_idx, end_line, end_column); +} + +/* Assert that ACTUAL_RANGE starts at + (START_UNICHAR_IDX, START_LINE, START_COLUMN) + and ends at (END_UNICHAR_IDX, END_LINE, END_COLUMN). */ + +#define ASSERT_RANGE_EQ(ACTUAL_RANGE, \ + START_UNICHAR_IDX, START_LINE, START_COLUMN, \ + END_UNICHAR_IDX, END_LINE, END_COLUMN) \ + assert_range_eq ((SELFTEST_LOCATION), (ACTUAL_RANGE), \ + (START_UNICHAR_IDX), (START_LINE), (START_COLUMN), \ + (END_UNICHAR_IDX), (END_LINE), (END_COLUMN)) + +/* Implementation detail of ASSERT_ERR_EQ. */ + +static void +assert_err_eq (const location &loc, + json::error *actual_err, + /* Expected location. */ + size_t start_unichar_idx, int start_line, int start_column, + size_t end_unichar_idx, int end_line, int end_column, + const char *expected_msg) +{ + ASSERT_TRUE_AT (loc, actual_err); + const location_map::range &actual_range = actual_err->get_range (); + ASSERT_EQ_AT (loc, actual_range.m_start.m_unichar_idx, start_unichar_idx); + ASSERT_EQ_AT (loc, actual_range.m_start.m_line, start_line); + ASSERT_EQ_AT (loc, actual_range.m_start.m_column, start_column); + ASSERT_EQ_AT (loc, actual_range.m_end.m_unichar_idx, end_unichar_idx); + ASSERT_EQ_AT (loc, actual_range.m_end.m_line, end_line); + ASSERT_EQ_AT (loc, actual_range.m_end.m_column, end_column); + ASSERT_STREQ_AT (loc, actual_err->get_msg (), expected_msg); +} + +/* Assert that ACTUAL_ERR is a non-NULL json::error *, + with message EXPECTED_MSG, and that its location starts + at (START_UNICHAR_IDX, START_LINE, START_COLUMN) + and ends at (END_UNICHAR_IDX, END_LINE, END_COLUMN). 
*/ + +#define ASSERT_ERR_EQ(ACTUAL_ERR, \ + START_UNICHAR_IDX, START_LINE, START_COLUMN, \ + END_UNICHAR_IDX, END_LINE, END_COLUMN, \ + EXPECTED_MSG) \ + assert_err_eq ((SELFTEST_LOCATION), (ACTUAL_ERR), \ + (START_UNICHAR_IDX), (START_LINE), (START_COLUMN), \ + (END_UNICHAR_IDX), (END_LINE), (END_COLUMN), \ + (EXPECTED_MSG)) + +/* Verify that the JSON lexer works as expected. */ + +static void +test_lexer () +{ + error *err = NULL; + lexer l (false); + const char *str + /* 0 1 2 3 4 . */ + /* 01234567890123456789012345678901234567890123456789. */ + = (" 1066 -1 \n" + " -273.15 1e6\n" + " [ ] null true false { } \"foo\" \n"); + l.add_utf8 (strlen (str), str, &err); + ASSERT_EQ (err, NULL); + + /* Line 1. */ + { + const size_t line_offset = 0; + + /* Expect token: "1066" in columns 4-7. */ + { + const token *tok = l.peek (); + ASSERT_EQ (tok->id, TOK_INTEGER_NUMBER); + ASSERT_EQ (tok->u.integer_number, 1066); + ASSERT_RANGE_EQ (tok->range, + line_offset + 4, 1, 4, + line_offset + 7, 1, 7); + l.consume (); + } + /* Expect token: "-1" in columns 11-12. */ + { + const token *tok = l.peek (); + ASSERT_EQ (tok->id, TOK_INTEGER_NUMBER); + ASSERT_EQ (tok->u.integer_number, -1); + ASSERT_RANGE_EQ (tok->range, + line_offset + 11, 1, 11, + line_offset + 12, 1, 12); + l.consume (); + } + } + + /* Line 2. */ + { + const size_t line_offset = 16; + + /* Expect token: "-273.15" in columns 4-10. */ + { + const token *tok = l.peek (); + ASSERT_EQ (tok->id, TOK_FLOAT_NUMBER); + ASSERT_EQ (int(tok->u.float_number), int(-273.15)); + ASSERT_RANGE_EQ (tok->range, + line_offset + 4, 2, 4, + line_offset + 10, 2, 10); + l.consume (); + } + /* Expect token: "1e6" in columns 12-14. */ + { + const token *tok = l.peek (); + ASSERT_EQ (tok->id, TOK_INTEGER_NUMBER); + ASSERT_EQ (tok->u.integer_number, 1000000); + ASSERT_RANGE_EQ (tok->range, + line_offset + 12, 2, 12, + line_offset + 14, 2, 14); + l.consume (); + } + } + + /* Line 3. 
*/ + { + const size_t line_offset = 32; + + /* Expect token: "[". */ + { + const token *tok = l.peek (); + ASSERT_EQ (tok->id, TOK_OPEN_SQUARE); + ASSERT_RANGE_EQ (tok->range, + line_offset + 2, 3, 2, + line_offset + 2, 3, 2); + l.consume (); + } + /* Expect token: "]". */ + { + const token *tok = l.peek (); + ASSERT_EQ (tok->id, TOK_CLOSE_SQUARE); + ASSERT_RANGE_EQ (tok->range, + line_offset + 6, 3, 6, + line_offset + 6, 3, 6); + l.consume (); + } + /* Expect token: "null". */ + { + const token *tok = l.peek (); + ASSERT_EQ (tok->id, TOK_NULL); + ASSERT_RANGE_EQ (tok->range, + line_offset + 8, 3, 8, + line_offset + 11, 3, 11); + l.consume (); + } + /* Expect token: "true". */ + { + const token *tok = l.peek (); + ASSERT_EQ (tok->id, TOK_TRUE); + ASSERT_RANGE_EQ (tok->range, + line_offset + 15, 3, 15, + line_offset + 18, 3, 18); + l.consume (); + } + /* Expect token: "false". */ + { + const token *tok = l.peek (); + ASSERT_EQ (tok->id, TOK_FALSE); + ASSERT_RANGE_EQ (tok->range, + line_offset + 21, 3, 21, + line_offset + 25, 3, 25); + l.consume (); + } + /* Expect token: "{". */ + { + const token *tok = l.peek (); + ASSERT_EQ (tok->id, TOK_OPEN_CURLY); + ASSERT_RANGE_EQ (tok->range, + line_offset + 28, 3, 28, + line_offset + 28, 3, 28); + l.consume (); + } + /* Expect token: "}". */ + { + const token *tok = l.peek (); + ASSERT_EQ (tok->id, TOK_CLOSE_CURLY); + ASSERT_RANGE_EQ (tok->range, + line_offset + 31, 3, 31, + line_offset + 31, 3, 31); + l.consume (); + } + /* Expect token: "\"foo\"". */ + { + const token *tok = l.peek (); + ASSERT_EQ (tok->id, TOK_STRING); + ASSERT_RANGE_EQ (tok->range, + line_offset + 34, 3, 34, + line_offset + 38, 3, 38); + l.consume (); + } + } +} + +/* Verify that the JSON lexer complains about single-line comments + when comments are disabled. */ + +static void +test_lexing_unsupported_single_line_comment () +{ + error *err = NULL; + lexer l (false); + const char *str + /* 0 1 2 3 4 . 
*/ + /* 01234567890123456789012345678901234567890123456789. */ + = (" 1066 // Hello world\n"); + l.add_utf8 (strlen (str), str, &err); + ASSERT_EQ (err, NULL); + + /* Line 1. */ + { + const size_t line_offset = 0; + const int line_1 = 1; + + /* Expect token: "1066" in columns 4-7. */ + { + const token *tok = l.peek (); + ASSERT_EQ (tok->id, TOK_INTEGER_NUMBER); + ASSERT_EQ (tok->u.integer_number, 1066); + ASSERT_RANGE_EQ (tok->range, + line_offset + 4, line_1, 4, + line_offset + 7, line_1, 7); + l.consume (); + } + + /* Expect error. */ + { + const token *tok = l.peek (); + ASSERT_EQ (tok->id, TOK_ERROR); + ASSERT_STREQ (tok->u.string, "unexpected character: '/'"); + ASSERT_RANGE_EQ (tok->range, + line_offset + 11, line_1, 11, + line_offset + 11, line_1, 11); + l.consume (); + } + } +} + +/* Verify that the JSON lexer complains about multiline comments + when comments are disabled. */ + +static void +test_lexing_unsupported_multiline_comment () +{ + error *err = NULL; + lexer l (false); + const char *str + /* 0 1 2 3 4 . */ + /* 01234567890123456789012345678901234567890123456789. */ + = (" 1066 /* Hello world\n" + " continuation of comment\n" + " end of comment */ 42\n"); + l.add_utf8 (strlen (str), str, &err); + ASSERT_EQ (err, NULL); + + /* Line 1. */ + { + const size_t line_offset = 0; + const int line_1 = 1; + + /* Expect token: "1066" in line 1, columns 4-7. */ + { + const token *tok = l.peek (); + ASSERT_EQ (tok->id, TOK_INTEGER_NUMBER); + ASSERT_EQ (tok->u.integer_number, 1066); + ASSERT_RANGE_EQ (tok->range, + line_offset + 4, line_1, 4, + line_offset + 7, line_1, 7); + l.consume (); + } + + /* Expect error. */ + { + const token *tok = l.peek (); + ASSERT_EQ (tok->id, TOK_ERROR); + ASSERT_STREQ (tok->u.string, "unexpected character: '/'"); + ASSERT_RANGE_EQ (tok->range, + line_offset + 11, line_1, 11, + line_offset + 11, line_1, 11); + l.consume (); + } + } +} + +/* Verify that the JSON lexer handles single-line comments + when comments are enabled. 
*/ + +static void +test_lexing_supported_single_line_comment () +{ + error *err = NULL; + lexer l (true); + const char *str + /* 0 1 2 3 4 . */ + /* 01234567890123456789012345678901234567890123456789. */ + = (" 1066 // Hello world\n" + " 42 // etc\n"); + l.add_utf8 (strlen (str), str, &err); + ASSERT_EQ (err, NULL); + + const size_t line_1_offset = 0; + const size_t line_2_offset = 26; + const size_t line_3_offset = line_2_offset + 17; + + /* Expect token: "1066" in line 1, columns 4-7. */ + { + const int line_1 = 1; + const token *tok = l.peek (); + ASSERT_EQ (tok->id, TOK_INTEGER_NUMBER); + ASSERT_EQ (tok->u.integer_number, 1066); + ASSERT_RANGE_EQ (tok->range, + line_1_offset + 4, line_1, 4, + line_1_offset + 7, line_1, 7); + l.consume (); + } + + /* Expect token: "42" in line 2, columns 5-6. */ + { + const int line_2 = 2; + const token *tok = l.peek (); + ASSERT_EQ (tok->id, TOK_INTEGER_NUMBER); + ASSERT_EQ (tok->u.integer_number, 42); + ASSERT_RANGE_EQ (tok->range, + line_2_offset + 5, line_2, 5, + line_2_offset + 6, line_2, 6); + l.consume (); + } + + /* Expect EOF. */ + { + const int line_3 = 3; + const token *tok = l.peek (); + ASSERT_EQ (tok->id, TOK_EOF); + ASSERT_RANGE_EQ (tok->range, + line_3_offset + 0, line_3, 0, + line_3_offset + 0, line_3, 0); + l.consume (); + } +} + +/* Verify that the JSON lexer handles multiline comments + when comments are enabled. */ + +static void +test_lexing_supported_multiline_comment () +{ + error *err = NULL; + lexer l (true); + const char *str + /* 0 1 2 3 4 . */ + /* 01234567890123456789012345678901234567890123456789. */ + = (" 1066 /* Hello world\n" + " continuation of comment\n" + " end of comment */ 42\n"); + l.add_utf8 (strlen (str), str, &err); + ASSERT_EQ (err, NULL); + + const size_t line_1_offset = 0; + const size_t line_2_offset = 26; + const size_t line_3_offset = line_2_offset + 25; + const size_t line_4_offset = line_3_offset + 23; + + /* Expect token: "1066" in line 1, columns 4-7. 
*/ + { + const int line_1 = 1; + const token *tok = l.peek (); + ASSERT_EQ (tok->id, TOK_INTEGER_NUMBER); + ASSERT_EQ (tok->u.integer_number, 1066); + ASSERT_RANGE_EQ (tok->range, + line_1_offset + 4, line_1, 4, + line_1_offset + 7, line_1, 7); + l.consume (); + } + + /* Expect token: "42" in line 3, columns 20-21. */ + { + const int line_3 = 3; + const token *tok = l.peek (); + ASSERT_EQ (tok->id, TOK_INTEGER_NUMBER); + ASSERT_EQ (tok->u.integer_number, 42); + ASSERT_RANGE_EQ (tok->range, + line_3_offset + 20, line_3, 20, + line_3_offset + 21, line_3, 21); + l.consume (); + } + + /* Expect EOF. */ + { + const int line_4 = 4; + const token *tok = l.peek (); + ASSERT_EQ (tok->id, TOK_EOF); + ASSERT_RANGE_EQ (tok->range, + line_4_offset + 0, line_4, 0, + line_4_offset + 0, line_4, 0); + l.consume (); + } +} + +/* Concrete implementation of location_map for use in + JSON parsing selftests. */ + +class test_location_map : public location_map +{ +public: + void record_range_for_value (json::value *jv, const range &r) final override + { + m_map.put (jv, r); + } + + range *get_range_for_value (json::value *jv) + { + return m_map.get (jv); + } + +private: + hash_map<json::value *, range> m_map; +}; + +/* Verify that parse_utf8_string works as expected. 
*/ + +static void +test_parse_string () +{ + const int line_1 = 1; + test_location_map loc_map; + error *err = NULL; + json::value *jv = parse_utf8_string ("\"foo\"", false, &err, &loc_map); + ASSERT_EQ (err, NULL); + ASSERT_EQ (jv->get_kind (), JSON_STRING); + ASSERT_STREQ (as_a <json::string *> (jv)->get_string (), "foo"); + assert_print_eq (*jv, "\"foo\""); + location_map::range *range = loc_map.get_range_for_value (jv); + ASSERT_TRUE (range); + ASSERT_RANGE_EQ (*range, + 0, line_1, 0, + 4, line_1, 4); + delete jv; + + const char *contains_quotes = "\"before \\\"quoted\\\" after\""; + jv = parse_utf8_string (contains_quotes, false, &err, &loc_map); + ASSERT_EQ (err, NULL); + ASSERT_EQ (jv->get_kind (), JSON_STRING); + ASSERT_STREQ (as_a <json::string *> (jv)->get_string (), + "before \"quoted\" after"); + assert_print_eq (*jv, contains_quotes); + range = loc_map.get_range_for_value (jv); + ASSERT_TRUE (range); + ASSERT_RANGE_EQ (*range, + 0, line_1, 0, + 24, line_1, 24); + delete jv; + + /* Test of non-ASCII input. This string is the Japanese word "mojibake", + written as C octal-escaped UTF-8. */ + const char *mojibake = (/* Opening quote. */ + "\"" + /* U+6587 CJK UNIFIED IDEOGRAPH-6587 + UTF-8: 0xE6 0x96 0x87 + C octal escaped UTF-8: \346\226\207. */ + "\346\226\207" + /* U+5B57 CJK UNIFIED IDEOGRAPH-5B57 + UTF-8: 0xE5 0xAD 0x97 + C octal escaped UTF-8: \345\255\227. */ + "\345\255\227" + /* U+5316 CJK UNIFIED IDEOGRAPH-5316 + UTF-8: 0xE5 0x8C 0x96 + C octal escaped UTF-8: \345\214\226. */ + "\345\214\226" + /* U+3051 HIRAGANA LETTER KE + UTF-8: 0xE3 0x81 0x91 + C octal escaped UTF-8: \343\201\221. */ + "\343\201\221" + /* Closing quote. */ + "\""); + jv = parse_utf8_string (mojibake, false, &err, &loc_map); + ASSERT_EQ (err, NULL); + ASSERT_EQ (jv->get_kind (), JSON_STRING); + /* Result of get_string should be UTF-8 encoded, without quotes. 
*/ + ASSERT_STREQ (as_a <json::string *> (jv)->get_string (), + "\346\226\207" "\345\255\227" "\345\214\226" "\343\201\221"); + /* Result of dump should be UTF-8 encoded, with quotes. */ + assert_print_eq (*jv, mojibake); + range = loc_map.get_range_for_value (jv); + ASSERT_TRUE (range); + ASSERT_RANGE_EQ (*range, + 0, line_1, 0, + 5, line_1, 5); + delete jv; + + /* Test of \u-escaped unicode. This is "mojibake" again, as above. */ + const char *escaped_unicode = "\"\\u6587\\u5b57\\u5316\\u3051\""; + jv = parse_utf8_string (escaped_unicode, false, &err, &loc_map); + ASSERT_EQ (err, NULL); + ASSERT_EQ (jv->get_kind (), JSON_STRING); + /* Result of get_string should be UTF-8 encoded, without quotes. */ + ASSERT_STREQ (as_a <json::string *> (jv)->get_string (), + "\346\226\207" "\345\255\227" "\345\214\226" "\343\201\221"); + /* Result of dump should be UTF-8 encoded, with quotes. */ + assert_print_eq (*jv, mojibake); + range = loc_map.get_range_for_value (jv); + ASSERT_TRUE (range); + ASSERT_RANGE_EQ (*range, + 0, line_1, 0, + 25, line_1, 25); + delete jv; +} + +/* Verify that we can parse various kinds of JSON numbers. */ + +static void +test_parse_number () +{ + const int line_1 = 1; + test_location_map loc_map; + json::value *jv; + + error *err = NULL; + jv = parse_utf8_string ("42", false, &err, &loc_map); + ASSERT_EQ (err, NULL); + ASSERT_EQ (jv->get_kind (), JSON_INTEGER); + ASSERT_EQ (as_a <json::integer_number *> (jv)->get (), 42.0); + assert_print_eq (*jv, "42"); + location_map::range *range = loc_map.get_range_for_value (jv); + ASSERT_TRUE (range); + ASSERT_RANGE_EQ (*range, + 0, line_1, 0, + 1, line_1, 1); + delete jv; + + /* Negative number. */ + jv = parse_utf8_string ("-17", false, &err, &loc_map); + ASSERT_EQ (err, NULL); + ASSERT_EQ (jv->get_kind (), JSON_INTEGER); + ASSERT_EQ (as_a <json::integer_number *> (jv)->get (), -17.0); + assert_print_eq (*jv, "-17"); + range = loc_map.get_range_for_value (jv); + ASSERT_TRUE (range); + ASSERT_RANGE_EQ (*range, + 0, line_1, 0, + 2, line_1, 2); + delete jv; + + /* Decimal. 
*/ + jv = parse_utf8_string ("3.141", false, &err, &loc_map); + ASSERT_EQ (err, NULL); + ASSERT_EQ (JSON_FLOAT, jv->get_kind ()); + ASSERT_EQ (3.141, ((json::float_number *)jv)->get ()); + assert_print_eq (*jv, "3.141"); + range = loc_map.get_range_for_value (jv); + ASSERT_TRUE (range); + ASSERT_RANGE_EQ (*range, + 0, line_1, 0, + 4, line_1, 4); + delete jv; + + /* Exponents. */ + jv = parse_utf8_string ("3.141e+0", false, &err, &loc_map); + ASSERT_EQ (err, NULL); + ASSERT_EQ (jv->get_kind (), JSON_FLOAT); + ASSERT_EQ (as_a <json::float_number *> (jv)->get (), 3.141); + assert_print_eq (*jv, "3.141"); + range = loc_map.get_range_for_value (jv); + ASSERT_TRUE (range); + ASSERT_RANGE_EQ (*range, + 0, line_1, 0, + 7, line_1, 7); + delete jv; + + jv = parse_utf8_string ("42e2", false, &err, &loc_map); + ASSERT_EQ (err, NULL); + ASSERT_EQ (jv->get_kind (), JSON_INTEGER); + ASSERT_EQ (as_a <json::integer_number *> (jv)->get (), 4200); + assert_print_eq (*jv, "4200"); + range = loc_map.get_range_for_value (jv); + ASSERT_TRUE (range); + ASSERT_RANGE_EQ (*range, + 0, line_1, 0, + 3, line_1, 3); + delete jv; + + jv = parse_utf8_string ("42e-1", false, &err, &loc_map); + ASSERT_EQ (err, NULL); + ASSERT_EQ (jv->get_kind (), JSON_FLOAT); + ASSERT_EQ (as_a <json::float_number *> (jv)->get (), 4.2); + assert_print_eq (*jv, "4.2"); + range = loc_map.get_range_for_value (jv); + ASSERT_TRUE (range); + ASSERT_RANGE_EQ (*range, + 0, line_1, 0, + 4, line_1, 4); + delete jv; +} + +/* Verify that JSON array parsing works. 
*/ + +static void +test_parse_array () +{ + const int line_1 = 1; + test_location_map loc_map; + json::value *jv; + + error *err = NULL; + jv = parse_utf8_string ("[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]", false, + &err, &loc_map); + ASSERT_EQ (err, NULL); + ASSERT_EQ (jv->get_kind (), JSON_ARRAY); + json::array *arr = as_a <json::array *> (jv); + ASSERT_EQ (arr->length (), 10); + location_map::range *range = loc_map.get_range_for_value (jv); + ASSERT_TRUE (range); + ASSERT_RANGE_EQ (*range, + 0, line_1, 0, + 29, line_1, 29); + for (int i = 0; i < 10; i++) + { + json::value *element = arr->get (i); + ASSERT_EQ (element->get_kind (), JSON_INTEGER); + ASSERT_EQ (as_a <json::integer_number *> (element)->get (), i); + range = loc_map.get_range_for_value (element); + ASSERT_TRUE (range); + const int offset = 1 + (i * 3); + ASSERT_RANGE_EQ (*range, + offset, line_1, offset, + offset, line_1, offset); + } + assert_print_eq (*jv, "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]"); + + delete jv; +} + +/* Verify that JSON object parsing works. */ + +static void +test_parse_object () +{ + const int line_1 = 1; + test_location_map loc_map; + error *err = NULL; + json::value *jv + /* 0 1 2 3 . */ + /* 01 2345 678 9012 345 6789 0123456789012. 
*/ + = parse_utf8_string ("{\"foo\": \"bar\", \"baz\": [42, null]}", + false, &err, &loc_map); + ASSERT_EQ (err, NULL); + ASSERT_NE (jv, NULL); + ASSERT_EQ (jv->get_kind (), JSON_OBJECT); + location_map::range *range = loc_map.get_range_for_value (jv); + ASSERT_TRUE (range); + ASSERT_RANGE_EQ (*range, + 0, line_1, 0, + 32, line_1, 32); + json::object *jo = static_cast <json::object *> (jv); + + json::value *foo_value = jo->get ("foo"); + ASSERT_NE (foo_value, NULL); + ASSERT_EQ (foo_value->get_kind (), JSON_STRING); + ASSERT_STREQ (as_a <json::string *> (foo_value)->get_string (), "bar"); + range = loc_map.get_range_for_value (foo_value); + ASSERT_TRUE (range); + ASSERT_RANGE_EQ (*range, + 8, line_1, 8, + 12, line_1, 12); + + json::value *baz_value = jo->get ("baz"); + ASSERT_NE (baz_value, NULL); + ASSERT_EQ (baz_value->get_kind (), JSON_ARRAY); + range = loc_map.get_range_for_value (baz_value); + ASSERT_TRUE (range); + ASSERT_RANGE_EQ (*range, + 22, line_1, 22, + 31, line_1, 31); + + json::array *baz_array = as_a <json::array *> (baz_value); + ASSERT_EQ (baz_array->length (), 2); + + json::value *element0 = baz_array->get (0); + ASSERT_EQ (as_a <json::integer_number *> (element0)->get (), 42); + range = loc_map.get_range_for_value (element0); + ASSERT_TRUE (range); + ASSERT_RANGE_EQ (*range, + 23, line_1, 23, + 24, line_1, 24); + + json::value *element1 = baz_array->get (1); + ASSERT_EQ (element1->get_kind (), JSON_NULL); + range = loc_map.get_range_for_value (element1); + ASSERT_TRUE (range); + ASSERT_RANGE_EQ (*range, + 27, line_1, 27, + 30, line_1, 30); + + delete jv; +} + +/* Verify that the JSON literals "true", "false" and "null" are parsed + correctly. 
*/ + +static void +test_parse_literals () +{ + const int line_1 = 1; + test_location_map loc_map; + json::value *jv; + error *err = NULL; + jv = parse_utf8_string ("true", false, &err, &loc_map); + ASSERT_EQ (err, NULL); + ASSERT_NE (jv, NULL); + ASSERT_EQ (jv->get_kind (), JSON_TRUE); + assert_print_eq (*jv, "true"); + location_map::range *range = loc_map.get_range_for_value (jv); + ASSERT_TRUE (range); + ASSERT_RANGE_EQ (*range, + 0, line_1, 0, + 3, line_1, 3); + delete jv; + + jv = parse_utf8_string ("false", false, &err, &loc_map); + ASSERT_EQ (err, NULL); + ASSERT_NE (jv, NULL); + ASSERT_EQ (jv->get_kind (), JSON_FALSE); + assert_print_eq (*jv, "false"); + range = loc_map.get_range_for_value (jv); + ASSERT_TRUE (range); + ASSERT_RANGE_EQ (*range, + 0, line_1, 0, + 4, line_1, 4); + delete jv; + + jv = parse_utf8_string ("null", false, &err, &loc_map); + ASSERT_EQ (err, NULL); + ASSERT_NE (jv, NULL); + ASSERT_EQ (jv->get_kind (), JSON_NULL); + assert_print_eq (*jv, "null"); + range = loc_map.get_range_for_value (jv); + ASSERT_TRUE (range); + ASSERT_RANGE_EQ (*range, + 0, line_1, 0, + 3, line_1, 3); + delete jv; +} + +/* Verify that we can parse a simple JSON-RPC request. */ + +static void +test_parse_jsonrpc () +{ + test_location_map loc_map; + error *err = NULL; + const char *request + /* 0 1 2 3 4. */ + /* 01 23456789 012 3456 789 0123456 789 012345678 90. */ + = ("{\"jsonrpc\": \"2.0\", \"method\": \"subtract\",\n" + /* 0 1 2 3 4. */ + /* 0 1234567 8901234567890 1234 56789012345678 90. */ + " \"params\": [42, 23], \"id\": 1}"); + const int line_1 = 1; + const int line_2 = 2; + const size_t line_2_offset = 41; + json::value *jv = parse_utf8_string (request, false, &err, &loc_map); + ASSERT_EQ (err, NULL); + ASSERT_NE (jv, NULL); + location_map::range *range = loc_map.get_range_for_value (jv); + ASSERT_TRUE (range); + ASSERT_RANGE_EQ (*range, + 0, line_1, 0, + line_2_offset + 28, line_2, 28); + delete jv; +} + +/* Verify that we can parse an empty JSON object. 
*/ + +static void +test_parse_empty_object () +{ + const int line_1 = 1; + test_location_map loc_map; + error *err = NULL; + json::value *jv = parse_utf8_string ("{}", false, &err, &loc_map); + ASSERT_EQ (err, NULL); + ASSERT_NE (jv, NULL); + ASSERT_EQ (jv->get_kind (), JSON_OBJECT); + assert_print_eq (*jv, "{}"); + location_map::range *range = loc_map.get_range_for_value (jv); + ASSERT_TRUE (range); + ASSERT_RANGE_EQ (*range, + 0, line_1, 0, + 1, line_1, 1); + delete jv; +} + +/* Verify that comment-parsing can be enabled or disabled. */ + +static void +test_parsing_comments () +{ + const char *str = ("// foo\n" + "/*...\n" + "...*/ 42 // bar\n" + "/* etc */\n"); + + /* Parsing with comment support disabled. */ + { + error *err = NULL; + json::value *jv = parse_utf8_string (str, false, &err, NULL); + ASSERT_NE (err, NULL); + ASSERT_STREQ (err->get_msg (), + "invalid JSON token: unexpected character: '/'"); + ASSERT_EQ (jv, NULL); + } + + /* Parsing with comment support enabled. */ + { + error *err = NULL; + json::value *jv = parse_utf8_string (str, true, &err, NULL); + ASSERT_EQ (err, NULL); + ASSERT_NE (jv, NULL); + ASSERT_EQ (jv->get_kind (), JSON_INTEGER); + ASSERT_EQ (((json::integer_number *)jv)->get (), 42); + delete jv; + } +} + +/* Verify that JSON parsing gracefully handles empty input, reporting + an error rather than returning a value. */ + +static void +test_error_empty_string () +{ + const int line_1 = 1; + error *err = NULL; + json::value *jv = parse_utf8_string ("", false, &err, NULL); + ASSERT_ERR_EQ (err, + 0, line_1, 0, + 0, line_1, 0, + "expected a JSON value but got EOF"); + ASSERT_EQ (jv, NULL); + delete err; +} + +/* Verify that JSON parsing gracefully handles an invalid token. 
*/ + +static void +test_error_bad_token () +{ + const int line_1 = 1; + error *err = NULL; + json::value *jv = parse_utf8_string (" not valid ", false, &err, NULL); + ASSERT_ERR_EQ (err, + 2, line_1, 2, + 2, line_1, 2, + "invalid JSON token: unexpected character: 'n'"); + ASSERT_EQ (jv, NULL); + delete err; +} + +/* Verify that JSON parsing gracefully handles a missing comma + within an object. */ + +static void +test_error_object_with_missing_comma () +{ + const int line_1 = 1; + error *err = NULL; + /* 0 1 2. */ + /* 01 2345 6789012 3456 7890. */ + const char *json = "{\"foo\" : 42 \"bar\""; + json::value *jv = parse_utf8_string (json, false, &err, NULL); + ASSERT_ERR_EQ (err, + 12, line_1, 12, + 16, line_1, 16, + "expected ',' or '}'; got string"); + ASSERT_EQ (jv, NULL); + delete err; +} + +/* Verify that JSON parsing gracefully handles a missing comma + within an array. */ + +static void +test_error_array_with_missing_comma () +{ + const int line_1 = 1; + error *err = NULL; + /* 01234567. */ + const char *json = "[0, 1 42]"; + json::value *jv = parse_utf8_string (json, false, &err, NULL); + ASSERT_ERR_EQ (err, + 6, line_1, 6, + 7, line_1, 7, + "expected ',' or ']'; got number"); + ASSERT_EQ (jv, NULL); + delete err; +} + +/* Run all of the selftests within this file. 
*/ + +void +json_parser_cc_tests () +{ + test_lexer (); + test_lexing_unsupported_single_line_comment (); + test_lexing_unsupported_multiline_comment (); + test_lexing_supported_single_line_comment (); + test_lexing_supported_multiline_comment (); + test_parse_string (); + test_parse_number (); + test_parse_array (); + test_parse_object (); + test_parse_literals (); + test_parse_jsonrpc (); + test_parse_empty_object (); + test_parsing_comments (); + test_error_empty_string (); + test_error_bad_token (); + test_error_object_with_missing_comma (); + test_error_array_with_missing_comma (); +} + +} // namespace selftest + +#endif /* #if CHECKING_P */ diff --git a/gcc/json-parsing.h b/gcc/json-parsing.h new file mode 100644 index 00000000000..7b1dd951395 --- /dev/null +++ b/gcc/json-parsing.h @@ -0,0 +1,94 @@ +/* JSON parsing + Copyright (C) 2017-2022 Free Software Foundation, Inc. + Contributed by David Malcolm <dmalcolm@redhat.com>. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +#ifndef GCC_JSON_PARSING_H +#define GCC_JSON_PARSING_H + +#include "json.h" + +namespace json +{ + +/* Declarations for parsing JSON to a json::value * tree. */ + +/* Abstract base class for recording what the locations of JSON values + were as they were parsed. */ + +class location_map +{ +public: + /* A point within the JSON input file. */ + struct point + { + size_t m_unichar_idx; /* zero-based. */ + int m_line; /* one-based. 
*/ + int m_column; /* zero-based unichar count. */ + }; + + /* A range of points within the JSON input file. + Both endpoints are part of the range. */ + struct range + { + point m_start; + point m_end; + }; + + virtual ~location_map () {} + virtual void record_range_for_value (json::value *jv, const range &r) = 0; + virtual void on_finished_parsing () {} +}; + +/* Class for recording an error within a JSON file. */ + +class error +{ +public: + error (const location_map::range &r, char *msg) + : m_range (r), m_msg (msg) + { + } + ~error () + { + free (m_msg); + } + + const location_map::range &get_range () const { return m_range; } + const char *get_msg () const { return m_msg; } + +private: + location_map::range m_range; + char *m_msg; +}; + +/* Functions for parsing JSON buffers. */ + +extern value *parse_utf8_string (size_t length, + const char *utf8_buf, + bool allow_comments, + error **err_out, + location_map *out_loc_map); +extern value *parse_utf8_string (const char *utf8, + bool allow_comments, + error **err_out, + location_map *out_loc_map); + +} // namespace json + +#endif /* GCC_JSON_PARSING_H */ diff --git a/gcc/json.cc b/gcc/json.cc index 974f8c36825..9577a41432f 100644 --- a/gcc/json.cc +++ b/gcc/json.cc @@ -264,7 +264,7 @@ namespace selftest { /* Verify that JV->print () prints EXPECTED_JSON. */ -static void +void assert_print_eq (const json::value &jv, const char *expected_json) { pretty_printer pp; diff --git a/gcc/json.h b/gcc/json.h index f272981259b..dcb96f8e94c 100644 --- a/gcc/json.h +++ b/gcc/json.h @@ -27,8 +27,8 @@ along with GCC; see the file COPYING3. If not see and http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf and https://tools.ietf.org/html/rfc7159 - Supports creating a DOM-like tree of json::value *, and then dumping - json::value * to text. */ + Supports parsing text into a DOM-like tree of json::value *, directly + creating such trees, and dumping json::value * to text. 
*/ namespace json { @@ -114,6 +114,11 @@ class array : public value void append (value *v); + value **begin () { return m_elements.begin (); } + value **end () { return m_elements.end (); } + size_t length () const { return m_elements.length (); } + value *get (size_t idx) const { return m_elements[idx]; } + private: auto_vec<value *> m_elements; }; @@ -188,4 +193,54 @@ class literal : public value } // namespace json +template <> +template <> +inline bool +is_a_helper <json::object *>::test (json::value *jv) +{ + return jv->get_kind () == json::JSON_OBJECT; +} + +template <> +template <> +inline bool +is_a_helper <json::array *>::test (json::value *jv) +{ + return jv->get_kind () == json::JSON_ARRAY; +} + +template <> +template <> +inline bool +is_a_helper <json::float_number *>::test (json::value *jv) +{ + return jv->get_kind () == json::JSON_FLOAT; +} + +template <> +template <> +inline bool +is_a_helper <json::integer_number *>::test (json::value *jv) +{ + return jv->get_kind () == json::JSON_INTEGER; +} + +template <> +template <> +inline bool +is_a_helper <json::string *>::test (json::value *jv) +{ + return jv->get_kind () == json::JSON_STRING; +} + +#if CHECKING_P + +namespace selftest { + +extern void assert_print_eq (const json::value &jv, const char *expected_json); + +} // namespace selftest + +#endif /* #if CHECKING_P */ + #endif /* GCC_JSON_H */ diff --git a/gcc/selftest-run-tests.cc b/gcc/selftest-run-tests.cc index d59e0aeddee..57e40a197e9 100644 --- a/gcc/selftest-run-tests.cc +++ b/gcc/selftest-run-tests.cc @@ -74,6 +74,7 @@ selftest::run_tests () opt_suggestions_cc_tests (); opts_cc_tests (); json_cc_tests (); + json_parser_cc_tests (); cgraph_cc_tests (); optinfo_emit_json_cc_tests (); ordered_hash_map_tests_cc_tests (); diff --git a/gcc/selftest.h b/gcc/selftest.h index 7568a6d24d4..9d67490438e 100644 --- a/gcc/selftest.h +++ b/gcc/selftest.h @@ -237,6 +237,7 @@ extern void hash_map_tests_cc_tests (); extern void hash_set_tests_cc_tests (); extern void input_cc_tests (); extern void json_cc_tests (); +extern void json_parser_cc_tests (); extern 
void optinfo_emit_json_cc_tests (); extern void opts_cc_tests (); extern void ordered_hash_map_tests_cc_tests (); From patchwork Wed Jun 22 22:34:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Malcolm X-Patchwork-Id: 1646805 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=WRkL9yFG; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4LSyt00xFHz9sGp for ; Thu, 23 Jun 2022 08:36:40 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id B9E973830655 for ; Wed, 22 Jun 2022 22:36:37 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B9E973830655 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1655937397; bh=RpSgFjn2ikP0Br7hASpD1h9nZCFz5ZmhyHk7Uy2alLg=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=WRkL9yFGrDZOfUvjwLf0KzXx3mpVpTx4+Zy16dylQm/fcr0to0mISwtz8ffi66hU0 x8jWsJWHSNZCZmJEsg3qItRWZAU4ljrFS7u69f09q9i+5/QA9XmlMVXBCIdcUr8aCQ DbABqCb0F2Pbo9QsB5xjG6reOMuYi3d/oq6QPtDY= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from us-smtp-delivery-124.mimecast.com 
(us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id F01983830659 for ; Wed, 22 Jun 2022 22:34:50 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org F01983830659 Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-596-37yaC_LyOaSL_2OHvpT0Lg-1; Wed, 22 Jun 2022 18:34:49 -0400 X-MC-Unique: 37yaC_LyOaSL_2OHvpT0Lg-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 870BE8032E7 for ; Wed, 22 Jun 2022 22:34:49 +0000 (UTC) Received: from t14s.localdomain.com (unknown [10.2.17.61]) by smtp.corp.redhat.com (Postfix) with ESMTP id 670B41121314; Wed, 22 Jun 2022 22:34:49 +0000 (UTC) To: gcc-patches@gcc.gnu.org Subject: [PATCH 05/12] Placeholder libcpp fixups Date: Wed, 22 Jun 2022 18:34:40 -0400 Message-Id: <20220622223447.2462880-6-dmalcolm@redhat.com> In-Reply-To: <20220622223447.2462880-1-dmalcolm@redhat.com> References: <20220622223447.2462880-1-dmalcolm@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: David Malcolm via Gcc-patches From: David 
Malcolm Reply-To: David Malcolm Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" Obviously this isn't quite ready for trunk yet. libcpp/ChangeLog: * include/line-map.h (rich_location::maybe_add_fixit): Make public. * line-map.cc (linemap_add): Hack away assertion about LC_RENAME for now. Signed-off-by: David Malcolm --- libcpp/include/line-map.h | 7 ++++--- libcpp/line-map.cc | 3 ++- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h index 80335721e03..c27d8a6fdcd 100644 --- a/libcpp/include/line-map.h +++ b/libcpp/include/line-map.h @@ -1799,13 +1799,14 @@ class rich_location bool escape_on_output_p () const { return m_escape_on_output; } void set_escape_on_output (bool flag) { m_escape_on_output = flag; } -private: - bool reject_impossible_fixit (location_t where); - void stop_supporting_fixits (); void maybe_add_fixit (location_t start, location_t next_loc, const char *new_content); +private: + bool reject_impossible_fixit (location_t where); + void stop_supporting_fixits (); + public: static const int STATICALLY_ALLOCATED_RANGES = 3; diff --git a/libcpp/line-map.cc b/libcpp/line-map.cc index 62077c3857c..82f27280ea7 100644 --- a/libcpp/line-map.cc +++ b/libcpp/line-map.cc @@ -496,10 +496,11 @@ linemap_add (line_maps *set, enum lc_reason reason, linemap_assert (!LINEMAPS_ORDINARY_USED (set) || (start_location >= MAP_START_LOCATION (LINEMAPS_LAST_ORDINARY_MAP (set)))); - +#if 0 /* When we enter the file for the first time reason cannot be LC_RENAME. */ linemap_assert (!(set->depth == 0 && reason == LC_RENAME)); +#endif /* If we are leaving the main file, return a NULL map. 
*/ if (reason == LC_LEAVE From patchwork Wed Jun 22 22:34:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Malcolm X-Patchwork-Id: 1646812 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=CmHMWx6W; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4LSz204QD1z9sGp for ; Thu, 23 Jun 2022 08:43:36 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id ADE7F38303C0 for ; Wed, 22 Jun 2022 22:43:32 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org ADE7F38303C0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1655937812; bh=t2/O+tQQxPxqmEMCUE3tbxzKPpTKBV7VCY04fcZgQR4=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=CmHMWx6WyRwcQCcfe3OeRk049rjmx01hwkJp8kV4fdVFI6Cvw6HigMGc1KkRYUz1v idodAFFr3DvwKVQyxeeE7JYHPe+erciNPXqFC0w7/Nl5bBJFWaH98+/uGnZ3Xp7eGp rEQ1CdKVJ9Uyg9nLlgBa1JuRtxFj/q1thCkOnNEU= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com 
[170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 0AD2B3830666 for ; Wed, 22 Jun 2022 22:34:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 0AD2B3830666 Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-369-MIM_KHwHPOCPZS3T84PZDg-1; Wed, 22 Jun 2022 18:34:50 -0400 X-MC-Unique: MIM_KHwHPOCPZS3T84PZDg-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id B562D8032F0 for ; Wed, 22 Jun 2022 22:34:49 +0000 (UTC) Received: from t14s.localdomain.com (unknown [10.2.17.61]) by smtp.corp.redhat.com (Postfix) with ESMTP id 93B2E1121314; Wed, 22 Jun 2022 22:34:49 +0000 (UTC) To: gcc-patches@gcc.gnu.org Subject: [PATCH 06/12] prune.exp: move multiline-handling to before other pruning Date: Wed, 22 Jun 2022 18:34:41 -0400 Message-Id: <20220622223447.2462880-7-dmalcolm@redhat.com> In-Reply-To: <20220622223447.2462880-1-dmalcolm@redhat.com> References: <20220622223447.2462880-1-dmalcolm@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: David Malcolm via Gcc-patches From: David Malcolm 
Reply-To: David Malcolm
Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org
Sender: "Gcc-patches"

Doing so allows for multiline directives to contain things like
"note: " which would otherwise already have been pruned.

gcc/testsuite/ChangeLog:
        * lib/prune.exp (prune_gcc_output): Move multiline-handling to
        before other pruning.

Signed-off-by: David Malcolm
---
 gcc/testsuite/lib/prune.exp | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/lib/prune.exp b/gcc/testsuite/lib/prune.exp
index 04c6a1dd7a1..fd6584d0e7d 100644
--- a/gcc/testsuite/lib/prune.exp
+++ b/gcc/testsuite/lib/prune.exp
@@ -40,6 +40,13 @@ proc prune_gcc_output { text } {
     # Remove Windows .exe suffix
     regsub -all "(as|cc1|cc1plus|collect2|f951|ld|lto-wrapper)\.exe?:" $text {\1:} text
 
+    # If dg-enable-nn-line-numbers was provided, then obscure source-margin
+    # line numbers by converting them to "NN" form.
+    set text [maybe-handle-nn-line-numbers $text]
+
+    # Call into multiline.exp to handle any multiline output directives.
+    set text [handle-multiline-outputs $text]
+
     regsub -all "(^|\n)(\[^\n\]*: \[iI\]|I)n ((static member |lambda )?function|member|method|(copy )?constructor|destructor|instantiation|substitution|program|subroutine|block-data)\[^\n\]*" $text "" text
     regsub -all "(^|\n)\[^\n\]*(: )?At (top level|global scope):\[^\n\]*" $text "" text
     regsub -all "(^|\n)\[^\n\]*: (recursively )?required \[^\n\]*" $text "" text
@@ -108,13 +115,6 @@ proc prune_gcc_output { text } {
     # Many tests that use visibility will still pass on platforms that don't support it.
     regsub -all "(^|\n)\[^\n\]*lto1: warning: visibility attribute not supported in this configuration; ignored\[^\n\]*" $text "" text
 
-    # If dg-enable-nn-line-numbers was provided, then obscure source-margin
-    # line numbers by converting them to "NN" form.
-    set text [maybe-handle-nn-line-numbers $text]
-
-    # Call into multiline.exp to handle any multiline output directives.
-    set text [handle-multiline-outputs $text]
-
     #send_user "After:$text\n"
 
     return $text

From patchwork Wed Jun 22 22:34:42 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Malcolm
X-Patchwork-Id: 1646809
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 07/12] Add deferred-locations.h/cc
Date: Wed, 22 Jun 2022 18:34:42 -0400
Message-Id: <20220622223447.2462880-8-dmalcolm@redhat.com>
In-Reply-To: <20220622223447.2462880-1-dmalcolm@redhat.com>
References: <20220622223447.2462880-1-dmalcolm@redhat.com>
X-Patchwork-Original-From: David Malcolm via Gcc-patches
From: David Malcolm Reply-To: David Malcolm Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" libcpp requires locations to be created as if by a tokenizer, creating them by filename, in ascending order of line/column. This patch adds support classes that allow the creation of locations in arbitrary orders, by deferring all location creation, grouping things up by filename/line, and then creating the linemap entries in a post-processing phase. gcc/ChangeLog: * deferred-locations.cc: New file, adapted from code in jit/jit-playback.cc. * deferred-locations.h: New file. Signed-off-by: David Malcolm --- gcc/deferred-locations.cc | 231 ++++++++++++++++++++++++++++++++++++++ gcc/deferred-locations.h | 52 +++++++++ 2 files changed, 283 insertions(+) create mode 100644 gcc/deferred-locations.cc create mode 100644 gcc/deferred-locations.h diff --git a/gcc/deferred-locations.cc b/gcc/deferred-locations.cc new file mode 100644 index 00000000000..e78b29a4d58 --- /dev/null +++ b/gcc/deferred-locations.cc @@ -0,0 +1,231 @@ +/* Support for deferred creation of location_t values. + Copyright (C) 2013-2022 Free Software Foundation, Inc. + Contributed by David Malcolm . + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it +under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "deferred-locations.h" + +/* Concrete implementation for use by deferred_locations. 
*/
+
+class deferred_locations_impl
+{
+public:
+  class source_file;
+  class source_line;
+  class source_location;
+
+  /* A specific location on a source line, with a saved location_t *
+     to write back to.  */
+  class source_location
+  {
+  public:
+    source_location (int column_num, location_t *out_loc)
+    : m_column_num (column_num), m_out_loc (out_loc)
+    {
+    }
+
+    int get_column_num () const { return m_column_num; }
+
+    /* qsort comparator for comparing pairs of source_location *,
+       ordering them by column number.  */
+
+    static int
+    comparator (const void *lhs, const void *rhs)
+    {
+      const source_location &location_lhs
+        = *static_cast<const source_location *> (lhs);
+      const source_location &location_rhs
+        = *static_cast<const source_location *> (rhs);
+      return location_lhs.m_column_num - location_rhs.m_column_num;
+    }
+
+    void generate_location_t_value () const
+    {
+      *m_out_loc = linemap_position_for_column (line_table, m_column_num);
+    }
+
+  private:
+    int m_column_num;
+    location_t *m_out_loc;
+  };
+
+  /* A source line, with one or more locations of interest.  */
+  class source_line
+  {
+  public:
+    source_line (int line_num) : m_line_num (line_num) {}
+
+    source_location *
+    get_location (int column_num, location_t *out_loc);
+
+    int get_line_num () const { return m_line_num; }
+
+    void add_location (const expanded_location &exploc,
+                       location_t *out_loc)
+    {
+      m_locations.safe_push (source_location (exploc.column, out_loc));
+    }
+
+    /* qsort comparator for comparing pairs of source_line *,
+       ordering them by line number.  */
+
+    static int
+    comparator (const void *lhs, const void *rhs)
+    {
+      const source_line *line_lhs
+        = *static_cast<const source_line * const *> (lhs);
+      const source_line *line_rhs
+        = *static_cast<const source_line * const *> (rhs);
+      return line_lhs->get_line_num () - line_rhs->get_line_num ();
+    }
+
+    void generate_location_t_values ()
+    {
+      /* Determine maximum column within this line.
*/
+      m_locations.qsort (source_location::comparator);
+      gcc_assert (m_locations.length () > 0);
+      source_location *final_column = &m_locations[m_locations.length () - 1];
+      int max_col = final_column->get_column_num ();
+
+      linemap_line_start (line_table, m_line_num, max_col);
+      for (auto loc_iter : m_locations)
+        loc_iter.generate_location_t_value ();
+    }
+
+  private:
+    int m_line_num;
+    auto_vec<source_location> m_locations;
+  };
+
+  /* A set of locations, all sharing a filename */
+  class source_file
+  {
+  public:
+    source_file (const char *filename)
+    : m_filename (xstrdup (filename))
+    {
+    }
+    ~source_file ()
+    {
+      free (m_filename);
+    }
+
+    source_line *
+    get_source_line (int line_num);
+
+    const char*
+    get_filename () const { return m_filename; }
+
+    bool
+    matches (const char *filename)
+    {
+      return ((filename == NULL && m_filename == NULL)
+              || ((filename && m_filename)
+                  && 0 == strcmp (filename, m_filename)));
+    }
+
+    void add_location (const expanded_location &exploc,
+                       location_t *out_loc)
+    {
+      source_line *line = get_or_create_line (exploc.line);
+      line->add_location (exploc, out_loc);
+    }
+
+    void generate_location_t_values ()
+    {
+      linemap_add (line_table, LC_ENTER, false, xstrdup (m_filename), 0);
+
+      /* Sort lines by ascending line numbers.  */
+      m_source_lines.qsort (source_line::comparator);
+
+      for (auto line_iter : m_source_lines)
+        line_iter->generate_location_t_values ();
+
+      linemap_add (line_table, LC_LEAVE, false, NULL, 0);
+    }
+
+  private:
+    source_line *get_or_create_line (int line_num)
+    {
+      // FIXME: something better than linear search here?
+      for (auto iter : m_source_lines)
+        if (line_num == iter->get_line_num ())
+          return iter;
+      source_line *line = new source_line (line_num);
+      m_source_lines.safe_push (line);
+      return line;
+    }
+
+    char *m_filename;
+    auto_delete_vec<source_line> m_source_lines;
+  };
+
+  void add_location (const expanded_location &exploc,
+                     location_t *out_loc)
+  {
+    source_file *f = get_or_create_file (exploc.file);
+    f->add_location (exploc, out_loc);
+  }
+
+  void generate_location_t_values ()
+  {
+    for (auto file_iter : m_source_files)
+      file_iter->generate_location_t_values ();
+  }
+
+private:
+  source_file *get_or_create_file (const char *filename)
+  {
+    // FIXME: something better than linear search here?
+    for (auto iter : m_source_files)
+      if (iter->matches (filename))
+        return iter;
+    source_file *f = new source_file (filename);
+    m_source_files.safe_push (f);
+    return f;
+  }
+  auto_delete_vec<source_file> m_source_files;
+};
+
+/* class deferred_locations.  */
+
+deferred_locations::deferred_locations ()
+: m_pimpl (new deferred_locations_impl ())
+{
+}
+
+deferred_locations::~deferred_locations ()
+{
+  delete m_pimpl;
+}
+
+void
+deferred_locations::add_location (const expanded_location &exploc,
+                                  location_t *out_loc)
+{
+  m_pimpl->add_location (exploc, out_loc);
+}
+
+void
+deferred_locations::generate_location_t_values ()
+{
+  m_pimpl->generate_location_t_values ();
+}
diff --git a/gcc/deferred-locations.h b/gcc/deferred-locations.h
new file mode 100644
index 00000000000..97d962aa613
--- /dev/null
+++ b/gcc/deferred-locations.h
@@ -0,0 +1,52 @@
+/* Support for deferred creation of location_t values.
+   Copyright (C) 2013-2022 Free Software Foundation, Inc.
+   Contributed by David Malcolm .
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+ +GCC is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. */ + +#ifndef DEFERRED_LOCATIONS_H +#define DEFERRED_LOCATIONS_H + +class deferred_locations_impl; + +/* Dealing with the linemap API. + + libcpp requires locations to be created as if by + a tokenizer, creating them by filename, in ascending order of + line/column. + + This class is for supporting code that allows the creation of locations + in arbitrary orders, by deferring all location creation, + grouping things up by filename/line, and then creating the linemap + entries in a post-processing phase. */ + +class deferred_locations +{ + public: + deferred_locations (); + ~deferred_locations (); + + void add_location (const expanded_location &exploc, + location_t *loc); + + void generate_location_t_values (); + + private: + deferred_locations_impl *m_pimpl; +}; + +#endif /* DEFERRED_LOCATIONS_H */ From patchwork Wed Jun 22 22:34:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Malcolm X-Patchwork-Id: 1646810 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=J3gqDfoN; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Received: from sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher 
TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4LSyzn1SR7z9sGp for ; Thu, 23 Jun 2022 08:41:41 +1000 (AEST)
Received: from t14s.localdomain.com (unknown [10.2.17.61]) by smtp.corp.redhat.com (Postfix) with ESMTP id 001C01121314;
Wed, 22 Jun 2022 22:34:49 +0000 (UTC) To: gcc-patches@gcc.gnu.org Subject: [PATCH 08/12] Add json-reader.h/cc Date: Wed, 22 Jun 2022 18:34:43 -0400 Message-Id: <20220622223447.2462880-9-dmalcolm@redhat.com> In-Reply-To: <20220622223447.2462880-1-dmalcolm@redhat.com> References: <20220622223447.2462880-1-dmalcolm@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: David Malcolm via Gcc-patches From: David Malcolm Reply-To: David Malcolm Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch adds classes that better integrate the JSON parser with GCC's diagnostic subsystem (e.g. line_maps). gcc/ChangeLog: * json-reader.cc: New file. * json-reader.h: New file. Signed-off-by: David Malcolm --- gcc/json-reader.cc | 122 +++++++++++++++++++++++++++++++++++++++++++++ gcc/json-reader.h | 107 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 229 insertions(+) create mode 100644 gcc/json-reader.cc create mode 100644 gcc/json-reader.h diff --git a/gcc/json-reader.cc b/gcc/json-reader.cc new file mode 100644 index 00000000000..e4fbd0db803 --- /dev/null +++ b/gcc/json-reader.cc @@ -0,0 +1,122 @@ +/* Integration of JSON parsing with GCC diagnostics. + Copyright (C) 2022 Free Software Foundation, Inc. + +This file is part of GCC. 
+ +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "diagnostic.h" +#include "json-reader.h" + +/* Read the contents of PATH into memory, returning a 0-terminated buffer + that must be freed by the caller. + Issue a fatal error if there are any problems. */ + +char * +read_file (const char *path) +{ + FILE *f_in = fopen (path, "r"); + if (!f_in) + fatal_error (UNKNOWN_LOCATION, "unable to open file %qs: %s", + path, xstrerror (errno)); + + /* Read content, allocating a buffer for it. */ + char *result = NULL; + size_t total_sz = 0; + size_t alloc_sz = 0; + char buf[4096]; + size_t iter_sz_in; + + while ( (iter_sz_in = fread (buf, 1, sizeof (buf), f_in)) ) + { + gcc_assert (alloc_sz >= total_sz); + size_t old_total_sz = total_sz; + total_sz += iter_sz_in; + /* Allow 1 extra byte for 0-termination. */ + if (alloc_sz < (total_sz + 1)) + { + size_t new_alloc_sz = alloc_sz ? alloc_sz * 2: total_sz + 1; + result = (char *)xrealloc (result, new_alloc_sz); + alloc_sz = new_alloc_sz; + } + memcpy (result + old_total_sz, buf, iter_sz_in); + } + + if (!feof (f_in)) + fatal_error (UNKNOWN_LOCATION, "error reading from %qs: %s", path, + xstrerror (errno)); + + fclose (f_in); + + /* 0-terminate the buffer. */ + gcc_assert (total_sz < alloc_sz); + result[total_sz] = '\0'; + + return result; +} + +/* json_reader's ctor. 
*/ + +json_reader::json_reader (const char *filename) +: m_filename (filename), + m_json_loc_map (filename, line_table) +{ +} + +/* Parse UTF8, capturing source location information in m_json_loc_map. + If successful, return the top-level value. + Otherwise, return NULL and write to *ERR_OUT. */ + +json::value * +json_reader::parse_utf8_string (const char *utf8, bool allow_comments, + json::error **err_out) +{ + json::value *result + = json::parse_utf8_string (utf8, allow_comments, err_out, &m_json_loc_map); + return result; +} + +/* Issue an error diagnostic for GMSGID at the location of JV, and exit. */ + +void +json_reader::fatal_error (json::value *jv, const char *gmsgid, ...) +{ + location_t loc = m_json_loc_map.get_range_for_value (jv); + + auto_diagnostic_group d; + va_list ap; + va_start (ap, gmsgid); + rich_location richloc (line_table, loc); + emit_diagnostic_valist (DK_ERROR, &richloc, NULL, 0, gmsgid, &ap); + va_end (ap); + exit (1); + /* Ideally we'd use ::fatal_error here, but we seem to need to use + DK_ERROR for it to be usable from DejaGnu. */ +} + +/* Issue an error diagnostic for ERR, and exit. */ + +void +json_reader::fatal_error (json::error *err) +{ + location_t loc = m_json_loc_map.make_location_for_range (err->get_range ()); + ::error_at (loc, "%s", err->get_msg ()); + exit (1); + /* Ideally we'd use ::fatal_error here, but we seem to need to use + DK_ERROR for it to be usable from DejaGnu. */ +} diff --git a/gcc/json-reader.h b/gcc/json-reader.h new file mode 100644 index 00000000000..4537e29cb6a --- /dev/null +++ b/gcc/json-reader.h @@ -0,0 +1,107 @@ +/* Integration of JSON parsing with GCC diagnostics. + Copyright (C) 2022 David Malcolm . + Contributed by David Malcolm . + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. 
+ +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. */ + +#ifndef GCC_JSON_READER_H +#define GCC_JSON_READER_H + +#include "json-parsing.h" + +/* Concrete implementation of json::location_map that integrates + with a line_table, creating location_t values for the locations + in the JSON file. */ + +class json_line_map : public json::location_map +{ +public: + json_line_map (const char *filename, + line_maps *line_table) + : m_filename (filename), + m_line_table (line_table) + { + linemap_add (m_line_table, LC_ENTER, false, xstrdup (m_filename), 0); + } + + void record_range_for_value (json::value *jv, const range &r) final override + { + location_t loc = make_location_for_range (r); + m_map.put (jv, loc); + } + + void on_finished_parsing () final override + { + linemap_add (m_line_table, LC_LEAVE, false, NULL, 0); + } + + location_t get_range_for_value (json::value *jv) + { + if (location_t *slot = m_map.get (jv)) + return *slot; + return UNKNOWN_LOCATION; + } + + location_t make_location_for_range (const range &r) + { + location_t start = make_location_for_point (r.m_start); + location_t end = make_location_for_point (r.m_end); + return make_location (start, start, end); + } + +private: + location_t make_location_for_point (const point &p) + { + /* json::location_map::point columns are zero-based, + whereas libcpp/gcc columns are 1-based. 
*/
+    const int gcc_column = p.m_column + 1;
+    const int max_column = MAX (1024, gcc_column);
+    linemap_line_start (m_line_table, p.m_line, max_column);
+    return linemap_position_for_column (m_line_table, gcc_column);
+  }
+
+  const char *m_filename;
+  hash_map<json::value *, location_t> m_map;
+  line_maps *m_line_table;
+};
+
+/* Class for reading a JSON file, capturing location_t values for
+   the json::values, and emitting fatal error messages.  */
+
+class json_reader
+{
+public:
+  json_reader (const char *filename);
+  json::value *parse_utf8_string (const char *utf8, bool allow_comments,
+                                  json::error **err_out);
+
+  void fatal_error (json::value *jv,
+                    const char *gmsgid, ...)
+    ATTRIBUTE_GCC_DIAG(3,4)
+    ATTRIBUTE_NORETURN;
+
+  void fatal_error (json::error *err)
+    ATTRIBUTE_NORETURN;
+
+protected:
+  const char *m_filename;
+  json_line_map m_json_loc_map;
+};
+
+extern char *read_file (const char *path);
+
+#endif /* GCC_JSON_READER_H */

From patchwork Wed Jun 22 22:34:44 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: David Malcolm
X-Patchwork-Id: 1646815
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: bilbo.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=wQ1MQhAQ; dkim-atps=neutral
Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=)
Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4LSz5R3Vfwz9sGp
for ; Thu, 23 Jun 2022 08:46:33 +1000 (AEST)
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 09/12] Add json frontend
Date: Wed, 22 Jun 2022 18:34:44 -0400
Message-Id: <20220622223447.2462880-10-dmalcolm@redhat.com>
In-Reply-To: <20220622223447.2462880-1-dmalcolm@redhat.com> References: <20220622223447.2462880-1-dmalcolm@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: David Malcolm via Gcc-patches From: David Malcolm Reply-To: David Malcolm Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch adds a new json frontend: json-replayer, which the gcc driver can invoke on .json files saved with -fdiagnostics-format=json-file. gcc/ChangeLog: * json/Make-lang.in: New file. * json/config-lang.in: New file. * json/json-frontend.cc: New file. * json/json-replay.cc: New file. * json/json-replay.h: New file. * json/lang-specs.h: New file. * json/lang.opt: New file. gcc/testsuite/ChangeLog: * json/invalid-json-array-missing-comma.json: New test. * json/invalid-json-array-with-trailing-comma.json: New test. * json/invalid-json-bad-token.json: New test. * json/invalid-json-object-missing-comma.json: New test. * json/invalid-json-object-with-trailing-comma.json: New test. * json/invalid-jsondump-diag-not-an-object.json: New test. * json/invalid-jsondump-kind-not-a-string.json: New test. * json/invalid-jsondump-not-an-array.json: New test. * json/json.exp: New test. * json/signal-1.c.json: New test. * lib/json-dg.exp: New test. * lib/json.exp: New test. 
Signed-off-by: David Malcolm
---
 gcc/json/Make-lang.in                         | 131 ++++
 gcc/json/config-lang.in                       |  34 +
 gcc/json/json-frontend.cc                     | 176 +++++
 gcc/json/json-replay.cc                       | 614 ++++++++++++++++++
 gcc/json/json-replay.h                        |  26 +
 gcc/json/lang-specs.h                         |  26 +
 gcc/json/lang.opt                             |  31 +
 .../invalid-json-array-missing-comma.json     |   6 +
 ...nvalid-json-array-with-trailing-comma.json |   6 +
 .../json/invalid-json-bad-token.json          |   6 +
 .../invalid-json-object-missing-comma.json    |   7 +
 ...valid-json-object-with-trailing-comma.json |   6 +
 .../invalid-jsondump-diag-not-an-object.json  |   6 +
 .../invalid-jsondump-kind-not-a-string.json   |  20 +
 .../json/invalid-jsondump-not-an-array.json   |   6 +
 gcc/testsuite/json/json.exp                   |  50 ++
 gcc/testsuite/json/signal-1.c.json            | 131 ++++
 gcc/testsuite/lib/json-dg.exp                 | 233 +++++++
 gcc/testsuite/lib/json.exp                    |  36 +
 19 files changed, 1551 insertions(+)
 create mode 100644 gcc/json/Make-lang.in
 create mode 100644 gcc/json/config-lang.in
 create mode 100644 gcc/json/json-frontend.cc
 create mode 100644 gcc/json/json-replay.cc
 create mode 100644 gcc/json/json-replay.h
 create mode 100644 gcc/json/lang-specs.h
 create mode 100644 gcc/json/lang.opt
 create mode 100644 gcc/testsuite/json/invalid-json-array-missing-comma.json
 create mode 100644 gcc/testsuite/json/invalid-json-array-with-trailing-comma.json
 create mode 100644 gcc/testsuite/json/invalid-json-bad-token.json
 create mode 100644 gcc/testsuite/json/invalid-json-object-missing-comma.json
 create mode 100644 gcc/testsuite/json/invalid-json-object-with-trailing-comma.json
 create mode 100644 gcc/testsuite/json/invalid-jsondump-diag-not-an-object.json
 create mode 100644 gcc/testsuite/json/invalid-jsondump-kind-not-a-string.json
 create mode 100644 gcc/testsuite/json/invalid-jsondump-not-an-array.json
 create mode 100644 gcc/testsuite/json/json.exp
 create mode 100644 gcc/testsuite/json/signal-1.c.json
 create mode 100644 gcc/testsuite/lib/json-dg.exp
 create mode 100644 gcc/testsuite/lib/json.exp

diff --git a/gcc/json/Make-lang.in
b/gcc/json/Make-lang.in new file mode 100644 index 00000000000..a62d028f41a --- /dev/null +++ b/gcc/json/Make-lang.in @@ -0,0 +1,131 @@ +# Make-lang.in -- Top level -*- makefile -*- fragment for gcc JSON "frontend". + +# Copyright (C) 2022 Free Software Foundation, Inc. + +# This file is part of GCC. + +# GCC is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. + +# GCC is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# . + +# This file provides the language dependent support in the main Makefile. + +# The name for selecting json in LANGUAGES. +json: json-replay$(exeext) + +.PHONY: json + +JSON_OBJS = json/json-frontend.o json/json-replay.o \ + attribs.o deferred-locations.o json-reader.o +json_OBJS = $(JSON_OBJS) + +json-replay$(exeext): $(JSON_OBJS) $(BACKEND) $(LIBDEPS) + @$(call LINK_PROGRESS,$(INDEX.json),start) + +$(LLINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \ + $(JSON_OBJS) $(BACKEND) $(LIBS) $(BACKENDLIBS) + @$(call LINK_PROGRESS,$(INDEX.json),end) + +# Build hooks. + +json.all.cross: +json.start.encap: +json.rest.encap: + +json.info: +json.man: + +lang_checks += check-json + +# No json-specific selftests +selftest-json: + +# Install hooks. 
+ +json.install-common: installdirs + -rm -f $(DESTDIR)$(bindir)/$(GCCJSON_INSTALL_NAME)$(exeext) + $(INSTALL_PROGRAM) gccjson$(exeext) $(DESTDIR)$(bindir)/$(GCCJSON_INSTALL_NAME)$(exeext) + -if test -f json-replay$(exeext); then \ + if test -f gccjson-cross$(exeext); then \ + :; \ + else \ + rm -f $(DESTDIR)$(bindir)/$(GCCJSON_TARGET_INSTALL_NAME)$(exeext); \ + ( cd $(DESTDIR)$(bindir) && \ + $(LN) $(GCCJSON_INSTALL_NAME)$(exeext) $(GCCJSON_TARGET_INSTALL_NAME)$(exeext) ); \ + fi; \ + fi + +json.install-plugin: + +json.install-info: $(DESTDIR)$(infodir)/gccjson.info + +json.install-pdf: doc/gccjson.pdf + @$(NORMAL_INSTALL) + test -z "$(pdfdir)" || $(mkinstalldirs) "$(DESTDIR)$(pdfdir)/gcc" + @for p in doc/gccjson.pdf; do \ + if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ + f=$(pdf__strip_dir) \ + echo " $(INSTALL_DATA) '$$d$$p' '$(DESTDIR)$(pdfdir)/gcc/$$f'"; \ + $(INSTALL_DATA) "$$d$$p" "$(DESTDIR)$(pdfdir)/gcc/$$f"; \ + done + +json.install-html: $(build_htmldir)/json + @$(NORMAL_INSTALL) + test -z "$(htmldir)" || $(mkinstalldirs) "$(DESTDIR)$(htmldir)" + @for p in $(build_htmldir)/json; do \ + if test -f "$$p" || test -d "$$p"; then d=""; else d="$(srcdir)/"; fi; \ + f=$(html__strip_dir) \ + if test -d "$$d$$p"; then \ + echo " $(mkinstalldirs) '$(DESTDIR)$(htmldir)/$$f'"; \ + $(mkinstalldirs) "$(DESTDIR)$(htmldir)/$$f" || exit 1; \ + echo " $(INSTALL_DATA) '$$d$$p'/* '$(DESTDIR)$(htmldir)/$$f'"; \ + $(INSTALL_DATA) "$$d$$p"/* "$(DESTDIR)$(htmldir)/$$f"; \ + else \ + echo " $(INSTALL_DATA) '$$d$$p' '$(DESTDIR)$(htmldir)/$$f'"; \ + $(INSTALL_DATA) "$$d$$p" "$(DESTDIR)$(htmldir)/$$f"; \ + fi; \ + done + +json.install-man: $(DESTDIR)$(man1dir)/$(GCCJSON_INSTALL_NAME)$(man1ext) + + +json.uninstall: + rm -rf $(DESTDIR)$(bindir)/$(GCCJSON_INSTALL_NAME)$(exeext) + rm -rf $(DESTDIR)$(man1dir)/$(GCCJSON_INSTALL_NAME)$(man1ext) + rm -rf $(DESTDIR)$(bindir)/$(GCCJSON_TARGET_INSTALL_NAME)$(exeext) + rm -rf $(DESTDIR)$(infodir)/gccjson.info* + +# Clean hooks. 
+ +json.mostlyclean: + -rm -f json/*$(objext) + -rm -f json/*$(coverageexts) + -rm -f gccjson$(exeext) gccjson-cross$(exeext) json-replay$(exeext) +json.clean: +json.distclean: +json.maintainer-clean: + -rm -f $(docobjdir)/gccjson.1 + +# Stage hooks. + +json.stage1: stage1-start + -mv json/*$(objext) stage1/json +json.stage2: stage2-start + -mv json/*$(objext) stage2/json +json.stage3: stage3-start + -mv json/*$(objext) stage3/json +json.stage4: stage4-start + -mv json/*$(objext) stage4/json +json.stageprofile: stageprofile-start + -mv json/*$(objext) stageprofile/json +json.stagefeedback: stagefeedback-start + -mv json/*$(objext) stagefeedback/json diff --git a/gcc/json/config-lang.in b/gcc/json/config-lang.in new file mode 100644 index 00000000000..c1b3593570c --- /dev/null +++ b/gcc/json/config-lang.in @@ -0,0 +1,34 @@ +# config-lang.in -- Top level configure fragment for gcc JSON "frontend". + +# Copyright (C) 2022 Free Software Foundation, Inc. + +# This file is part of GCC. + +# GCC is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. + +# GCC is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# . + +# Configure looks for the existence of this file to auto-config each language. +# We define several parameters used by configure: +# +# language - name of language as it would appear in $(LANGUAGES) +# compilers - value to add to $(COMPILERS) + +language="json" + +compilers="json-replay\$(exeext)" + +gtfiles="\$(srcdir)/json/json-frontend.cc" + +# Build by default. 
+build_by_default="yes" diff --git a/gcc/json/json-frontend.cc b/gcc/json/json-frontend.cc new file mode 100644 index 00000000000..edf72282d96 --- /dev/null +++ b/gcc/json/json-frontend.cc @@ -0,0 +1,176 @@ +/* The dummy "frontend" for re-emitting diagnostics saved in JSON form. + Copyright (C) 2022 Free Software Foundation, Inc. + Contributed by David Malcolm . + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "tree.h" +#include "debug.h" +#include "langhooks.h" +#include "langhooks-def.h" +#include "json/json-replay.h" +#include "opts.h" +#include "diagnostic.h" + +/* Placeholder implementation; needed by a frontend. */ + +tree +convert (tree, tree) +{ + gcc_unreachable (); + return NULL_TREE; +} + +/* Language-dependent contents of a type. */ + +struct GTY(()) lang_type +{ + char dummy; +}; + +/* Language-dependent contents of a decl. */ + +struct GTY((variable_size)) lang_decl +{ + char dummy; +}; + +/* Language-dependent contents of an identifier. This must include a + tree_identifier. */ + +struct GTY(()) lang_identifier +{ + struct tree_identifier common; +}; + +/* The resulting tree type. */ + +union GTY((desc ("TREE_CODE (&%h.generic) == IDENTIFIER_NODE"), + chain_next ("CODE_CONTAINS_STRUCT (TREE_CODE (&%h.generic), TS_COMMON) ? 
((union lang_tree_node *) TREE_CHAIN (&%h.generic)) : NULL"))) +lang_tree_node +{ + union tree_node GTY((tag ("0"), + desc ("tree_node_structure (&%h)"))) generic; + struct lang_identifier GTY((tag ("1"))) identifier; +}; + +/* We don't use language_function. */ + +struct GTY(()) language_function +{ + int dummy; +}; + +/* Language hooks. */ + +static bool +json_langhook_init (void) +{ + replay_json (main_input_filename); + return false; +} + +static unsigned int +json_langhook_option_lang_mask (void) +{ + return CL_JSON; +} + +static bool +json_langhook_handle_option (size_t scode, + const char *arg, + HOST_WIDE_INT value, + int kind, + location_t loc, + const struct cl_option_handlers *handlers) +{ + bool result = true; + + switch (scode) + { + default: + if (cl_options[scode].flags & json_langhook_option_lang_mask ()) + break; + result = false; + } + + JSON_handle_option_auto (&global_options, &global_options_set, + scode, arg, value, + json_langhook_option_lang_mask (), kind, + loc, handlers, global_dc); + + return result; +} + +static tree +json_langhook_type_for_mode (machine_mode, int) +{ + gcc_unreachable (); + return NULL_TREE; +} + +static bool +json_langhook_global_bindings_p (void) +{ + return true; +} + +static tree +json_langhook_pushdecl (tree decl ATTRIBUTE_UNUSED) +{ + gcc_unreachable (); +} + +static tree +json_langhook_getdecls (void) +{ + return NULL; +} +#undef LANG_HOOKS_NAME +#define LANG_HOOKS_NAME "json" + +#undef LANG_HOOKS_INIT +#define LANG_HOOKS_INIT json_langhook_init + +#undef LANG_HOOKS_OPTION_LANG_MASK +#define LANG_HOOKS_OPTION_LANG_MASK json_langhook_option_lang_mask + +#undef LANG_HOOKS_HANDLE_OPTION +#define LANG_HOOKS_HANDLE_OPTION json_langhook_handle_option + +#undef LANG_HOOKS_TYPE_FOR_MODE +#define LANG_HOOKS_TYPE_FOR_MODE json_langhook_type_for_mode + +#undef LANG_HOOKS_GLOBAL_BINDINGS_P +#define LANG_HOOKS_GLOBAL_BINDINGS_P json_langhook_global_bindings_p + +#undef LANG_HOOKS_PUSHDECL +#define LANG_HOOKS_PUSHDECL 
json_langhook_pushdecl + +#undef LANG_HOOKS_GETDECLS +#define LANG_HOOKS_GETDECLS json_langhook_getdecls + +#undef LANG_HOOKS_DEEP_UNSHARING +#define LANG_HOOKS_DEEP_UNSHARING true + +struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER; + +#include "gt-json-json-frontend.h" +#include "gtype-json.h" diff --git a/gcc/json/json-replay.cc b/gcc/json/json-replay.cc new file mode 100644 index 00000000000..d24d2b63e15 --- /dev/null +++ b/gcc/json/json-replay.cc @@ -0,0 +1,614 @@ +/* Re-emitting diagnostics saved in JSON form. + Copyright (C) 2022 Free Software Foundation, Inc. + Contributed by David Malcolm . + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "tree.h" +#include "diagnostic.h" +#include "diagnostic-metadata.h" +#include "diagnostic-path.h" +#include "line-map.h" +#include "stringpool.h" +#include "gcc-rich-location.h" +#include "json-parsing.h" +#include "json-reader.h" +#include "deferred-locations.h" +#include "diagnostic-client-data-hooks.h" +#include "json/json-replay.h" + +class json_replayer; + +/* Concrete subclass of diagnostic_client_data_hooks, for + use when replaying a json file. + + Takes ownership of the json_replayer and toplevel json::value objects. 
*/ + +class json_replayer_diagnostic_client_data_hooks + : public diagnostic_client_data_hooks +{ + public: + json_replayer_diagnostic_client_data_hooks (json_replayer *replayer) + : m_replayer (replayer), + m_toplevel_jv (NULL) + {} + + ~json_replayer_diagnostic_client_data_hooks (); + + const client_version_info *get_any_version_info () const final override + { + /* The JSON dump doesn't contain any version info. */ + return NULL; + } + const logical_location *get_current_logical_location () const final override + { + return NULL; // TODO + } + const char * + maybe_get_sarif_source_language (const char *) const final override + { + return NULL; + } + + void stash (json::value *toplevel_jv) + { + m_toplevel_jv = toplevel_jv; + } + + json_replayer *m_replayer; + json::value *m_toplevel_jv; +}; + +/* A bundle of state for replaying a GCC diagnostic json file. */ + +class json_replayer : public json_reader +{ +public: + json_replayer (const char *filename) : json_reader (filename) {} + ~json_replayer (); + + void emit_json_as_diagnostics (json::value *jv); + + /* Get the value of property ATTR_NAME within OBJ, + exiting with an error if it is not present. */ + json::value * + get_required_attr (json::object *obj, const char *attr_name) + { + json::value *attr_val = obj->get (attr_name); + if (!attr_val) + fatal_error (obj, + "expected a %qs within object", attr_name); + return attr_val; + } + + /* Get the value of optional property ATTR_NAME within OBJ, + exiting with an error if it is not a string. 
*/ + const char * + get_optional_string_attr (json::object *obj, const char *attr_name) + { + json::value *attr_val = obj->get (attr_name); + if (!attr_val) + return NULL; + json::string *attr_str = dyn_cast <json::string *> (attr_val); + if (!attr_str) + fatal_error (attr_val, + "expected the value of %qs to be a string", attr_name); + return attr_str->get_string (); + } + + /* Get the value of property ATTR_NAME within OBJ, + exiting with an error if it is not present or is not a string. */ + const char * + get_required_string_attr (json::object *obj, const char *attr_name) + { + json::value *attr_val = get_required_attr (obj, attr_name); + json::string *attr_str = dyn_cast <json::string *> (attr_val); + if (!attr_str) + fatal_error (attr_val, + "expected the value of %qs to be a string", attr_name); + return attr_str->get_string (); + } + + /* Get the value of optional property ATTR_NAME within OBJ, + exiting with an error if it is not an object. */ + json::object * + get_optional_object_attr (json::object *obj, const char *attr_name) + { + json::value *attr_val = obj->get (attr_name); + if (!attr_val) + return NULL; + json::object *attr_obj = dyn_cast <json::object *> (attr_val); + if (!attr_obj) + fatal_error (attr_val, + "expected the value of %qs to be an object", attr_name); + return attr_obj; + } + + /* Get the value of property ATTR_NAME within OBJ, + exiting with an error if it is not present or is not an object. */ + json::object * + get_required_object_attr (json::object *obj, const char *attr_name) + { + json::value *attr_val = get_required_attr (obj, attr_name); + json::object *attr_obj = dyn_cast <json::object *> (attr_val); + if (!attr_obj) + fatal_error (attr_val, + "expected the value of %qs to be an object", attr_name); + return attr_obj; + } + + /* Get the value of optional property ATTR_NAME within OBJ, + exiting with an error if it is not an array. 
*/ + json::array * + get_optional_array_attr (json::object *obj, const char *attr_name) + { + json::value *attr_val = obj->get (attr_name); + if (!attr_val) + return NULL; + json::array *attr_arr = dyn_cast <json::array *> (attr_val); + if (!attr_arr) + fatal_error (attr_val, + "expected the value of %qs to be an array", attr_name); + return attr_arr; + } + + /* Get the value of property ATTR_NAME within OBJ, + exiting with an error if it is not present or is not an integer. */ + long + get_required_long_attr (json::object *obj, const char *attr_name) + { + json::value *attr_val = get_required_attr (obj, attr_name); + json::integer_number *attr_int + = dyn_cast <json::integer_number *> (attr_val); + if (!attr_int) + fatal_error (attr_val, + "expected the value of %qs to be an integer number", + attr_name); + return attr_int->get (); + } + + /* If OBJ has optional property ATTR_NAME and it is an integer, + return true and write its value to *OUT. + If it is not present, return false. + If it is not an integer, exit with an error. 
*/ + bool + get_optional_long_attr (json::object *obj, const char *attr_name, long *out) + { + json::value *attr_val = obj->get (attr_name); + if (!attr_val) + return false; + json::integer_number *attr_int + = dyn_cast <json::integer_number *> (attr_val); + if (!attr_int) + fatal_error (attr_val, + "expected the value of %qs to be an integer number", + attr_name); + *out = attr_int->get (); + return true; + } + + json::object * + require_object (json::value *jv) + { + if (json::object *obj = dyn_cast <json::object *> (jv)) + return obj; + fatal_error (jv, "expected an object"); + return NULL; + } + + location_t + get_required_expanded_location_attr (json::object *obj, const char *attr_name, + deferred_locations *deferred_locs, + int pass); + + location_t + get_optional_expanded_location_attr (json::object *obj, const char *attr_name, + deferred_locations *deferred_locs, + int pass); +private: + void emit_diag_obj (json::object *diag_obj, + deferred_locations *deferred_locs, + int pass); + location_t + get_location_from_expanded_location_obj (json::object *loc_obj, + deferred_locations *deferred_locs, + int pass); + + diagnostic_t get_diagnostic_kind (json::object *diag_obj); + + hash_map<json::object *, location_t *> m_map_loc_obj_to_loc_t; +}; + +/* class json_replayer_diagnostic_client_data_hooks + : public diagnostic_client_data_hooks. */ + +json_replayer_diagnostic_client_data_hooks:: +~json_replayer_diagnostic_client_data_hooks () +{ + delete m_replayer; + delete m_toplevel_jv; +} + +/* class json_replayer : public json_reader. */ + +json_replayer::~json_replayer () +{ + for (auto iter : m_map_loc_obj_to_loc_t) + delete iter.second; +} + +/* In pass 0, defer the creation of a location_t for OBJ (originally created + by json_from_expanded_location). + In pass 1, look up the location_t that was created. 
*/ + +location_t +json_replayer:: +get_location_from_expanded_location_obj (json::object *loc_obj, + deferred_locations *deferred_locs, + int pass) +{ + expanded_location exp_loc = {NULL, 0, 0, NULL, false}; + exp_loc.file = get_optional_string_attr (loc_obj, "file"); + long line; + if (get_optional_long_attr (loc_obj, "line", &line)) + exp_loc.line = line; + long column; + if (get_optional_long_attr (loc_obj, "column", &column)) + exp_loc.column = column; + if (pass) + return **m_map_loc_obj_to_loc_t.get (loc_obj); + else + { + location_t *loc = new location_t; + deferred_locs->add_location (exp_loc, loc); + m_map_loc_obj_to_loc_t.put (loc_obj, loc); + return UNKNOWN_LOCATION; + } +} + +/* Get the value of property ATTR_NAME within OBJ, + exiting with an error if it is not present or not a location. + In pass 0, defer the creation of a location_t for the value (originally + created by json_from_expanded_location). + In pass 1, look up the location_t that was created. */ + +location_t +json_replayer:: +get_required_expanded_location_attr (json::object *obj, const char *attr_name, + deferred_locations *deferred_locs, + int pass) +{ + json::object *loc_obj = get_required_object_attr (obj, attr_name); + return get_location_from_expanded_location_obj (loc_obj, deferred_locs, pass); +} + +/* As get_required_expanded_location_attr, but return UNKNOWN_LOCATION + if ATTR_NAME is not present within OBJ. */ + +location_t +json_replayer:: +get_optional_expanded_location_attr (json::object *obj, const char *attr_name, + deferred_locations *deferred_locs, + int pass) +{ + json::object *loc_obj = get_optional_object_attr (obj, attr_name); + if (!loc_obj) + return UNKNOWN_LOCATION; + return get_location_from_expanded_location_obj (loc_obj, deferred_locs, pass); +} + +/* Get the diagnostic_t kind for DIAG_OBJ. 
*/ + +diagnostic_t +json_replayer::get_diagnostic_kind (json::object *diag_obj) +{ + const char *kind_str = get_required_string_attr (diag_obj, "kind"); + + static const char *const diagnostic_kind_text[] = { +#define DEFINE_DIAGNOSTIC_KIND(K, T, C) (T), +#include "diagnostic.def" +#undef DEFINE_DIAGNOSTIC_KIND + "must-not-happen" + }; + for (unsigned i = 0; i < ARRAY_SIZE (diagnostic_kind_text); i++) + { + /* Compare, without the trailing ": ". */ + const char *kind_text = diagnostic_kind_text[i]; + size_t len = strlen (kind_text); + if (len <= 2) + continue; + gcc_assert (kind_text[len - 2] == ':'); + gcc_assert (kind_text[len - 1] == ' '); + if (0 == strncmp (kind_text, kind_str, len - 2) + && kind_str[len - 2] == '\0') + return static_cast<diagnostic_t> (i); + } + fatal_error (diag_obj, + "unrecognized value for %qs: %qs", "kind", kind_str); +} + +static hash_map<tree, tree> map_id_to_fndecl; + +/* Create a placeholder void -> void function declaration for a function + named FUNC_STR. */ + +static tree +make_placeholder_fndecl (const char *func_str) +{ + tree id = get_identifier (func_str); + if (tree *slot = map_id_to_fndecl.get (id)) + return *slot; + tree fndecl = build_fn_decl (func_str, NULL_TREE/*fn_type*/); + map_id_to_fndecl.put (id, fndecl); + return fndecl; +} + +/* Custom subclass of rich_location, relating to replaying a specific + diagnostic serialized to JSON. 
*/ + +class json_rich_location : public rich_location +{ +public: + json_rich_location (json_replayer *replayer, + json::object *diag_obj, deferred_locations *deferred_locs, + int pass) + : rich_location (line_table, UNKNOWN_LOCATION) + { + json::value *locations_val + = replayer->get_required_attr (diag_obj, "locations"); + json::array *locations_arr = dyn_cast <json::array *> (locations_val); + if (!locations_arr) + replayer->fatal_error (locations_val, + "expected an array for the value of %qs", + "locations"); + m_ranges.truncate (0); + for (auto loc_iter : *locations_arr) + { + json::object *loc_obj = dyn_cast <json::object *> (loc_iter); + if (!loc_obj) + replayer->fatal_error (loc_iter, + "expected an object within the %qs array", + "locations"); + /* Compare with diagnostic-format-json.cc:json_from_location_range. */ + location_t caret_loc + = replayer->get_required_expanded_location_attr (loc_obj, "caret", + deferred_locs, pass); + location_t start_loc + = replayer->get_optional_expanded_location_attr (loc_obj, "start", + deferred_locs, pass); + location_t finish_loc + = replayer->get_optional_expanded_location_attr (loc_obj, "finish", + deferred_locs, pass); + if (start_loc == UNKNOWN_LOCATION) + start_loc = caret_loc; + if (finish_loc == UNKNOWN_LOCATION) + finish_loc = caret_loc; + const char *label_str + = replayer->get_optional_string_attr (loc_obj, "label"); + range_label *label + = label_str ? 
new text_range_label (label_str) : NULL; + location_t range_loc = make_location (caret_loc, start_loc, finish_loc); + add_range (range_loc, SHOW_RANGE_WITH_CARET, label); + } + + json::array *fixits_arr + = replayer->get_optional_array_attr (diag_obj, "fixits"); + if (fixits_arr) + for (auto fixit_iter : *fixits_arr) + { + json::object *fixit_obj = dyn_cast <json::object *> (fixit_iter); + if (!fixit_obj) + replayer->fatal_error (fixit_iter, + "expected an object within the %qs array", + "fixits"); + on_fixit_obj (replayer, fixit_obj, deferred_locs, pass); + } + + json::array *path_arr + = replayer->get_optional_array_attr (diag_obj, "path"); + if (path_arr) + { + pretty_printer pp; + simple_diagnostic_path *path = new simple_diagnostic_path (&pp); + set_path (path); + for (auto event_iter : *path_arr) + { + json::object *event_obj = dyn_cast <json::object *> (event_iter); + if (!event_obj) + replayer->fatal_error (event_iter, + "expected an object within the %qs array", + "path"); + location_t loc + = replayer->get_optional_expanded_location_attr (event_obj, + "location", + deferred_locs, + pass); + const char *desc_str + = replayer->get_required_string_attr (event_obj, "description"); + tree fndecl = NULL_TREE; + if (const char *func_str + = replayer->get_optional_string_attr (event_obj, "function")) + fndecl = make_placeholder_fndecl (func_str); + long depth = replayer->get_required_long_attr (event_obj, "depth"); + path->add_event (loc, fndecl, depth, "%s", desc_str); + } + } + } + +private: + /* Compare with diagnostic-format-json.cc:json_from_fixit_hint. 
*/ + void on_fixit_obj (json_replayer *replayer, + json::object *fixit_obj, + deferred_locations *deferred_locs, int pass) + { + location_t start_loc + = replayer->get_required_expanded_location_attr (fixit_obj, "start", + deferred_locs, pass); + location_t next_loc + = replayer->get_required_expanded_location_attr (fixit_obj, "next", + deferred_locs, pass); + const char *string + = replayer->get_required_string_attr (fixit_obj, "string"); + if (pass) + maybe_add_fixit (start_loc, next_loc, string); + } +}; + +class json_replayer_rule : public diagnostic_metadata::rule +{ +public: + json_replayer_rule (json_replayer *replayer, json::object *diag_obj) + : m_replayer (replayer), m_diag_obj (diag_obj) + {} + + char *make_description () const final override + { + if (const char *option + = m_replayer->get_optional_string_attr (m_diag_obj, "option")) + return xstrdup (option); + return NULL; + } + + char *make_url () const final override + { + if (const char *option_url + = m_replayer->get_optional_string_attr (m_diag_obj, "option_url")) + return xstrdup (option_url); + return NULL; + } + +private: + json_replayer *m_replayer; + json::object *m_diag_obj; +}; + + +/* Replay the diagnostic DIAG_OBJ to global_dc. + In pass 0, do everything except actually replay the diagnostic, + deferring the creation of location_t values. + In pass 1, actually emit the diagnostic, using real location_t values. + Exit with an error if DIAG_OBJ does not match the expected output format. 
*/ + +void +json_replayer::emit_diag_obj (json::object *diag_obj, + deferred_locations *deferred_locs, + int pass) +{ + auto_diagnostic_group d; + + diagnostic_t kind = get_diagnostic_kind (diag_obj); + const char *message_str = get_required_string_attr (diag_obj, "message"); + + json_rich_location rich_loc (this, diag_obj, deferred_locs, pass); + + diagnostic_metadata meta; + + if (json::object *metadata_obj = get_optional_object_attr (diag_obj, + "metadata")) + { + long cwe_long; + if (get_optional_long_attr (metadata_obj, "cwe", &cwe_long)) + meta.add_cwe (cwe_long); + } + + bool emitted = true; + if (pass == 1) + { + const int option = 0; + json_replayer_rule rule (this, diag_obj); + meta.add_rule (rule); + emitted = emit_diagnostic (kind, &rich_loc, &meta, option, + "%s", message_str); + } + + if (emitted) + if (json::value *children_val = diag_obj->get ("children")) + { + json::array *children_arr = dyn_cast <json::array *> (children_val); + if (!children_arr) + fatal_error (children_val, + "expected an array for the value of %qs", + "children"); + for (auto child_val : *children_arr) + { + /* We expect an array of objects. */ + json::object *child_obj = dyn_cast <json::object *> (child_val); + if (!child_obj) + fatal_error (child_val, + "expected an object within the %qs array", + "children"); + emit_diag_obj (child_obj, deferred_locs, pass); + } + } +} + +/* Replay the diagnostics in JV to global_dc. + Exit with an error if JV does not match the expected output format. */ + +void +json_replayer::emit_json_as_diagnostics (json::value *jv) +{ + /* We expect an array as the top-level value. */ + json::array *toplev_arr = dyn_cast <json::array *> (jv); + if (!toplev_arr) + fatal_error (jv, "expected an array as the top-level value"); + + deferred_locations deferred_locs; + + for (int pass = 0; pass < 2; pass++) + { + for (auto diag_val : *toplev_arr) + { + /* We expect an object. 
*/ + json::object *diag_obj = require_object (diag_val); + emit_diag_obj (diag_obj, &deferred_locs, pass); + } + if (pass == 0) + deferred_locs.generate_location_t_values (); + } +} + +/* Attempt to load a json file from FILENAME and replay it. + Exit on any errors. */ + +void +replay_json (const char *filename) +{ + json_replayer *p = new json_replayer (filename); + json_replayer_diagnostic_client_data_hooks *hooks + = new json_replayer_diagnostic_client_data_hooks (p); + global_dc->m_client_data_hooks = hooks; + + char *content = read_file (filename); + json::error *err = NULL; + json::value *jv = p->parse_utf8_string (content, flag_allow_comments, &err); + if (err) + { + p->fatal_error (err); + delete err; + } + free (content); + + if (jv) + { + hooks->stash (jv); + p->emit_json_as_diagnostics (jv); + } +} diff --git a/gcc/json/json-replay.h b/gcc/json/json-replay.h new file mode 100644 index 00000000000..f2776aa4171 --- /dev/null +++ b/gcc/json/json-replay.h @@ -0,0 +1,26 @@ +/* Re-emitting diagnostics saved in JSON form. + Copyright (C) 2022 David Malcolm . + Contributed by David Malcolm . + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it +under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. 
*/ + +#ifndef GCC_JSON_JSON_REPLAY_H +#define GCC_JSON_JSON_REPLAY_H + +extern void replay_json (const char *filename); + +#endif /* GCC_JSON_JSON_REPLAY_H */ diff --git a/gcc/json/lang-specs.h b/gcc/json/lang-specs.h new file mode 100644 index 00000000000..da9a4f25757 --- /dev/null +++ b/gcc/json/lang-specs.h @@ -0,0 +1,26 @@ +/* lang-specs.h -- gcc driver specs for the JSON "frontend". + Copyright (C) 2022 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. */ + +/* This is the contribution to the `default_compilers' array in gcc.cc + for the json "frontend". */ + +{".json", "@json", 0, 1, 0}, +/* FIXME: */ +{"@json", "json-replay %i %(cc1_options)", + 0, 1, 0}, diff --git a/gcc/json/lang.opt b/gcc/json/lang.opt new file mode 100644 index 00000000000..4c75f5d35dd --- /dev/null +++ b/gcc/json/lang.opt @@ -0,0 +1,31 @@ +; Options for the JSON front end. +; Copyright (C) 2022 Free Software Foundation, Inc. +; +; This file is part of GCC. +; +; GCC is free software; you can redistribute it and/or modify it under +; the terms of the GNU General Public License as published by the Free +; Software Foundation; either version 3, or (at your option) any later +; version. +; +; GCC is distributed in the hope that it will be useful, but WITHOUT ANY +; WARRANTY; without even the implied warranty of MERCHANTABILITY or +; FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +; for more details. 
+; +; You should have received a copy of the GNU General Public License +; along with GCC; see the file COPYING3. If not see +; . + +; See the GCC internals manual for a description of this file's format. + +; Please try to keep this file in ASCII collating order. + +Language +JSON + +fallow-comments +JSON Var(flag_allow_comments) +Extend JSON to support comments + +; This comment is to ensure we retain the blank line above. diff --git a/gcc/testsuite/json/invalid-json-array-missing-comma.json b/gcc/testsuite/json/invalid-json-array-missing-comma.json new file mode 100644 index 00000000000..0f32d38420e --- /dev/null +++ b/gcc/testsuite/json/invalid-json-array-missing-comma.json @@ -0,0 +1,6 @@ +[ "foo", "bar" "baz"] // { dg-error "expected ',' or '\]'; got string" } + +{ dg-begin-multiline-output "" } + 1 | [ "foo", "bar" "baz"] + | ^~~~~ +{ dg-end-multiline-output "" } diff --git a/gcc/testsuite/json/invalid-json-array-with-trailing-comma.json b/gcc/testsuite/json/invalid-json-array-with-trailing-comma.json new file mode 100644 index 00000000000..05b74a81efc --- /dev/null +++ b/gcc/testsuite/json/invalid-json-array-with-trailing-comma.json @@ -0,0 +1,6 @@ +[ 0, 1, 2, ] /* { dg-error "expected a JSON value but got '\\\]'" } */ + +{ dg-begin-multiline-output "" } + 1 | [ 0, 1, 2, ] + | ^ +{ dg-end-multiline-output "" } diff --git a/gcc/testsuite/json/invalid-json-bad-token.json b/gcc/testsuite/json/invalid-json-bad-token.json new file mode 100644 index 00000000000..7756eef1add --- /dev/null +++ b/gcc/testsuite/json/invalid-json-bad-token.json @@ -0,0 +1,6 @@ + not a valid JSON file // { dg-error "invalid JSON token: unexpected character: 'n'" } + +{ dg-begin-multiline-output "" } + 1 | not a valid JSON file + | ^ +{ dg-end-multiline-output "" } diff --git a/gcc/testsuite/json/invalid-json-object-missing-comma.json b/gcc/testsuite/json/invalid-json-object-missing-comma.json new file mode 100644 index 00000000000..9d2bf9476b1 --- /dev/null +++ 
b/gcc/testsuite/json/invalid-json-object-missing-comma.json @@ -0,0 +1,7 @@ +{ "foo": "bar" + "baz": 42 } // { dg-error "expected ',' or '\}'; got string" } + +{ dg-begin-multiline-output "" } + 2 | "baz": 42 } + | ^~~~~ +{ dg-end-multiline-output "" } diff --git a/gcc/testsuite/json/invalid-json-object-with-trailing-comma.json b/gcc/testsuite/json/invalid-json-object-with-trailing-comma.json new file mode 100644 index 00000000000..e1aae9b350c --- /dev/null +++ b/gcc/testsuite/json/invalid-json-object-with-trailing-comma.json @@ -0,0 +1,6 @@ +{ "foo": "bar", } /* { dg-error "expected string for object key after ','; got '\\\}'" } */ + +{ dg-begin-multiline-output "" } + 1 | { "foo": "bar", } + | ^ +{ dg-end-multiline-output "" } diff --git a/gcc/testsuite/json/invalid-jsondump-diag-not-an-object.json b/gcc/testsuite/json/invalid-jsondump-diag-not-an-object.json new file mode 100644 index 00000000000..6f5d37f08e1 --- /dev/null +++ b/gcc/testsuite/json/invalid-jsondump-diag-not-an-object.json @@ -0,0 +1,6 @@ +[ 42 ] /* { dg-error "expected an object" } */ + +/* { dg-begin-multiline-output "" } + 1 | [ 42 ] + | ^~ + { dg-end-multiline-output "" } */ diff --git a/gcc/testsuite/json/invalid-jsondump-kind-not-a-string.json b/gcc/testsuite/json/invalid-jsondump-kind-not-a-string.json new file mode 100644 index 00000000000..ab55f1e5573 --- /dev/null +++ b/gcc/testsuite/json/invalid-jsondump-kind-not-a-string.json @@ -0,0 +1,20 @@ +[ + { + "kind": 42, /* { dg-error "expected the value of 'kind' to be a string" } */ + "locations": [], + "column-origin": 1, + "option": "-Wanalyzer-unsafe-call-within-signal-handler", + "escape-source": false, + "children": [], + "option_url": "https://gcc.gnu.org/onlinedocs/gcc/Static-Analyzer-Options.html#index-Wanalyzer-unsafe-call-within-signal-handler", + "message": "call to \u2018fprintf\u2019 from within signal handler", + "metadata": { + "cwe": 479 + } + } +] + +/* { dg-begin-multiline-output "" } + 3 | "kind": 42, + | ^~ + { 
dg-end-multiline-output "" } */ diff --git a/gcc/testsuite/json/invalid-jsondump-not-an-array.json b/gcc/testsuite/json/invalid-jsondump-not-an-array.json new file mode 100644 index 00000000000..9b14ea35565 --- /dev/null +++ b/gcc/testsuite/json/invalid-jsondump-not-an-array.json @@ -0,0 +1,6 @@ +{ "foo": "bar" } /* { dg-error "expected an array as the top-level value" } */ + +/* { dg-begin-multiline-output "" } + 1 | { "foo": "bar" } + | ^~~~~~~~~~~~~~~~ + { dg-end-multiline-output "" } */ diff --git a/gcc/testsuite/json/json.exp b/gcc/testsuite/json/json.exp new file mode 100644 index 00000000000..0a33ba53fb6 --- /dev/null +++ b/gcc/testsuite/json/json.exp @@ -0,0 +1,50 @@ +# Copyright (C) 2004-2022 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# . + +# GCC testsuite that uses the `dg.exp' driver. + +# Load support procs. +load_lib json-dg.exp + +#load_lib dg.exp +#load_lib prune.exp +#load_lib target-supports.exp +#load_lib gcc-defs.exp +#load_lib timeout.exp +#load_lib target-libpath.exp +#load_lib gcc.exp +#load_lib g++.exp +#load_lib dejagnu.exp +#load_lib prune.exp +#load_lib gcc-defs.exp +#load_lib timeout.exp +#load_lib target-libpath.exp +#load_lib target-supports.exp +#load_lib gcc-dg.exp + +# If a testcase doesn't have special options, use these. 
+global DEFAULT_JSON_FLAGS +if ![info exists DEFAULT_JSON_FLAGS] then { + set DEFAULT_JSON_FLAGS "-fallow-comments" +} +# Initialize `dg'. +dg-init + +dg-runtest [lsort \ + [glob -nocomplain $srcdir/$subdir/*.json ] ] "" $DEFAULT_JSON_FLAGS + +# All done. +dg-finish diff --git a/gcc/testsuite/json/signal-1.c.json b/gcc/testsuite/json/signal-1.c.json new file mode 100644 index 00000000000..5f4962209a9 --- /dev/null +++ b/gcc/testsuite/json/signal-1.c.json @@ -0,0 +1,131 @@ +[ + { + "kind": "warning", + "locations": [ + { + "finish": { + "byte-column": 33, + "display-column": 33, + "line": 13, + "file": "../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "column": 33 + }, + "caret": { + "byte-column": 3, + "display-column": 3, + "line": 13, + "file": "../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "column": 3 + } + } + ], + "path": [ + { + "location": { + "byte-column": 5, + "display-column": 5, + "line": 21, + "file": "../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "column": 5 + }, + "description": "entry to \u2018main\u2019", + "depth": 1, + "function": "main" + }, + { + "location": { + "byte-column": 3, + "display-column": 3, + "line": 25, + "file": "../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "column": 3 + }, + "description": "registering \u2018handler\u2019 as signal handler", + "depth": 1, + "function": "main" + }, + { + "description": "later on, when the signal is delivered to the process", + "depth": 0 + }, + { + "location": { + "byte-column": 13, + "display-column": 13, + "line": 16, + "file": "../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "column": 13 + }, + "description": "entry to \u2018handler\u2019", + "depth": 1, + "function": "handler" + }, + { + "location": { + "byte-column": 3, + "display-column": 3, + "line": 18, + "file": "../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "column": 3 + }, + "description": "calling \u2018custom_logger\u2019 from \u2018handler\u2019", + "depth": 1, + "function": "handler" 
+ }, + { + "location": { + "byte-column": 6, + "display-column": 6, + "line": 11, + "file": "../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "column": 6 + }, + "description": "entry to \u2018custom_logger\u2019", + "depth": 2, + "function": "custom_logger" + }, + { + "location": { + "byte-column": 3, + "display-column": 3, + "line": 13, + "file": "../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "column": 3 + }, + "description": "call to \u2018fprintf\u2019 from within signal handler", + "depth": 2, + "function": "custom_logger" + } + ], + "column-origin": 1, + "option": "-Wanalyzer-unsafe-call-within-signal-handler", + "escape-source": false, + "children": [], + "option_url": "https://gcc.gnu.org/onlinedocs/gcc/Static-Analyzer-Options.html#index-Wanalyzer-unsafe-call-within-signal-handler", + "message": "call to \u2018fprintf\u2019 from within signal handler", + "metadata": { + "cwe": 479 + } + } +] + +/* { dg-begin-multiline-output "" } +../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c:13:3: warning: call to ‘fprintf’ from within signal handler [CWE-479] [-Wanalyzer-unsafe-call-within-signal-handler] + 'main': events 1-2 + | + |...... + | + event 3 + | + |json-replay: + | (3): later on, when the signal is delivered to the process + | + +--> 'handler': events 4-5 + | + | + +--> 'custom_logger': events 6-7 + | + | + { dg-end-multiline-output "" } */ + +// TODO: various things wrong here diff --git a/gcc/testsuite/lib/json-dg.exp b/gcc/testsuite/lib/json-dg.exp new file mode 100644 index 00000000000..18701f84cac --- /dev/null +++ b/gcc/testsuite/lib/json-dg.exp @@ -0,0 +1,233 @@ +# Copyright (C) 2004-2022 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. 
+# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# . + +load_lib gcc-dg.exp +load_lib torture-options.exp + +#FIXME: copied from gfortran-dg.exp + +# Define json callbacks for dg.exp. + +proc json-dg-test { prog do_what extra_tool_flags } { + set result \ + [gcc-dg-test-1 json_target_compile $prog $do_what $extra_tool_flags] + + set comp_output [lindex $result 0] + set output_file [lindex $result 1] + + # gcc's default is to print the caret and source code, but + # most test cases implicitly use the flag -fno-diagnostics-show-caret + # to disable caret (and source code) printing. + # + # However, a few test cases override this back to the default by + # explicitly supplying "-fdiagnostics-show-caret", so that we can have + # test coverage for caret/source code printing. + # + # json error messages with caret-printing look like this: + # [name]:[locus]: + # + # some code + # 1 + # Error: Some error at (1) + # or + # [name]:[locus]: + # + # some code + # 1 + # [name]:[locus2]: + # + # some other code + # 2 + # Error: Some error at (1) and (2) + # or + # [name]:[locus]: + # + # some code and some more code + # 1 2 + # Error: Some error at (1) and (2) + # + # If this is such a test case, skip the rest of this function, so + # that the test case can explicitly verify the output that it expects. + if {[string first "-fdiagnostics-show-caret" $extra_tool_flags] >= 0} { + return [list $comp_output $output_file] + } + + # Otherwise, caret-printing is disabled. 
+ # json errors with caret-printing disabled look like this: + # [name]:[locus]: Error: Some error + # or + # [name]:[locus]: Error: (1) + # [name]:[locus2]: Error: Some error at (1) and (2) + # + # Where [locus] is either [line] or [line].[column] or + # [line].[column]-[column] . + # + # We collapse these to look like: + # [name]:[line]:[column]: Error: Some error at (1) and (2) + # or + # [name]:[line]:[column]: Error: Some error at (1) and (2) + # [name]:[line2]:[column]: Error: Some error at (1) and (2) + # + # Note that these regexps only make sense in the combinations used below. + # Note also that it is imperative that we first deal with the form with + # two loci. + set locus_regexp "(\[^\n\]+:\[0-9\]+)\[\.:\](\[0-9\]+)(-\[0-9\]+)?:\n\n\[^\n\]+\n\[^\n\]+\n" + set diag_regexp "(\[^\n\]+)\n" + + # We proceed in steps: + + # 1. We add first a column number if none exists. + # (Some Fortran diagnostics have the locus after Warning|Error) + set colnum_regexp "(^|\n)(Warning: |Error: )?(\[^:\n\]+:\[0-9\]+):(\[ \n\])" + regsub -all $colnum_regexp $comp_output "\\1\\3:0:\\4\\2" comp_output + verbose "comput_output0:\n$comp_output" + + # 2. We deal with the form with two different locus lines, + set two_loci "(^|\n)$locus_regexp$locus_regexp$diag_regexp" + regsub -all $two_loci $comp_output "\\1\\2:\\3: \\8\n\\5\:\\6: \\8\n" comp_output + verbose "comput_output1:\n$comp_output" + + set locus_prefix "(\[^:\n\]+:\[0-9\]+:\[0-9\]+: )(Warning: |Error: )" + set two_loci2 "(^|\n)$locus_prefix\\(1\\)\n$locus_prefix$diag_regexp" + regsub -all $two_loci2 $comp_output "\\1\\2\\3\\6\n\\4\\5\\6\n" comp_output + verbose "comput_output2:\n$comp_output" + + # 3. then with the form with only one locus line. + set single_locus "(^|\n)$locus_regexp$diag_regexp" + regsub -all $single_locus $comp_output "\\1\\2:\\3: \\5\n" comp_output + verbose "comput_output3:\n$comp_output" + + # 4. 
Add a line number if none exists + regsub -all "(^|\n)(Warning: |Error: )" $comp_output "\\1:0:0: \\2" comp_output + verbose "comput_output4:\n$comp_output" + return [list $comp_output $output_file] +} + +proc json-dg-prune { system text } { + return [gcc-dg-prune $system $text] +} + +# Utility routines. + +# Modified dg-runtest that can cycle through a list of optimization options +# as c-torture does. +proc json-dg-runtest { testcases flags default-extra-flags } { + global runtests + global torture_with_loops + + # Some callers set torture options themselves; don't override those. + set existing_torture_options [torture-options-exist] + if { $existing_torture_options == 0 } { + global DG_TORTURE_OPTIONS + torture-init + set-torture-options $DG_TORTURE_OPTIONS + } + dump-torture-options + + foreach test $testcases { + # If we're only testing specific files and this isn't one of + # them, skip it. + if ![runtest_file_p $runtests $test] { + continue + } + + # look if this is dg-do-run test, in which case + # we cycle through the option list, otherwise we don't + if [expr [search_for $test "dg-do run"]] { + set option_list $torture_with_loops + } else { + set option_list [list { -O } ] + } + + set nshort [file tail [file dirname $test]]/[file tail $test] + list-module-names $test + + foreach flags_t $option_list { + verbose "Testing $nshort, $flags $flags_t" 1 + dg-test $test "$flags $flags_t" ${default-extra-flags} + cleanup-modules "" + } + } + + if { $existing_torture_options == 0 } { + torture-finish + } +} + +proc json-dg-debug-runtest { target_compile trivial opt_opts testcases } { + global srcdir subdir DEBUG_TORTURE_OPTIONS + + if ![info exists DEBUG_TORTURE_OPTIONS] { + set DEBUG_TORTURE_OPTIONS "" + set type_list [list "-gstabs" "-gstabs+" "-gxcoff" "-gxcoff+" "-gdwarf-2" ] + foreach type $type_list { + set comp_output [$target_compile \ + "$srcdir/$subdir/$trivial" "trivial.S" assembly \ + "additional_flags=$type"] + if { [string match "exit status *" 
$comp_output] } { + continue + } + if { [string match \ + "* target system does not support the * debug format*" \ + $comp_output] + } { + continue + } + remove-build-file "trivial.S" + foreach level {1 "" 3} { + if { ($type == "-gdwarf-2") && ($level != "") } { + lappend DEBUG_TORTURE_OPTIONS [list "${type}" "-g${level}"] + foreach opt $opt_opts { + lappend DEBUG_TORTURE_OPTIONS \ + [list "${type}" "-g${level}" "$opt" ] + } + } else { + lappend DEBUG_TORTURE_OPTIONS [list "${type}${level}"] + foreach opt $opt_opts { + lappend DEBUG_TORTURE_OPTIONS \ + [list "${type}${level}" "$opt" ] + } + } + } + } + } + + verbose -log "Using options $DEBUG_TORTURE_OPTIONS" + + global runtests + + foreach test $testcases { + # If we're only testing specific files and this isn't one of + # them, skip it. + if ![runtest_file_p $runtests $test] { + continue + } + + set nshort [file tail [file dirname $test]]/[file tail $test] + list-module-names $test + + foreach flags $DEBUG_TORTURE_OPTIONS { + set doit 1 + # gcc-specific checking removed here + + if { $doit } { + verbose -log "Testing $nshort, $flags" 1 + dg-test $test $flags "" + cleanup-modules "" + } + } + } +} diff --git a/gcc/testsuite/lib/json.exp b/gcc/testsuite/lib/json.exp new file mode 100644 index 00000000000..52ba75a1a14 --- /dev/null +++ b/gcc/testsuite/lib/json.exp @@ -0,0 +1,36 @@ +# Copyright (C) 2003-2022 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. 
+# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# . + +# FIXME: copied from gfortran.exp + +# This file is just 'sed -e 's/77/fortran/g' \ +# -e 's/f2c/gfortran' g77.exp > gfortran.exp' +# +# with some minor modifications to make it work. + +# +# json support library routines +# +load_lib prune.exp +load_lib gcc-defs.exp +load_lib timeout.exp +load_lib target-libpath.exp +load_lib target-supports.exp + +proc json_target_compile { source dest type options } { + set return_val [target_compile $source $dest $type $options] + return $return_val; +} From patchwork Wed Jun 22 22:34:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: David Malcolm X-Patchwork-Id: 1646816 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=gscorV3S; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4LSz772vl6z9sGp for ; Thu, 23 Jun 2022 08:48:03 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 362843885A0B for ; Wed, 22 Jun 2022 22:48:01 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 362843885A0B DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1655938081; bh=l27fU00i+ru0to3JRiS3Y2T+ni9QhNJTFp5YZ09J1OY=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=gscorV3SOWB3PIOi8IIAxjzUure7XbZWiRo1XIfVIESOkW/H++pfLjYgCAj2ZsVeX S+1VW2LY4NpwRCBN37AS+FrW/7uIy3TZvrIWjLfhus1SwKfJk728v5hxKNfvpl6d8X +uKppMgsXFhPyEu/sC5b5QR14c1K1kBPZLgFDIMM= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 5803A383067E for ; Wed, 22 Jun 2022 22:34:52 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5803A383067E Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-417-3r4UIOlAMMGtTbfabJOdgQ-1; Wed, 22 Jun 2022 18:34:50 -0400 X-MC-Unique: 3r4UIOlAMMGtTbfabJOdgQ-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 8FCE81C06904 for ; Wed, 22 Jun 2022 22:34:50 +0000 (UTC) Received: from t14s.localdomain.com (unknown [10.2.17.61]) by smtp.corp.redhat.com (Postfix) with ESMTP id 61F131121314; Wed, 22 Jun 2022 22:34:50 +0000 (UTC) To: gcc-patches@gcc.gnu.org Subject: [PATCH 10/12] Add sarif frontend Date: Wed, 22 Jun 2022 18:34:45 -0400 Message-Id: <20220622223447.2462880-11-dmalcolm@redhat.com> In-Reply-To: <20220622223447.2462880-1-dmalcolm@redhat.com> References: <20220622223447.2462880-1-dmalcolm@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-12.7 required=5.0 
tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: David Malcolm via Gcc-patches From: David Malcolm Reply-To: David Malcolm Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch is a work-in-progress (lots of TODOs and FIXMEs) that adds a new SARIF frontend to gcc, sarif-replay, which is invoked when .sarif files are passed to the gcc driver program. It attempts to replay the .sarif file using the provided diagnostic formatting options. gcc/ChangeLog: * sarif/Make-lang.in: New file. * sarif/lang.opt: New file. * sarif/sarif-frontend.cc: New file. * sarif/sarif-replay.cc: New file. * sarif/sarif-replay.h: New file. gcc/testsuite/ChangeLog: * lib/sarif-dg.exp: New test. * lib/sarif.exp: New test. * sarif/bad-eval-with-code-flow.py: New test. * sarif/escaped-braces.sarif: New test. * sarif/invalid-json-array-missing-comma.sarif: New test. * sarif/invalid-json-array-with-trailing-comma.sarif: New test. * sarif/invalid-json-bad-token.sarif: New test. * sarif/invalid-json-object-missing-comma.sarif: New test. * sarif/invalid-json-object-with-trailing-comma.sarif: New test. * sarif/invalid-sarif-bad-runs.sarif: New test. * sarif/invalid-sarif-missing-arguments-for-placeholders.sarif: New test. * sarif/invalid-sarif-no-runs.sarif: New test. * sarif/invalid-sarif-no-version.sarif: New test. * sarif/invalid-sarif-non-object-in-runs.sarif: New test. * sarif/invalid-sarif-not-an-object.sarif: New test. 
* sarif/invalid-sarif-not-enough-arguments-for-placeholders.sarif: New test. * sarif/invalid-sarif-version-not-a-string.sarif: New test. * sarif/malformed-placeholder.sarif: New test. * sarif/null-runs.sarif: New test. * sarif/roundtrip-signal-1.c.sarif: New test. * sarif/sarif.exp: New test. * sarif/signal-1.c.sarif: New test. * sarif/spec-example-1.sarif: New test. * sarif/spec-example-2.sarif: New test. * sarif/spec-example-3.sarif: New test. * sarif/spec-example-4.sarif: New test. * sarif/tutorial-example-foo.sarif: New test. Signed-off-by: David Malcolm --- gcc/sarif/Make-lang.in | 132 ++ gcc/sarif/config-lang.in | 34 + gcc/sarif/lang-specs.h | 26 + gcc/sarif/lang.opt | 31 + gcc/sarif/sarif-frontend.cc | 191 +++ gcc/sarif/sarif-replay.cc | 1489 +++++++++++++++++ gcc/sarif/sarif-replay.h | 26 + gcc/testsuite/lib/sarif-dg.exp | 233 +++ gcc/testsuite/lib/sarif.exp | 36 + .../sarif/bad-eval-with-code-flow.py | 10 + gcc/testsuite/sarif/escaped-braces.sarif | 19 + .../invalid-json-array-missing-comma.sarif | 6 + ...valid-json-array-with-trailing-comma.sarif | 6 + .../sarif/invalid-json-bad-token.sarif | 6 + .../invalid-json-object-missing-comma.sarif | 7 + ...alid-json-object-with-trailing-comma.sarif | 6 + .../sarif/invalid-sarif-bad-runs.sarif | 7 + ...f-missing-arguments-for-placeholders.sarif | 14 + .../sarif/invalid-sarif-no-runs.sarif | 6 + .../sarif/invalid-sarif-no-version.sarif | 6 + .../invalid-sarif-non-object-in-runs.sarif | 7 + .../sarif/invalid-sarif-not-an-object.sarif | 6 + ...ot-enough-arguments-for-placeholders.sarif | 14 + .../invalid-sarif-version-not-a-string.sarif | 6 + .../sarif/malformed-placeholder.sarif | 15 + gcc/testsuite/sarif/null-runs.sarif | 2 + .../sarif/roundtrip-signal-1.c.sarif | 398 +++++ gcc/testsuite/sarif/sarif.exp | 50 + gcc/testsuite/sarif/signal-1.c.sarif | 362 ++++ gcc/testsuite/sarif/spec-example-1.sarif | 15 + gcc/testsuite/sarif/spec-example-2.sarif | 74 + gcc/testsuite/sarif/spec-example-3.sarif | 67 + 
gcc/testsuite/sarif/spec-example-4.sarif | 758 +++++++++ .../sarif/tutorial-example-foo.sarif | 117 ++ 34 files changed, 4182 insertions(+) create mode 100644 gcc/sarif/Make-lang.in create mode 100644 gcc/sarif/config-lang.in create mode 100644 gcc/sarif/lang-specs.h create mode 100644 gcc/sarif/lang.opt create mode 100644 gcc/sarif/sarif-frontend.cc create mode 100644 gcc/sarif/sarif-replay.cc create mode 100644 gcc/sarif/sarif-replay.h create mode 100644 gcc/testsuite/lib/sarif-dg.exp create mode 100644 gcc/testsuite/lib/sarif.exp create mode 100644 gcc/testsuite/sarif/bad-eval-with-code-flow.py create mode 100644 gcc/testsuite/sarif/escaped-braces.sarif create mode 100644 gcc/testsuite/sarif/invalid-json-array-missing-comma.sarif create mode 100644 gcc/testsuite/sarif/invalid-json-array-with-trailing-comma.sarif create mode 100644 gcc/testsuite/sarif/invalid-json-bad-token.sarif create mode 100644 gcc/testsuite/sarif/invalid-json-object-missing-comma.sarif create mode 100644 gcc/testsuite/sarif/invalid-json-object-with-trailing-comma.sarif create mode 100644 gcc/testsuite/sarif/invalid-sarif-bad-runs.sarif create mode 100644 gcc/testsuite/sarif/invalid-sarif-missing-arguments-for-placeholders.sarif create mode 100644 gcc/testsuite/sarif/invalid-sarif-no-runs.sarif create mode 100644 gcc/testsuite/sarif/invalid-sarif-no-version.sarif create mode 100644 gcc/testsuite/sarif/invalid-sarif-non-object-in-runs.sarif create mode 100644 gcc/testsuite/sarif/invalid-sarif-not-an-object.sarif create mode 100644 gcc/testsuite/sarif/invalid-sarif-not-enough-arguments-for-placeholders.sarif create mode 100644 gcc/testsuite/sarif/invalid-sarif-version-not-a-string.sarif create mode 100644 gcc/testsuite/sarif/malformed-placeholder.sarif create mode 100644 gcc/testsuite/sarif/null-runs.sarif create mode 100644 gcc/testsuite/sarif/roundtrip-signal-1.c.sarif create mode 100644 gcc/testsuite/sarif/sarif.exp create mode 100644 gcc/testsuite/sarif/signal-1.c.sarif create mode 100644 
gcc/testsuite/sarif/spec-example-1.sarif create mode 100644 gcc/testsuite/sarif/spec-example-2.sarif create mode 100644 gcc/testsuite/sarif/spec-example-3.sarif create mode 100644 gcc/testsuite/sarif/spec-example-4.sarif create mode 100644 gcc/testsuite/sarif/tutorial-example-foo.sarif diff --git a/gcc/sarif/Make-lang.in b/gcc/sarif/Make-lang.in new file mode 100644 index 00000000000..53b6239da07 --- /dev/null +++ b/gcc/sarif/Make-lang.in @@ -0,0 +1,132 @@ +# Make-lang.in -- Top level -*- makefile -*- fragment for gcc SARIF "frontend". + +# Copyright (C) 2022 Free Software Foundation, Inc. + +# This file is part of GCC. + +# GCC is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. + +# GCC is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# . + +# This file provides the language dependent support in the main Makefile. + +# The name for selecting sarif in LANGUAGES. +sarif: sarif-replay$(exeext) + +.PHONY: sarif + +SARIF_OBJS = sarif/sarif-frontend.o sarif/sarif-replay.o \ + attribs.o deferred-locations.o json-reader.o +sarif_OBJS = $(SARIF_OBJS) + +sarif-replay$(exeext): $(SARIF_OBJS) $(BACKEND) $(LIBDEPS) + @$(call LINK_PROGRESS,$(INDEX.sarif),start) + +$(LLINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \ + $(SARIF_OBJS) $(BACKEND) $(LIBS) $(BACKENDLIBS) + @$(call LINK_PROGRESS,$(INDEX.sarif),end) + +# Build hooks. 
+ +sarif.all.cross: +sarif.start.encap: +sarif.rest.encap: + +sarif.info: +sarif.man: + +lang_checks += check-sarif +#lang_checks_parallelized += check-sarif +#check_sarif_parallelize = 10 + +# No sarif-specific selftests +selftest-sarif: + +# Install hooks. + +sarif.install-common: installdirs + -rm -f $(DESTDIR)$(bindir)/$(GCCSARIF_INSTALL_NAME)$(exeext) + $(INSTALL_PROGRAM) gccsarif$(exeext) $(DESTDIR)$(bindir)/$(GCCSARIF_INSTALL_NAME)$(exeext) + -if test -f sarif-replay$(exeext); then \ + if test -f gccsarif-cross$(exeext); then \ + :; \ + else \ + rm -f $(DESTDIR)$(bindir)/$(GCCSARIF_TARGET_INSTALL_NAME)$(exeext); \ + ( cd $(DESTDIR)$(bindir) && \ + $(LN) $(GCCSARIF_INSTALL_NAME)$(exeext) $(GCCSARIF_TARGET_INSTALL_NAME)$(exeext) ); \ + fi; \ + fi + +sarif.install-plugin: + +sarif.install-info: $(DESTDIR)$(infodir)/gccsarif.info + +sarif.install-pdf: doc/gccsarif.pdf + @$(NORMAL_INSTALL) + test -z "$(pdfdir)" || $(mkinstalldirs) "$(DESTDIR)$(pdfdir)/gcc" + @for p in doc/gccsarif.pdf; do \ + if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ + f=$(pdf__strip_dir) \ + echo " $(INSTALL_DATA) '$$d$$p' '$(DESTDIR)$(pdfdir)/gcc/$$f'"; \ + $(INSTALL_DATA) "$$d$$p" "$(DESTDIR)$(pdfdir)/gcc/$$f"; \ + done + +sarif.install-html: $(build_htmldir)/sarif + @$(NORMAL_INSTALL) + test -z "$(htmldir)" || $(mkinstalldirs) "$(DESTDIR)$(htmldir)" + @for p in $(build_htmldir)/sarif; do \ + if test -f "$$p" || test -d "$$p"; then d=""; else d="$(srcdir)/"; fi; \ + f=$(html__strip_dir) \ + if test -d "$$d$$p"; then \ + echo " $(mkinstalldirs) '$(DESTDIR)$(htmldir)/$$f'"; \ + $(mkinstalldirs) "$(DESTDIR)$(htmldir)/$$f" || exit 1; \ + echo " $(INSTALL_DATA) '$$d$$p'/* '$(DESTDIR)$(htmldir)/$$f'"; \ + $(INSTALL_DATA) "$$d$$p"/* "$(DESTDIR)$(htmldir)/$$f"; \ + else \ + echo " $(INSTALL_DATA) '$$d$$p' '$(DESTDIR)$(htmldir)/$$f'"; \ + $(INSTALL_DATA) "$$d$$p" "$(DESTDIR)$(htmldir)/$$f"; \ + fi; \ + done + +sarif.install-man: $(DESTDIR)$(man1dir)/$(GCCSARIF_INSTALL_NAME)$(man1ext) + 
+sarif.uninstall: + rm -rf $(DESTDIR)$(bindir)/$(GCCSARIF_INSTALL_NAME)$(exeext) + rm -rf $(DESTDIR)$(man1dir)/$(GCCSARIF_INSTALL_NAME)$(man1ext) + rm -rf $(DESTDIR)$(bindir)/$(GCCSARIF_TARGET_INSTALL_NAME)$(exeext) + rm -rf $(DESTDIR)$(infodir)/gccsarif.info* + +# Clean hooks. + +sarif.mostlyclean: + -rm -f sarif/*$(objext) + -rm -f sarif/*$(coverageexts) + -rm -f gccsarif$(exeext) gccsarif-cross$(exeext) sarif-replay$(exeext) +sarif.clean: +sarif.distclean: +sarif.maintainer-clean: + -rm -f $(docobjdir)/gccsarif.1 + +# Stage hooks. + +sarif.stage1: stage1-start + -mv sarif/*$(objext) stage1/sarif +sarif.stage2: stage2-start + -mv sarif/*$(objext) stage2/sarif +sarif.stage3: stage3-start + -mv sarif/*$(objext) stage3/sarif +sarif.stage4: stage4-start + -mv sarif/*$(objext) stage4/sarif +sarif.stageprofile: stageprofile-start + -mv sarif/*$(objext) stageprofile/sarif +sarif.stagefeedback: stagefeedback-start + -mv sarif/*$(objext) stagefeedback/sarif diff --git a/gcc/sarif/config-lang.in b/gcc/sarif/config-lang.in new file mode 100644 index 00000000000..6ed01f4116d --- /dev/null +++ b/gcc/sarif/config-lang.in @@ -0,0 +1,34 @@ +# config-lang.in -- Top level configure fragment for gcc SARIF "frontend". + +# Copyright (C) 2022 Free Software Foundation, Inc. + +# This file is part of GCC. + +# GCC is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. + +# GCC is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# . + +# Configure looks for the existence of this file to auto-config each language. 
+# We define several parameters used by configure: +# +# language - name of language as it would appear in $(LANGUAGES) +# compilers - value to add to $(COMPILERS) + +language="sarif" + +compilers="sarif-replay\$(exeext)" + +gtfiles="\$(srcdir)/sarif/sarif-frontend.cc" + +# Build by default. +build_by_default="yes" diff --git a/gcc/sarif/lang-specs.h b/gcc/sarif/lang-specs.h new file mode 100644 index 00000000000..750689c8374 --- /dev/null +++ b/gcc/sarif/lang-specs.h @@ -0,0 +1,26 @@ +/* lang-specs.h -- gcc driver specs for the SARIF "frontend". + Copyright (C) 2022 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. */ + +/* This is the contribution to the `default_compilers' array in gcc.cc + for the sarif "frontend". */ + +{".sarif", "@sarif", 0, 1, 0}, +/* FIXME: */ +{"@sarif", "sarif-replay %i %(cc1_options)", + 0, 1, 0}, diff --git a/gcc/sarif/lang.opt b/gcc/sarif/lang.opt new file mode 100644 index 00000000000..33d17879333 --- /dev/null +++ b/gcc/sarif/lang.opt @@ -0,0 +1,31 @@ +; Options for the SARIF front end. +; Copyright (C) 2022 Free Software Foundation, Inc. +; +; This file is part of GCC. +; +; GCC is free software; you can redistribute it and/or modify it under +; the terms of the GNU General Public License as published by the Free +; Software Foundation; either version 3, or (at your option) any later +; version. 
+; +; GCC is distributed in the hope that it will be useful, but WITHOUT ANY +; WARRANTY; without even the implied warranty of MERCHANTABILITY or +; FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +; for more details. +; +; You should have received a copy of the GNU General Public License +; along with GCC; see the file COPYING3. If not see +; . + +; See the GCC internals manual for a description of this file's format. + +; Please try to keep this file in ASCII collating order. + +Language +SARIF + +fallow-comments +SARIF Var(flag_allow_comments) +Extend JSON to support comments + +; This comment is to ensure we retain the blank line above. diff --git a/gcc/sarif/sarif-frontend.cc b/gcc/sarif/sarif-frontend.cc new file mode 100644 index 00000000000..7623c85fee4 --- /dev/null +++ b/gcc/sarif/sarif-frontend.cc @@ -0,0 +1,191 @@ +/* The dummy "frontend" for re-emitting diagnostics saved in SARIF form. + Copyright (C) 2022 Free Software Foundation, Inc. + Contributed by David Malcolm . + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. 
*/ + +// TODO: prune this +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "tree.h" +#include "debug.h" +#include "langhooks.h" +#include "langhooks-def.h" +#include "diagnostic.h" +#include "diagnostic-metadata.h" +#include "diagnostic-path.h" +#include "opts.h" +#include "options.h" +#include "line-map.h" +#include "stringpool.h" +#include "gcc-rich-location.h" +#include "json.h" +#include "json-reader.h" +#include "deferred-locations.h" +#include "logical-location.h" +#include "diagnostic-client-data-hooks.h" +#include "sarif/sarif-replay.h" + +/* Placeholder implementation; needed by a frontend. */ + +tree +convert (tree, tree) +{ + gcc_unreachable (); + return NULL_TREE; +} + +/* Language-dependent contents of a type. */ + +struct GTY(()) lang_type +{ + char dummy; +}; + +/* Language-dependent contents of a decl. */ + +struct GTY((variable_size)) lang_decl +{ + char dummy; +}; + +/* Language-dependent contents of an identifier. This must include a + tree_identifier. */ + +struct GTY(()) lang_identifier +{ + struct tree_identifier common; +}; + +/* The resulting tree type. */ + +union GTY((desc ("TREE_CODE (&%h.generic) == IDENTIFIER_NODE"), + chain_next ("CODE_CONTAINS_STRUCT (TREE_CODE (&%h.generic), TS_COMMON) ? ((union lang_tree_node *) TREE_CHAIN (&%h.generic)) : NULL"))) +lang_tree_node +{ + union tree_node GTY((tag ("0"), + desc ("tree_node_structure (&%h)"))) generic; + struct lang_identifier GTY((tag ("1"))) identifier; +}; + +/* We don't use language_function. */ + +struct GTY(()) language_function +{ + int dummy; +}; + +/* Language hooks. 
*/ + +static bool +sarif_langhook_init (void) +{ + build_common_tree_nodes (false); + + replay_sarif (main_input_filename); + + return false; +} + +static unsigned int +sarif_langhook_option_lang_mask (void) +{ + return CL_SARIF; +} + +static bool +sarif_langhook_handle_option (size_t scode, + const char *arg, + HOST_WIDE_INT value, + int kind, + location_t loc, + const struct cl_option_handlers *handlers) +{ + bool result = true; + + switch (scode) + { + default: + if (cl_options[scode].flags & sarif_langhook_option_lang_mask ()) + break; + result = false; + } + + SARIF_handle_option_auto (&global_options, &global_options_set, + scode, arg, value, + sarif_langhook_option_lang_mask (), kind, + loc, handlers, global_dc); + + return result; +} + +static tree +sarif_langhook_type_for_mode (machine_mode, int) +{ + gcc_unreachable (); + return NULL_TREE; +} + +static bool +sarif_langhook_global_bindings_p (void) +{ + return true; +} + +static tree +sarif_langhook_pushdecl (tree decl ATTRIBUTE_UNUSED) +{ + gcc_unreachable (); +} + +static tree +sarif_langhook_getdecls (void) +{ + return NULL; +} +#undef LANG_HOOKS_NAME +#define LANG_HOOKS_NAME "sarif" + +#undef LANG_HOOKS_INIT +#define LANG_HOOKS_INIT sarif_langhook_init + +#undef LANG_HOOKS_OPTION_LANG_MASK +#define LANG_HOOKS_OPTION_LANG_MASK sarif_langhook_option_lang_mask + +#undef LANG_HOOKS_HANDLE_OPTION +#define LANG_HOOKS_HANDLE_OPTION sarif_langhook_handle_option + +#undef LANG_HOOKS_TYPE_FOR_MODE +#define LANG_HOOKS_TYPE_FOR_MODE sarif_langhook_type_for_mode + +#undef LANG_HOOKS_GLOBAL_BINDINGS_P +#define LANG_HOOKS_GLOBAL_BINDINGS_P sarif_langhook_global_bindings_p + +#undef LANG_HOOKS_PUSHDECL +#define LANG_HOOKS_PUSHDECL sarif_langhook_pushdecl + +#undef LANG_HOOKS_GETDECLS +#define LANG_HOOKS_GETDECLS sarif_langhook_getdecls + +#undef LANG_HOOKS_DEEP_UNSHARING +#define LANG_HOOKS_DEEP_UNSHARING true + +struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER; + +#include "gt-sarif-sarif-frontend.h" +#include 
"gtype-sarif.h" diff --git a/gcc/sarif/sarif-replay.cc b/gcc/sarif/sarif-replay.cc new file mode 100644 index 00000000000..2d5c58ead1e --- /dev/null +++ b/gcc/sarif/sarif-replay.cc @@ -0,0 +1,1489 @@ +/* Re-emitting diagnostics saved in SARIF form. + Copyright (C) 2022 David Malcolm . + Contributed by David Malcolm . + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. */ + +// TODO: prune this +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "tree.h" +#include "debug.h" +#include "langhooks.h" +#include "langhooks-def.h" +#include "diagnostic.h" +#include "diagnostic-metadata.h" +#include "diagnostic-path.h" +#include "opts.h" +#include "options.h" +#include "line-map.h" +#include "stringpool.h" +#include "gcc-rich-location.h" +#include "json-reader.h" +#include "deferred-locations.h" +#include "logical-location.h" +#include "diagnostic-client-data-hooks.h" +#include "sarif/sarif-replay.h" + +/* Forward decls. */ + +class sarif_replayer; +class sarif_diagnostic_client_data_hooks; + +/* FIXME. */ + +struct pending_location_range +{ + location_t m_start; + location_t m_end; +}; + +/* Concrete subclass of client_version_info, for use when + m_replayer->m_driver_obj is non-NULL. 
*/ + +class current_driver_version_info : public client_version_info +{ +public: + current_driver_version_info (sarif_replayer *replayer) + : m_replayer (replayer) + {} + + const char *get_tool_name () const final override; + char *maybe_make_full_name () const final override; + const char *get_version_string () const final override; + char *maybe_make_version_url () const final override; + void for_each_plugin (plugin_visitor &visitor) const final override; + +private: + sarif_replayer *m_replayer; +}; + +/* Concrete subclass of logical_location, for use when + m_replayer->m_current_logical_loc_obj is non-NULL. */ + +class current_sarif_logical_location : public logical_location +{ +public: + current_sarif_logical_location (sarif_replayer *replayer) + : m_replayer (replayer) + {} + + const char *get_short_name () const final override; + const char *get_name_with_scope () const final override; + const char *get_internal_name () const final override; + enum logical_location_kind get_kind () const final override; + +private: + sarif_replayer *m_replayer; +}; + +/* Concrete subclass of diagnostic_client_data_hooks, for + use when replaying a SARIF file. + + Takes ownership of the sarif_replayer and toplevel json::value objects. 
*/ + +class sarif_diagnostic_client_data_hooks : public diagnostic_client_data_hooks +{ + public: + sarif_diagnostic_client_data_hooks (sarif_replayer *replayer) + : m_replayer (replayer), + m_toplevel_jv (NULL), + m_driver_version_info (replayer), + m_current_logical_location (replayer) + {} + + ~sarif_diagnostic_client_data_hooks (); + + const client_version_info *get_any_version_info () const final override; + const logical_location *get_current_logical_location () const final override; + const char * + maybe_get_sarif_source_language (const char *filename) const final override; + + void stash (json::value *toplevel_jv) + { + m_toplevel_jv = toplevel_jv; + } + + sarif_replayer *m_replayer; + json::value *m_toplevel_jv; + + current_driver_version_info m_driver_version_info; + current_sarif_logical_location m_current_logical_location; +}; + +/* Subclass of diagnostic_metadata::rule for a reference to the SARIF + specification. */ + +class spec_ref : public diagnostic_metadata::rule +{ +public: + spec_ref (const char *section) + : m_section (section) + {} + + char *make_description () const final override + { + /* 'SECTION SIGN' (U+00A7). */ +#define SECTION_SIGN_UTF8 "\xC2\xA7" + return xasprintf ("SARIF v2.1.0 " SECTION_SIGN_UTF8 "%s", m_section); + } + + char *make_url () const final override + { + /* There doesn't seem to be a systematic mapping from spec sections to + HTML anchors, so we can't provide URLs + (filed as https://github.com/oasis-tcs/sarif-spec/issues/533 ). */ + return NULL; + } + +private: + /* e.g. "3.1" for section 3.1 of the spec. */ + const char *m_section; +}; + +/* A reference to the SARIF specification for a particular kind of object. 
*/ + +class object_spec_ref : public spec_ref +{ +public: + object_spec_ref (const char *obj_name, const char *section) + : spec_ref (section), m_obj_name (obj_name) + {} + + const char *get_obj_name () const { return m_obj_name; } + +private: + const char *m_obj_name; +}; + +/* A reference to the SARIF specification for a particular property + of a particular kind of object. */ + +class property_spec_ref : public object_spec_ref +{ +public: + property_spec_ref (const char *obj_name, + const char *property_name, + const char *section) + : object_spec_ref (obj_name, section), m_property_name (property_name) + {} + + const char *get_property_name () const { return m_property_name; } + +private: + const char *m_property_name; +}; + +/* A bundle of state for replaying a SARIF JSON file. */ + +class sarif_replayer : public json_reader +{ +public: + friend class current_driver_version_info; + friend class current_sarif_logical_location; + friend class sarif_diagnostic_client_data_hooks; + + sarif_replayer (const char *filename); + ~sarif_replayer (); + + json::value * + get_optional_property (json::object *obj, const property_spec_ref &ref) + { + return obj->get (ref.get_property_name ()); + } + + json::value * + get_required_property (json::object *obj, const property_spec_ref &ref) + { + json::value *property_val = get_optional_property (obj, ref); + if (!property_val) + fatal_error_with_ref (obj, ref, + "expected %s object to have a %qs property", + ref.get_obj_name (), ref.get_property_name ()); + return property_val; + } + + int + require_int (json::value *jv, const property_spec_ref &ref) + { + if (json::integer_number *num = dyn_cast <json::integer_number *> (jv)) + return num->get (); + fatal_error_with_ref (jv, ref, "expected %s.%s to be an integer", + ref.get_obj_name (), ref.get_property_name ()); + return 0; + } + + bool + get_optional_int_property (int *out, + json::object *obj, const property_spec_ref &ref) + { + json::value *property_val = get_optional_property (obj, ref); + if
(!property_val) + return false; + *out = require_int (property_val, ref); + return true; + } + + const char * + require_string (json::value *jv, const property_spec_ref &ref) + { + json::string *str = dyn_cast <json::string *> (jv); + if (!str) + fatal_error_with_ref (jv, ref, "expected %s.%s to be a string", + ref.get_obj_name (), ref.get_property_name ()); + return str->get_string (); + } + + const char * + get_optional_string_property (json::object *obj, const property_spec_ref &ref) + { + json::value *property_val = get_optional_property (obj, ref); + if (!property_val) + return NULL; + return require_string (property_val, ref); + } + + const char * + get_required_string_property (json::object *obj, const property_spec_ref &ref) + { + json::value *property_val = get_required_property (obj, ref); + return require_string (property_val, ref); + } + + json::object * + require_object (json::value *jv, const property_spec_ref &ref) + { + if (json::object *obj = dyn_cast <json::object *> (jv)) + return obj; + fatal_error_with_ref (jv, ref, "expected %s.%s to be an object", + ref.get_obj_name (), ref.get_property_name ()); + return NULL; + } + + json::object * + require_object_for_element (json::value *jv, const property_spec_ref &ref) + { + if (json::object *obj = dyn_cast <json::object *> (jv)) + return obj; + fatal_error_with_ref (jv, ref, + "expected element of %s.%s array to be an object", + ref.get_obj_name (), ref.get_property_name ()); + return NULL; + } + + json::object * + get_optional_object_property (json::object *obj, const property_spec_ref &ref) + { + json::value *property_val = get_optional_property (obj, ref); + if (!property_val) + return NULL; + return require_object (property_val, ref); + } + + json::object * + get_required_object_property (json::object *obj, const property_spec_ref &ref) + { + json::value *property_val = get_required_property (obj, ref); + if (!property_val) + return NULL; + return require_object (property_val, ref); + } + + json::array * + require_array_property (json::value *jv,
const property_spec_ref &ref) + { + if (json::array *obj = dyn_cast <json::array *> (jv)) + return obj; + fatal_error_with_ref (jv, ref, "expected %s.%s to be an array", + ref.get_obj_name (), ref.get_property_name ()); + return NULL; + } + + json::array * + get_optional_array_property (json::object *obj, const property_spec_ref &ref) + { + json::value *property_val = get_optional_property (obj, ref); + if (!property_val) + return NULL; + return require_array_property (property_val, ref); + } + + json::array * + get_required_array_property (json::object *obj, const property_spec_ref &ref) + { + json::value *property_val = get_required_property (obj, ref); + return require_array_property (property_val, ref); + } + + void emit_sarif_as_diagnostics (json::value *jv, int pass); + void handle_run_obj (json::object *run_obj); + json::object * + lookup_rule_by_id_in_tool (const char *rule_id, + json::object *tool_obj); + json::object * + lookup_rule_by_id_in_component (const char *rule_id, + json::object *tool_component_obj); + diagnostic_path * + make_sarif_diagnostic_path (json::object *thread_flow_obj); + void handle_result_obj (json::object *result_obj, + json::object *tool_obj); + + char * + make_plain_text_within_result_message (json::object *tool_component_obj, + json::object *message_obj, + json::object *rule_obj); + const char * + lookup_plain_text_within_result_message (json::object *tool_component_obj, + json::object *message_obj, + json::object *rule_obj); + + location_t handle_location_object (json::object *location_obj, + json::object **out_logical_location_obj); + location_t handle_physical_location_object (json::object *phys_loc_obj); + const char *handle_artifact_location_object (json::object *artifact_loc_obj); + void handle_region_object (json::object *region_obj, + expanded_location *start_exp_loc, + expanded_location *end_exp_loc); + + tree get_or_create_fndecl (const char *func_str); + tree get_or_create_fndecl (json::object *logical_location_obj); + + void
fatal_error_with_ref (json::value *jv, const spec_ref &ref, + const char *gmsgid, ...) ATTRIBUTE_GCC_DIAG(4,5) + { + location_t loc = m_json_loc_map.get_range_for_value (jv); + + auto_diagnostic_group d; + va_list ap; + va_start (ap, gmsgid); + rich_location richloc (line_table, loc); + diagnostic_metadata metadata; + metadata.add_rule (ref); + emit_diagnostic_valist (DK_ERROR, &richloc, &metadata, 0, gmsgid, &ap); + va_end (ap); + exit (1); + // FIXME: ::fatal_error, but make it usable from DejaGnu + } + +private: + deferred_locations m_deferred_locs; + + /* Map from physicalLocation object to pending_location_range ptr. + Populated in pass 0; used in pass 1. */ + hash_map<json::object *, pending_location_range *> m_phys_loc_map; + // TODO: delete this in dtor + + int m_pass; + + hash_map<tree, tree> m_map_id_to_fndecl; + + json::object *m_driver_obj; + json::array *m_artifacts_arr; + json::object *m_current_logical_loc_obj; +}; + +/* class current_driver_version_info : public client_version_info. */ + +const char * +current_driver_version_info::get_tool_name () const +{ + gcc_assert (m_replayer->m_driver_obj); + + property_spec_ref name_prop ("toolComponent", "name", "3.19.8"); + return m_replayer->get_optional_string_property + (m_replayer->m_driver_obj, name_prop); +} + +char * +current_driver_version_info::maybe_make_full_name () const +{ + gcc_assert (m_replayer->m_driver_obj); + + property_spec_ref full_name_prop ("toolComponent", "fullName", "3.19.9"); + if (const char *full_name = m_replayer->get_optional_string_property + (m_replayer->m_driver_obj, full_name_prop)) + return xstrdup (full_name); + return NULL; +} + +const char * +current_driver_version_info::get_version_string () const +{ + gcc_assert (m_replayer->m_driver_obj); + + property_spec_ref version_prop ("toolComponent", "version", "3.19.13"); + return m_replayer->get_optional_string_property (m_replayer->m_driver_obj, + version_prop); +} + +char * +current_driver_version_info::maybe_make_version_url () const +{ + gcc_assert
(m_replayer->m_driver_obj); + + property_spec_ref info_uri_prop ("toolComponent", "informationUri", + "3.19.17"); + if (const char *version_url = m_replayer->get_optional_string_property + (m_replayer->m_driver_obj, info_uri_prop)) + return xstrdup (version_url); + return NULL; +} + +void +current_driver_version_info::for_each_plugin (plugin_visitor &visitor) const +{ + // TODO +} + +/* class current_sarif_logical_location : public logical_location. */ + +const char * +current_sarif_logical_location::get_short_name () const +{ + gcc_assert (m_replayer->m_current_logical_loc_obj); + property_spec_ref name_prop ("logicalLocation", "name", "3.33.4"); + return m_replayer->get_optional_string_property + (m_replayer->m_current_logical_loc_obj, name_prop); +} + +const char * +current_sarif_logical_location::get_name_with_scope () const +{ + gcc_assert (m_replayer->m_current_logical_loc_obj); + property_spec_ref fqname_prop ("logicalLocation", "fullyQualifiedName", + "3.33.5"); + return m_replayer->get_optional_string_property + (m_replayer->m_current_logical_loc_obj, fqname_prop); +} + +const char * +current_sarif_logical_location::get_internal_name () const +{ + gcc_assert (m_replayer->m_current_logical_loc_obj); + property_spec_ref decorated_name_prop ("logicalLocation", "decoratedName", + "3.33.6"); + return m_replayer->get_optional_string_property + (m_replayer->m_current_logical_loc_obj, decorated_name_prop); +} + +enum logical_location_kind +current_sarif_logical_location::get_kind () const +{ + gcc_assert (m_replayer->m_current_logical_loc_obj); + property_spec_ref kind_prop ("logicalLocation", "kind", "3.33.7"); + if (const char *kind = m_replayer->get_optional_string_property + (m_replayer->m_current_logical_loc_obj, kind_prop)) + { + if (!strcmp (kind, "function")) + return LOGICAL_LOCATION_KIND_FUNCTION; + if (!strcmp (kind, "member")) + return LOGICAL_LOCATION_KIND_MEMBER; + if (!strcmp (kind, "module")) + return LOGICAL_LOCATION_KIND_MODULE; + if (!strcmp 
(kind, "namespace")) + return LOGICAL_LOCATION_KIND_NAMESPACE; + if (!strcmp (kind, "type")) + return LOGICAL_LOCATION_KIND_TYPE; + if (!strcmp (kind, "returnType")) + return LOGICAL_LOCATION_KIND_RETURN_TYPE; + if (!strcmp (kind, "parameter")) + return LOGICAL_LOCATION_KIND_PARAMETER; + if (!strcmp (kind, "variable")) + return LOGICAL_LOCATION_KIND_VARIABLE; + /* TODO: maybe consolidate this with maybe_get_sarif_kind + in the producer code. */ + } + + return LOGICAL_LOCATION_KIND_UNKNOWN; +} + +/* class sarif_diagnostic_client_data_hooks. */ + +sarif_diagnostic_client_data_hooks::~sarif_diagnostic_client_data_hooks () +{ + delete m_replayer; + delete m_toplevel_jv; +} + +/* We only have a client_version_info * if m_replayer has found + a driver object. */ + +const client_version_info * +sarif_diagnostic_client_data_hooks::get_any_version_info () const +{ + if (!m_replayer->m_driver_obj) + return NULL; + return &m_driver_version_info; +} + +/* We only have a current logical_location if m_replayer has one. 
*/ + +const logical_location * +sarif_diagnostic_client_data_hooks::get_current_logical_location () const +{ + if (!m_replayer->m_current_logical_loc_obj) + return NULL; + return &m_current_logical_location; +} + +const char * +sarif_diagnostic_client_data_hooks:: +maybe_get_sarif_source_language (const char *filename) const +{ + if (!m_replayer->m_artifacts_arr) + return NULL; + + for (auto iter : *m_replayer->m_artifacts_arr) + { + property_spec_ref artifacts_prop ("run", "artifacts", "3.14.15"); + json::object *artifact_obj + = m_replayer->require_object_for_element (iter, artifacts_prop); + + property_spec_ref location_prop ("artifact", "location", "3.24.2"); + if (json::object *location_obj + = m_replayer->get_optional_object_property (artifact_obj, + location_prop)) + { + property_spec_ref uri_prop ("artifactLocation", "uri", "3.4.3"); + if (const char *uri + = m_replayer->get_optional_string_property (location_obj, + uri_prop)) + if (!strcmp (uri, filename)) + { + property_spec_ref source_lang_prop + ("artifact", "sourceLanguage", "3.24.10"); + if (const char *source_language + = m_replayer->get_optional_string_property (artifact_obj, + source_lang_prop)) + return source_language; + } + } + } + + return NULL; +} + +/* sarif_replayer's ctor. */ + +sarif_replayer::sarif_replayer (const char *filename) +: json_reader (filename), + m_pass (0), + m_driver_obj (NULL), + m_artifacts_arr (NULL), + m_current_logical_loc_obj (NULL) +{ +} + +/* sarif_replayer's dtor. */ + +sarif_replayer::~sarif_replayer () +{ + for (auto iter : m_phys_loc_map) + delete iter.second; +} + +/* Perform one pass of replay of the output file. + Pass 0 captures the source locations of interest, so that we can generate + line_maps. + Pass 1 uses the line_maps. */ + +void +sarif_replayer::emit_sarif_as_diagnostics (json::value *jv, int pass) +{ + m_pass = pass; + + /* We expect a sarifLog object as the top-level value + (SARIF v2.1.0 section 3.13). 
*/ + json::object *toplev_obj = dyn_cast <json::object *> (jv); + if (!toplev_obj) + fatal_error_with_ref (jv, spec_ref ("3.1"), + "expected a sarifLog object as the top-level value"); + + /* sarifLog objects SHALL have a property named "version" + (SARIF v2.1.0 section 3.13.2) with a string value. */ + get_required_string_property (toplev_obj, + property_spec_ref ("sarifLog", "version", + "3.13.2")); + + /* sarifLog.runs must be null or be an array. */ + property_spec_ref prop_runs ("sarifLog", "runs", "3.13.4"); + json::value *runs = get_required_property (toplev_obj, prop_runs); + switch (runs->get_kind ()) + { + default: + fatal_error_with_ref (runs, prop_runs, + "expected sarifLog.runs to be" + " %<null%> or an array"); + break; + case json::JSON_NULL: + /* Nothing to do. */ + return; + case json::JSON_ARRAY: + { + json::array *runs_arr = as_a <json::array *> (runs); + for (auto element : *runs_arr) + { + json::object *run_obj + = require_object_for_element (element, prop_runs); + handle_run_obj (run_obj); + } + } + break; + } + + if (m_pass == 0) + m_deferred_locs.generate_location_t_values (); +} + +/* Process a run object (SARIF v2.1.0 section 3.14). */ + +void +sarif_replayer::handle_run_obj (json::object *run_obj) +{ + json::object *tool_obj + = get_required_object_property (run_obj, + property_spec_ref ("run", "tool", + "3.14.6")); + + m_driver_obj + = get_required_object_property (tool_obj, + property_spec_ref ("tool", "driver", + "3.18.2")); + + m_artifacts_arr + = get_optional_array_property (run_obj, + property_spec_ref ("run", "artifacts", + "3.14.15")); + + /* If present, run.results must be null or be an array. */ + property_spec_ref prop_results ("run", "results", "3.14.23"); + json::value *results = run_obj->get ("results"); + if (!results) + return; + switch (results->get_kind ()) + { + default: + fatal_error_with_ref (results, prop_results, + "expected run.results to be" + " %<null%> or an array"); + break; + case json::JSON_NULL: + /* Nothing to do.
*/ + return; + case json::JSON_ARRAY: + { + json::array *results_arr = as_a <json::array *> (results); + for (auto element : *results_arr) + { + json::object *result_obj + = require_object_for_element (element, prop_results); + handle_result_obj (result_obj, tool_obj); + } + } + break; + } +} + +/* Convert a SARIF result.level (3.27.10) to a GCC diagnostic kind. */ + +static diagnostic_t +get_diagnostic_kind_from_result_level (const char *level) +{ + if (!strcmp (level, "warning")) + return DK_WARNING; + if (!strcmp (level, "error")) + return DK_ERROR; + if (!strcmp (level, "note")) + return DK_NOTE; + return DK_UNSPECIFIED; // FIXME: what if this happens (e.g. with "none") +} + +/* Concrete subclass of diagnostic_metadata::rule for a specific SARIF + rule object. */ + +class sarif_rule : public diagnostic_metadata::rule +{ +public: + sarif_rule (sarif_replayer *replayer, const char *rule_id, + json::object *rule_obj) + : m_replayer (replayer), + m_rule_id (rule_id), + m_rule_obj (rule_obj) + {} + + char *make_description () const final override + { + if (m_rule_id) + return xstrdup (m_rule_id); + else + return NULL; + } + + char *make_url () const final override + { + if (m_rule_obj) + { + property_spec_ref prop_help_uri + ("reportingDescriptor", "helpUri", "3.49.12"); + if (const char *help_uri + = m_replayer->get_optional_string_property (m_rule_obj, + prop_help_uri)) + return xstrdup (help_uri); + } + return NULL; + } + +private: + sarif_replayer *m_replayer; + const char *m_rule_id; + json::object *m_rule_obj; +}; + +/* If ITER_SRC starts with a placeholder as per §3.11.5, advance ITER_SRC + to immediately beyond the placeholder, write to *OUT_ARG_IDX, and + return true. + + Otherwise, leave ITER_SRC untouched and return false.
*/ + +static bool +maybe_consume_placeholder (const char *&iter_src, unsigned *out_arg_idx) +{ + if (*iter_src != '{') + return false; + const char *first_digit = iter_src + 1; + const char *iter_digit = first_digit; + while (char ch = *iter_digit) + switch (ch) + { + default: + return false; + + case '}': + if (iter_digit == first_digit) + { + /* No digits, we simply have "{}" which is not a placeholder + (and malformed: the braces should have been escaped). */ + return false; + } + *out_arg_idx = atoi (first_digit); + iter_src = iter_digit + 1; + return true; + + case '0': case '1': case '2': case '3': case '4': + case '5': case '6': case '7': case '8': case '9': + // FIXME: what about multiple leading zeroes? + iter_digit++; + continue; + } + return false; // TODO +} + +/* Lookup the plain text string within a result.message (§3.27.11), + and substitute for any placeholders (§3.11.5). + TODO: embedded links? + + MESSAGE_OBJ is "theMessage" + RULE_OBJ is "theRule". */ + +char * +sarif_replayer:: +make_plain_text_within_result_message (json::object *tool_component_obj, + json::object *message_obj, + json::object *rule_obj) +{ + const char *original_text + = lookup_plain_text_within_result_message (tool_component_obj, + message_obj, + rule_obj); + if (!original_text) + return NULL; + + /* Look up any arguments for substituting into placeholders. */ + property_spec_ref arguments_prop ("message", "arguments", "3.11.11"); + json::array *arguments + = get_optional_array_property (message_obj, arguments_prop); + + /* Duplicate original_text, substituting any placeholders. 
*/ + pretty_printer pp; + + const char *iter_src = original_text; + while (char ch = *iter_src) + { + unsigned arg_idx; + if (maybe_consume_placeholder (iter_src, &arg_idx)) + { + if (!arguments) + fatal_error_with_ref (message_obj, arguments_prop, + "message string contains placeholder %<{%i}%>" + " but message object has no %qs property", + (int)arg_idx, + arguments_prop.get_property_name ()); + if (arg_idx >= arguments->length ()) + fatal_error_with_ref (message_obj, arguments_prop, + "not enough strings in %qs array for" + " placeholder %<{%i}%>", + arguments_prop.get_property_name (), + (int)arg_idx); + const char *replacement_str + = require_string (arguments->get (arg_idx), arguments_prop); + pp_string (&pp, replacement_str); + } + else if (ch == '{' || ch == '}') + { + /* '{' and '}' are escaped by repeating them. */ + if (iter_src[1] == ch) + { + pp_character (&pp, ch); + iter_src += 2; + } + else + fatal_error_with_ref (message_obj, arguments_prop, + "unescaped '%c' within message string", ch); + } + else + pp_character (&pp, *(iter_src++)); + } + + return xstrdup (pp_formatted_text (&pp)); +} + + /* IF theMessage.text is present and the desired language is theRun.language THEN + Use the text or markdown property of theMessage as appropriate. + IF the string has not yet been found THEN + IF theMessage occurs as the value of result.message (§3.27.11) THEN + LET theRule be the reportingDescriptor object (§3.49), an element of theComponent.rules (§3.19.23), which defines the rule that was violated by this result. + IF theRule exists AND theRule.messageStrings (§3.49.11) is present AND contains a property whose name equals theMessage.id THEN + LET theMFMS be the multiformatMessageString object (§3.12) that is the value of that property. + Use the text or markdown property of theMFMS as appropriate. 
+ ELSE IF theMessage occurs as the value of notification.message (§3.58.5) THEN + LET theDescriptor be the reportingDescriptor object (§3.49), an element of theComponent.notifications (§3.19.23), which describes this notification. + IF theDescriptor exists AND theDescriptor.messageStrings is present AND contains a property whose name equals theMessage.id THEN + LET theMFMS be the multiformatMessageString object that is the value of that property. + Use the text or markdown property of theMFMS as appropriate. + IF the string has not yet been found THEN + IF theComponent.globalMessageStrings (§3.19.22) is present AND contains a property whose name equals theMessage.id THEN + LET theMFMS be the multiformatMessageString object that is the value of that property. + Use the text or markdown property of theMFMS as appropriate. + IF the string has not yet been found THEN + The lookup procedure fails (which means the SARIF log file is invalid). + */ + +/* Implement the message string lookup algorithm from + SARIF v2.1.0 section 3.11.7, for the case where theMessage + is the value of result.message (§3.27.11). + + MESSAGE_OBJ is "theMessage" + RULE_OBJ is "theRule". */ + +const char * +sarif_replayer:: +lookup_plain_text_within_result_message (json::object *tool_component_obj, + json::object *message_obj, + json::object *rule_obj) +{ + gcc_assert (message_obj); + // rule_obj can be NULL + + /* IF theMessage.text is present and the desired language is theRun.language THEN + Use the text or markdown property of theMessage as appropriate. 
*/ + if (const char *text + = get_optional_string_property (message_obj, + property_spec_ref ("message", "text", + "3.11.8"))) + // TODO: check language + return text; + + const char *message_id + = get_optional_string_property (message_obj, + property_spec_ref ("message", "id", + "3.11.10")); + + /* LET theRule be the reportingDescriptor object (§3.49), an element of theComponent.rules (§3.19.23), which defines the rule that was violated by this result. + IF theRule exists AND theRule.messageStrings (§3.49.11) is present AND contains a property whose name equals theMessage.id THEN + LET theMFMS be the multiformatMessageString object (§3.12) that is the value of that property. + Use the text or markdown property of theMFMS as appropriate. + */ + if (rule_obj && message_id) + { + property_spec_ref message_strings ("reportingDescriptor", + "messageStrings", + "3.49.11"); + if (json::object *message_strings_obj + = get_optional_object_property (rule_obj, message_strings)) + { + if (json::value *mfms = message_strings_obj->get (message_id)) + { + json::object *mfms_obj = require_object (mfms, message_strings); + + // 3.12 multiformatMessageString object + const char *text = get_required_string_property + (mfms_obj, + property_spec_ref ("multiformatMessageString", "text", + "3.12.3")); + return text; + } + } + } + + // TODO: + /* IF the string has not yet been found THEN + IF theComponent.globalMessageStrings (§3.19.22) is present AND contains a property whose name equals theMessage.id THEN + LET theMFMS be the multiformatMessageString object that is the value of that property. + Use the text or markdown property of theMFMS as appropriate. + */ + + /* Failure. */ + fatal_error_with_ref (message_obj, spec_ref ("3.11.7"), + "could not find string for message object"); + return NULL; +} + +/* FIXME. */ +// 3.52.3 reportingDescriptor lookup +// "For an example of the interaction between ruleId and rule.id, see §3.52.4." 
+ +json::object * +sarif_replayer::lookup_rule_by_id_in_tool (const char *rule_id, + json::object *tool_obj) +{ + if (!rule_id) + return NULL; + if (!tool_obj) + return NULL; + + json::object *driver_obj + = get_required_object_property (tool_obj, + property_spec_ref ("tool", "driver", + "3.18.2")); + + if (json::object *rule_obj + = lookup_rule_by_id_in_component (rule_id, driver_obj)) + return rule_obj; + + // TODO: also handle extensions + + return NULL; +} + +/* FIXME. */ + +json::object * +sarif_replayer::lookup_rule_by_id_in_component (const char *rule_id, + json::object *tool_component_obj) +{ + property_spec_ref rules ("toolComponent", "rules", "3.19.23"); + + json::array *rules_arr + = get_optional_array_property (tool_component_obj, rules); + if (!rules_arr) + return NULL; + + for (auto element : *rules_arr) + { + json::object *reporting_desc_obj + = require_object_for_element (element, rules); + + /* reportingDescriptor objects (§3.49). */ + property_spec_ref id ("reportingDescriptor", "id", "3.49.3"); + const char *desc_id + = get_required_string_property (reporting_desc_obj, id); + if (!strcmp (rule_id, desc_id)) + return reporting_desc_obj; + } + + return NULL; +} + +/* Ensure that we have a placeholder fndecl named FUNC_STR. + All such placeholder functions merely have signature + void FUNC_STR (void); */ + +tree +sarif_replayer::get_or_create_fndecl (const char *func_str) +{ + tree id = get_identifier (func_str); + if (tree *slot = m_map_id_to_fndecl.get (id)) + return *slot; + tree fntype_void_void + = build_function_type_array (void_type_node, 0, NULL); + tree fn_type = fntype_void_void; + tree fndecl = build_fn_decl (func_str, fn_type); + m_map_id_to_fndecl.put (id, fndecl); + return fndecl; +} + +/* Attempt to get a placeholder fndecl for the given SARIF logicalLocation + object (3.33). 
+ + All such placeholder functions merely have signature + void FUNC_STR (void); */ + +tree +sarif_replayer::get_or_create_fndecl (json::object *logical_location_obj) +{ + /* First try "name". */ + property_spec_ref prop_name ("logicalLocation", "name", "3.33.4"); + if (const char *name + = get_optional_string_property (logical_location_obj, prop_name)) + return get_or_create_fndecl (name); + + /* Failing that, try "fullyQualifiedName". */ + property_spec_ref + prop_fqname ("logicalLocation", "fullyQualifiedName", "3.33.5"); + if (const char *fqname + = get_optional_string_property (logical_location_obj, prop_fqname)) + return get_or_create_fndecl (fqname); + + /* Failure. */ + return NULL; +} + +/* Concrete subclass of diagnostic_event for use when replaying diagnostics + from a SARIF file. + + Corresponds to a threadFlowLocation object (§3.38). */ + +class sarif_replay_diagnostic_event : public diagnostic_event +{ +public: + sarif_replay_diagnostic_event (sarif_replayer *replayer, + json::object *tflow_loc_obj) + : m_replayer (replayer), + m_tflow_loc_obj (tflow_loc_obj), + m_logical_location_obj (NULL), + m_loc (UNKNOWN_LOCATION), + m_fndecl (NULL), + m_stack_depth (0), + m_message (NULL), + m_meaning () + { + property_spec_ref location_prop ("threadFlowLocation", "location", + "3.38.3"); + if (json::object *location_obj + = replayer->get_optional_object_property (tflow_loc_obj, location_prop)) + { + /* location object (§3.28). */ + m_loc = replayer->handle_location_object (location_obj, + &m_logical_location_obj); + if (m_logical_location_obj) + m_fndecl = replayer->get_or_create_fndecl (m_logical_location_obj); + + /* Get any message from here. */ + property_spec_ref location_message ("location", "message", "3.28.5"); + if (json::object *message_obj + = replayer->get_optional_object_property (location_obj, + location_message)) + m_message = replayer->make_plain_text_within_result_message + (NULL, + message_obj, + NULL/* FIXME. 
*/); + } + + // 3.38.8 kinds property + // TODO: populate m_meaning + + /* nestingLevel property (§3.38.10). */ + property_spec_ref nesting_level ("threadFlowLocation", "nestingLevel", + "3.38.10"); + replayer->get_optional_int_property + (&m_stack_depth, tflow_loc_obj, nesting_level); + if (m_stack_depth < 0) + replayer->fatal_error_with_ref (tflow_loc_obj, nesting_level, + "expected a non-negative integer"); + } + ~sarif_replay_diagnostic_event () + { + free (m_message); + } + + location_t get_location () const final override { return m_loc; } + tree get_fndecl () const final override { return m_fndecl; } + int get_stack_depth () const final override { return m_stack_depth; } + label_text get_desc (bool) const final override + { + return label_text::borrow (m_message ? m_message : ""); + } + const logical_location *get_logical_location () const final override + { + return NULL; // TODO + } + meaning get_meaning () const final override + { + return m_meaning; + } + +private: + sarif_replayer *m_replayer; + json::object *m_tflow_loc_obj; + json::object *m_logical_location_obj; + location_t m_loc; + tree m_fndecl; + int m_stack_depth; + char *m_message; // has been i18n-ed and formatted, or is NULL + meaning m_meaning; +}; + +/* Concrete subclass of diagnostic_path for use when replaying diagnostics + from a SARIF file. */ + +class sarif_replay_diagnostic_path : public diagnostic_path +{ +public: + unsigned num_events () const final override { return m_events.length (); } + const diagnostic_event & get_event (int idx) const final override + { + return *m_events[idx]; + } + + void add_event (sarif_replay_diagnostic_event *event) + { + m_events.safe_push (event); + } + +private: + auto_delete_vec m_events; +}; + +/* Make a new diagnostic_path instance for THREAD_FLOW_OBJ, a + SARIF threadFlow object (section 3.37). 
*/ + +diagnostic_path * +sarif_replayer::make_sarif_diagnostic_path (json::object *thread_flow_obj) +{ + property_spec_ref locations ("threadFlow", "locations", "3.37.6"); + json::array *locations_arr + = get_required_array_property (thread_flow_obj, locations); + sarif_replay_diagnostic_path *path = new sarif_replay_diagnostic_path (); + for (auto location : *locations_arr) + { + /* threadFlowLocation object (§3.38). */ + json::object *tflow_loc_obj + = require_object_for_element (location, locations); + + sarif_replay_diagnostic_event *event + = new sarif_replay_diagnostic_event (this, tflow_loc_obj); + path->add_event (event); + } + return path; +} + +/* TODO. */ +// location object (§3.28) + +location_t +sarif_replayer::handle_location_object (json::object *location_obj, + json::object **out_logical_location_obj) +{ + location_t loc = UNKNOWN_LOCATION; + + // §3.28.3 physicalLocation property + { + property_spec_ref physical_location_prop ("location", "physicalLocation", + "3.28.3"); + if (json::object *phys_loc_obj + = get_optional_object_property (location_obj, physical_location_prop)) + loc = handle_physical_location_object (phys_loc_obj); + } + + // §3.28.4 logicalLocations property + { + property_spec_ref logical_locations_prop ("location", "logicalLocations", + "3.28.4"); + if (json::array *logical_loc_arr + = get_optional_array_property (location_obj, logical_locations_prop)) + if (logical_loc_arr->length () > 0) + { + /* Only look at the first, if there's more than one. */ + *out_logical_location_obj + = require_object_for_element (logical_loc_arr->get (0), + logical_locations_prop); + } + } + + return loc; +} + +/* TODO. 
*/ +// physicalLocation object (§3.29) + +location_t +sarif_replayer::handle_physical_location_object (json::object *phys_loc_obj) +{ + expanded_location start_exp_loc = {NULL, 0, 0, NULL, false}; + expanded_location end_exp_loc = {NULL, 0, 0, NULL, false}; + + const char *filename = NULL; + //3.29.3 artifactLocation property + property_spec_ref artifact_location_prop ("physicalLocation", "artifactLocation", + "3.29.3"); + if (json::object *artifact_loc_obj + = get_optional_object_property (phys_loc_obj, artifact_location_prop)) + if (const char *filename + = handle_artifact_location_object (artifact_loc_obj)) + { + start_exp_loc.file = filename; + end_exp_loc.file = filename; + } + + //3.29.4 region property + property_spec_ref region_prop ("physicalLocation", "region", "3.29.4"); + if (json::object *region_obj + = get_optional_object_property (phys_loc_obj, region_prop)) + handle_region_object (region_obj, &start_exp_loc, &end_exp_loc); + + // FIXME: what about ranges??? + + // TODO: + //3.29.5 contextRegion property + //3.29.6 address property + + if (m_pass == 0) + { + pending_location_range *pending_range = new pending_location_range (); + m_deferred_locs.add_location (start_exp_loc, &pending_range->m_start); + m_deferred_locs.add_location (end_exp_loc, &pending_range->m_end); + m_phys_loc_map.put (phys_loc_obj, pending_range); + return UNKNOWN_LOCATION; + } + else + { + pending_location_range *pending_range + = *m_phys_loc_map.get (phys_loc_obj); + return make_location (pending_range->m_start, + pending_range->m_start, + pending_range->m_end); + } +} + +// TODO +// artifactLocation object (§3.4) + +const char * +sarif_replayer::handle_artifact_location_object (json::object *artifact_loc_obj) +{ + // TODO + // 3.4.3 uri property + property_spec_ref uri_prop ("artifactLocation", "uri", "3.4.3"); + const char *uri = get_optional_string_property (artifact_loc_obj, uri_prop); + return uri; + + // TODO + // 3.4.4 uriBaseId property + // 3.4.5 index property +} + 
+// TODO +// region object (§3.30) + +void +sarif_replayer::handle_region_object (json::object *region_obj, + expanded_location *start_exp_loc, + expanded_location *end_exp_loc) +{ +#if 0 + if (pass == 0) + { + sarif::region_object *obj = new region_object (region_obj); + } +#endif + // TODO + // TODO: 3.30.5 startLine property + property_spec_ref start_line_prop ("region", "startLine", "3.30.5"); + int start_line; + if (get_optional_int_property (&start_line, region_obj, start_line_prop)) + { + /* Text region defined by line/column properties. */ + start_exp_loc->line = start_line; + end_exp_loc->column = 1; + + int start_column; + property_spec_ref start_column_prop ("region", "startColumn", "3.30.6"); + if (get_optional_int_property (&start_column, region_obj, + start_column_prop)) + start_exp_loc->column = start_column; + else + start_exp_loc->column = 1; + + int end_line; + property_spec_ref end_line_prop ("region", "endLine", "3.30.7"); + if (get_optional_int_property (&end_line, region_obj, end_line_prop)) + end_exp_loc->line = end_line; + else + end_exp_loc->line = start_line; + + int end_column; + property_spec_ref end_column_prop ("region", "endColumn", "3.30.8"); + if (get_optional_int_property (&end_column, region_obj, + end_column_prop)) + { + /* SARIF's endColumn is 1 beyond the final column in the region, + whereas GCC's end columns are inclusive. */ + end_exp_loc->column = end_column - 1; + } + else + { + // missing "endColumn" means the whole of the rest of the row + // TODO + } + } +} + +/* Process a result object (SARIF v2.1.0 section 3.27). 
*/ + +void +sarif_replayer::handle_result_obj (json::object *result_obj, + json::object *tool_obj) +{ + const char *rule_id + = get_optional_string_property (result_obj, + property_spec_ref ("result", "ruleId", + "3.27.5")); + json::object *rule_obj = lookup_rule_by_id_in_tool (rule_id, tool_obj); + + // 3.27.6 ruleIndex property + + // TODO: 3.27.8 taxa property + if (json::array *taxa_arr + = get_optional_array_property (result_obj, + property_spec_ref ("result", "taxa", + "3.27.8"))) + { + // TODO: + } + + diagnostic_t diag_kind = DK_WARNING; + if (const char *level + = get_optional_string_property (result_obj, + property_spec_ref ("result", "level", + "3.27.10"))) + diag_kind = get_diagnostic_kind_from_result_level (level); + + // 3.27.11 message property + char *text = NULL; + json::object *message_obj + = get_optional_object_property (result_obj, + property_spec_ref ("result", "message", + "3.27.11")); + if (message_obj) + text = make_plain_text_within_result_message (NULL, // FIXME: tool_component_obj, + message_obj, + rule_obj); + + // 3.27.12 locations property + json::object *logical_location_obj = NULL; + tree fndecl = NULL; + location_t loc = UNKNOWN_LOCATION; + property_spec_ref locations_prop ("result", "locations", "3.27.12"); + json::array *locations_arr + = get_required_array_property (result_obj, locations_prop); + if (locations_arr->length () > 0) + { + /* Only look at the first, if there's more than one. 
*/ + // location objects (§3.28) + json::object *location_obj + = require_object_for_element (locations_arr->get (0), locations_prop); + loc = handle_location_object (location_obj, &logical_location_obj); + if (logical_location_obj) + fndecl = get_or_create_fndecl (logical_location_obj); + } + + // 3.27.18 codeFlows property + diagnostic_path *path = NULL; + property_spec_ref code_flows ("result", "codeFlows", "3.27.18"); + if (json::array *code_flows_arr + = get_optional_array_property (result_obj, code_flows)) + { + if (code_flows_arr->length () == 1) + { + json::object *code_flow_obj + = require_object_for_element (code_flows_arr->get (0), code_flows); + + property_spec_ref thread_flows ("codeFlow", "threadFlows", "3.36.3"); + if (json::array *thread_flows_arr + = get_optional_array_property (code_flow_obj, thread_flows)) + { + if (thread_flows_arr->length () == 1) + { + json::object *thread_flow_obj + = require_object_for_element (thread_flows_arr->get (0), + thread_flows); + path = make_sarif_diagnostic_path (thread_flow_obj); + } + } + } + } + + // TODO: 3.27.22 relatedLocations property + + // TODO: 3.27.30 fixes property + + // TODO: use logical_location_obj, if non-NULL + //sarif_logical_location logical_loc (logical_location_obj); + //if (logical_location_obj) + m_current_logical_loc_obj = logical_location_obj; + current_function_decl = fndecl; + rich_location rich_loc (line_table, loc); + rich_loc.set_path (path); + sarif_rule rule (this, rule_id, rule_obj); + diagnostic_metadata metadata; + metadata.add_rule (rule); + auto_diagnostic_group d; + if (m_pass == 1) + if (emit_diagnostic (diag_kind, &rich_loc, &metadata, 0, "%s", + text ? text : "FIXME")) + { + // TODO + } + free (text); + delete path; + current_function_decl = NULL; + m_current_logical_loc_obj = NULL; +} + +/* Attempt to load a SARIF file from FILENAME and replay it. + Exit on any errors. 
*/ + +void +replay_sarif (const char *filename) +{ + sarif_replayer *p = new sarif_replayer (filename); + sarif_diagnostic_client_data_hooks *hooks + = new sarif_diagnostic_client_data_hooks (p); + global_dc->m_client_data_hooks = hooks; + + char *content = read_file (filename); + json::error *err = NULL; + json::value *jv = p->parse_utf8_string (content, flag_allow_comments, &err); + if (err) + { + p->fatal_error (err); + delete err; + } + free (content); + + if (jv) + { + for (int pass = 0; pass < 2; pass++) + p->emit_sarif_as_diagnostics (jv, pass); + hooks->stash (jv); + } + + // global_dc's client_data takes ownership of "p" + // TODO: should it take ownership of jv? + // TODO: or should we clone jv? +} diff --git a/gcc/sarif/sarif-replay.h b/gcc/sarif/sarif-replay.h new file mode 100644 index 00000000000..fb2aa79dee2 --- /dev/null +++ b/gcc/sarif/sarif-replay.h @@ -0,0 +1,26 @@ +/* Re-emitting diagnostics saved in SARIF form. + Copyright (C) 2022 David Malcolm . + Contributed by David Malcolm . + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it +under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. 
*/ + +#ifndef GCC_SARIF_SARIF_REPLAY_H +#define GCC_SARIF_SARIF_REPLAY_H + +extern void replay_sarif (const char *filename); + +#endif /* GCC_SARIF_SARIF_REPLAY_H */ diff --git a/gcc/testsuite/lib/sarif-dg.exp b/gcc/testsuite/lib/sarif-dg.exp new file mode 100644 index 00000000000..c82d7a131a1 --- /dev/null +++ b/gcc/testsuite/lib/sarif-dg.exp @@ -0,0 +1,233 @@ +# Copyright (C) 2004-2022 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# . + +load_lib gcc-dg.exp +load_lib torture-options.exp + +#FIXME: copied from gfortran-dg.exp + +# Define sarif callbacks for dg.exp. + +proc sarif-dg-test { prog do_what extra_tool_flags } { + set result \ + [gcc-dg-test-1 sarif_target_compile $prog $do_what $extra_tool_flags] + + set comp_output [lindex $result 0] + set output_file [lindex $result 1] + + # gcc's default is to print the caret and source code, but + # most test cases implicitly use the flag -fno-diagnostics-show-caret + # to disable caret (and source code) printing. + # + # However, a few test cases override this back to the default by + # explicitly supplying "-fdiagnostics-show-caret", so that we can have + # test coverage for caret/source code printing. 
+ # + # sarif error messages with caret-printing look like this: + # [name]:[locus]: + # + # some code + # 1 + # Error: Some error at (1) + # or + # [name]:[locus]: + # + # some code + # 1 + # [name]:[locus2]: + # + # some other code + # 2 + # Error: Some error at (1) and (2) + # or + # [name]:[locus]: + # + # some code and some more code + # 1 2 + # Error: Some error at (1) and (2) + # + # If this is such a test case, skip the rest of this function, so + # that the test case can explicitly verify the output that it expects. + if {[string first "-fdiagnostics-show-caret" $extra_tool_flags] >= 0} { + return [list $comp_output $output_file] + } + + # Otherwise, caret-printing is disabled. + # sarif errors with caret-printing disabled look like this: + # [name]:[locus]: Error: Some error + # or + # [name]:[locus]: Error: (1) + # [name]:[locus2]: Error: Some error at (1) and (2) + # + # Where [locus] is either [line] or [line].[column] or + # [line].[column]-[column] . + # + # We collapse these to look like: + # [name]:[line]:[column]: Error: Some error at (1) and (2) + # or + # [name]:[line]:[column]: Error: Some error at (1) and (2) + # [name]:[line2]:[column]: Error: Some error at (1) and (2) + # + # Note that these regexps only make sense in the combinations used below. + # Note also that it is imperative that we first deal with the form with + # two loci. + set locus_regexp "(\[^\n\]+:\[0-9\]+)\[\.:\](\[0-9\]+)(-\[0-9\]+)?:\n\n\[^\n\]+\n\[^\n\]+\n" + set diag_regexp "(\[^\n\]+)\n" + + # We proceed in steps: + + # 1. We add first a column number if none exists. + # (Some Fortran diagnostics have the locus after Warning|Error) + set colnum_regexp "(^|\n)(Warning: |Error: )?(\[^:\n\]+:\[0-9\]+):(\[ \n\])" + regsub -all $colnum_regexp $comp_output "\\1\\3:0:\\4\\2" comp_output + verbose "comput_output0:\n$comp_output" + + # 2. 
We deal with the form with two different locus lines, + set two_loci "(^|\n)$locus_regexp$locus_regexp$diag_regexp" + regsub -all $two_loci $comp_output "\\1\\2:\\3: \\8\n\\5\:\\6: \\8\n" comp_output + verbose "comput_output1:\n$comp_output" + + set locus_prefix "(\[^:\n\]+:\[0-9\]+:\[0-9\]+: )(Warning: |Error: )" + set two_loci2 "(^|\n)$locus_prefix\\(1\\)\n$locus_prefix$diag_regexp" + regsub -all $two_loci2 $comp_output "\\1\\2\\3\\6\n\\4\\5\\6\n" comp_output + verbose "comput_output2:\n$comp_output" + + # 3. then with the form with only one locus line. + set single_locus "(^|\n)$locus_regexp$diag_regexp" + regsub -all $single_locus $comp_output "\\1\\2:\\3: \\5\n" comp_output + verbose "comput_output3:\n$comp_output" + + # 4. Add a line number if none exists + regsub -all "(^|\n)(Warning: |Error: )" $comp_output "\\1:0:0: \\2" comp_output + verbose "comput_output4:\n$comp_output" + return [list $comp_output $output_file] +} + +proc sarif-dg-prune { system text } { + return [gcc-dg-prune $system $text] +} + +# Utility routines. + +# Modified dg-runtest that can cycle through a list of optimization options +# as c-torture does. +proc sarif-dg-runtest { testcases flags default-extra-flags } { + global runtests + global torture_with_loops + + # Some callers set torture options themselves; don't override those. + set existing_torture_options [torture-options-exist] + if { $existing_torture_options == 0 } { + global DG_TORTURE_OPTIONS + torture-init + set-torture-options $DG_TORTURE_OPTIONS + } + dump-torture-options + + foreach test $testcases { + # If we're only testing specific files and this isn't one of + # them, skip it. 
+ if ![runtest_file_p $runtests $test] { + continue + } + + # look if this is dg-do-run test, in which case + # we cycle through the option list, otherwise we don't + if [expr [search_for $test "dg-do run"]] { + set option_list $torture_with_loops + } else { + set option_list [list { -O } ] + } + + set nshort [file tail [file dirname $test]]/[file tail $test] + list-module-names $test + + foreach flags_t $option_list { + verbose "Testing $nshort, $flags $flags_t" 1 + dg-test $test "$flags $flags_t" ${default-extra-flags} + cleanup-modules "" + } + } + + if { $existing_torture_options == 0 } { + torture-finish + } +} + +proc sarif-dg-debug-runtest { target_compile trivial opt_opts testcases } { + global srcdir subdir DEBUG_TORTURE_OPTIONS + + if ![info exists DEBUG_TORTURE_OPTIONS] { + set DEBUG_TORTURE_OPTIONS "" + set type_list [list "-gstabs" "-gstabs+" "-gxcoff" "-gxcoff+" "-gdwarf-2" ] + foreach type $type_list { + set comp_output [$target_compile \ + "$srcdir/$subdir/$trivial" "trivial.S" assembly \ + "additional_flags=$type"] + if { [string match "exit status *" $comp_output] } { + continue + } + if { [string match \ + "* target system does not support the * debug format*" \ + $comp_output] + } { + continue + } + remove-build-file "trivial.S" + foreach level {1 "" 3} { + if { ($type == "-gdwarf-2") && ($level != "") } { + lappend DEBUG_TORTURE_OPTIONS [list "${type}" "-g${level}"] + foreach opt $opt_opts { + lappend DEBUG_TORTURE_OPTIONS \ + [list "${type}" "-g${level}" "$opt" ] + } + } else { + lappend DEBUG_TORTURE_OPTIONS [list "${type}${level}"] + foreach opt $opt_opts { + lappend DEBUG_TORTURE_OPTIONS \ + [list "${type}${level}" "$opt" ] + } + } + } + } + } + + verbose -log "Using options $DEBUG_TORTURE_OPTIONS" + + global runtests + + foreach test $testcases { + # If we're only testing specific files and this isn't one of + # them, skip it. 
+ if ![runtest_file_p $runtests $test] { + continue + } + + set nshort [file tail [file dirname $test]]/[file tail $test] + list-module-names $test + + foreach flags $DEBUG_TORTURE_OPTIONS { + set doit 1 + # gcc-specific checking removed here + + if { $doit } { + verbose -log "Testing $nshort, $flags" 1 + dg-test $test $flags "" + cleanup-modules "" + } + } + } +} diff --git a/gcc/testsuite/lib/sarif.exp b/gcc/testsuite/lib/sarif.exp new file mode 100644 index 00000000000..925d2787017 --- /dev/null +++ b/gcc/testsuite/lib/sarif.exp @@ -0,0 +1,36 @@ +# Copyright (C) 2003-2022 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# . + +# FIXME: copied from gfortran.exp + +# This file is just 'sed -e 's/77/fortran/g' \ +# -e 's/f2c/gfortran' g77.exp > gfortran.exp' +# +# with some minor modifications to make it work. 
+ +# +# sarif support library routines +# +load_lib prune.exp +load_lib gcc-defs.exp +load_lib timeout.exp +load_lib target-libpath.exp +load_lib target-supports.exp + +proc sarif_target_compile { source dest type options } { + set return_val [target_compile $source $dest $type $options] + return $return_val; +} diff --git a/gcc/testsuite/sarif/bad-eval-with-code-flow.py b/gcc/testsuite/sarif/bad-eval-with-code-flow.py new file mode 100644 index 00000000000..e72d8de48a5 --- /dev/null +++ b/gcc/testsuite/sarif/bad-eval-with-code-flow.py @@ -0,0 +1,10 @@ +# Taken from https://github.com/microsoft/sarif-tutorials +# samples/3-Beyond-basics/bad-eval-with-code-flow.py +# which is licensed under MIT License. + +print("Hello, world!") +expr = input("Expression> ") +use_input(expr) + +def use_input(raw_input): + print(eval(raw_input)) diff --git a/gcc/testsuite/sarif/escaped-braces.sarif b/gcc/testsuite/sarif/escaped-braces.sarif new file mode 100644 index 00000000000..8374a94b835 --- /dev/null +++ b/gcc/testsuite/sarif/escaped-braces.sarif @@ -0,0 +1,19 @@ +{ + "version": "2.1.0", + "runs": [{ + "tool": { "driver": { "name": "example" } }, + "results": [ + { "message": { "text" : "before open '{{' after open" }, + "locations": []}, + { "message": { "text" : "before close '}}' after close" }, + "locations": []} + ] + }] +} + +/* { dg-begin-multiline-output "" } +sarif-replay: warning: before open '{' after open +sarif-replay: warning: before close '}' after close + { dg-end-multiline-output "" } */ + +// TODO: lose the "sarif-replay: " prefixes diff --git a/gcc/testsuite/sarif/invalid-json-array-missing-comma.sarif b/gcc/testsuite/sarif/invalid-json-array-missing-comma.sarif new file mode 100644 index 00000000000..0f32d38420e --- /dev/null +++ b/gcc/testsuite/sarif/invalid-json-array-missing-comma.sarif @@ -0,0 +1,6 @@ +[ "foo", "bar" "baz"] // { dg-error "expected ',' or '\]'; got string" } + +{ dg-begin-multiline-output "" } + 1 | [ "foo", "bar" "baz"] + | ^~~~~ +{ 
dg-end-multiline-output "" } diff --git a/gcc/testsuite/sarif/invalid-json-array-with-trailing-comma.sarif b/gcc/testsuite/sarif/invalid-json-array-with-trailing-comma.sarif new file mode 100644 index 00000000000..05b74a81efc --- /dev/null +++ b/gcc/testsuite/sarif/invalid-json-array-with-trailing-comma.sarif @@ -0,0 +1,6 @@ +[ 0, 1, 2, ] /* { dg-error "expected a JSON value but got '\\\]'" } */ + +{ dg-begin-multiline-output "" } + 1 | [ 0, 1, 2, ] + | ^ +{ dg-end-multiline-output "" } diff --git a/gcc/testsuite/sarif/invalid-json-bad-token.sarif b/gcc/testsuite/sarif/invalid-json-bad-token.sarif new file mode 100644 index 00000000000..7756eef1add --- /dev/null +++ b/gcc/testsuite/sarif/invalid-json-bad-token.sarif @@ -0,0 +1,6 @@ + not a valid JSON file // { dg-error "invalid JSON token: unexpected character: 'n'" } + +{ dg-begin-multiline-output "" } + 1 | not a valid JSON file + | ^ +{ dg-end-multiline-output "" } diff --git a/gcc/testsuite/sarif/invalid-json-object-missing-comma.sarif b/gcc/testsuite/sarif/invalid-json-object-missing-comma.sarif new file mode 100644 index 00000000000..9d2bf9476b1 --- /dev/null +++ b/gcc/testsuite/sarif/invalid-json-object-missing-comma.sarif @@ -0,0 +1,7 @@ +{ "foo": "bar" + "baz": 42 } // { dg-error "expected ',' or '\}'; got string" } + +{ dg-begin-multiline-output "" } + 2 | "baz": 42 } + | ^~~~~ +{ dg-end-multiline-output "" } diff --git a/gcc/testsuite/sarif/invalid-json-object-with-trailing-comma.sarif b/gcc/testsuite/sarif/invalid-json-object-with-trailing-comma.sarif new file mode 100644 index 00000000000..e1aae9b350c --- /dev/null +++ b/gcc/testsuite/sarif/invalid-json-object-with-trailing-comma.sarif @@ -0,0 +1,6 @@ +{ "foo": "bar", } /* { dg-error "expected string for object key after ','; got '\\\}'" } */ + +{ dg-begin-multiline-output "" } + 1 | { "foo": "bar", } + | ^ +{ dg-end-multiline-output "" } diff --git a/gcc/testsuite/sarif/invalid-sarif-bad-runs.sarif b/gcc/testsuite/sarif/invalid-sarif-bad-runs.sarif 
new file mode 100644 index 00000000000..c5a26f05516 --- /dev/null +++ b/gcc/testsuite/sarif/invalid-sarif-bad-runs.sarif @@ -0,0 +1,7 @@ +{ "version": "2.1.0", + "runs": 42 } // { dg-error "expected sarifLog.runs to be 'null' or an array \\\[SARIF v2.1.0 §3.13.4\\\]" } + +/* { dg-begin-multiline-output "" } + 2 | "runs": 42 } + | ^~ + { dg-end-multiline-output "" } */ diff --git a/gcc/testsuite/sarif/invalid-sarif-missing-arguments-for-placeholders.sarif b/gcc/testsuite/sarif/invalid-sarif-missing-arguments-for-placeholders.sarif new file mode 100644 index 00000000000..c4354d4ef23 --- /dev/null +++ b/gcc/testsuite/sarif/invalid-sarif-missing-arguments-for-placeholders.sarif @@ -0,0 +1,14 @@ +{ + "version": "2.1.0", + "runs": [{ + "tool": { "driver": { "name": "example" } }, + "results": [ + { "message": { "text" : "the {0} {1} fox jumps over the {2} dog" } } /* { dg-error "message string contains placeholder '\\{0\\}' but message object has no 'arguments' property \\\[SARIF v2.1.0 §3.11.11\\\]" } */ + ] + }] +} + +/* { dg-begin-multiline-output "" } + 6 | { "message": { "text" : "the {0} {1} fox jumps over the {2} dog" } } + | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + { dg-end-multiline-output "" } */ diff --git a/gcc/testsuite/sarif/invalid-sarif-no-runs.sarif b/gcc/testsuite/sarif/invalid-sarif-no-runs.sarif new file mode 100644 index 00000000000..f142321642c --- /dev/null +++ b/gcc/testsuite/sarif/invalid-sarif-no-runs.sarif @@ -0,0 +1,6 @@ +{ "version": "2.1.0" } // { dg-error "expected sarifLog object to have a 'runs' property \\\[SARIF v2.1.0 §3.13.4\\\]" } + +/* { dg-begin-multiline-output "" } + 1 | { "version": "2.1.0" } + | ^~~~~~~~~~~~~~~~~~~~~~ + { dg-end-multiline-output "" } */ diff --git a/gcc/testsuite/sarif/invalid-sarif-no-version.sarif b/gcc/testsuite/sarif/invalid-sarif-no-version.sarif new file mode 100644 index 00000000000..771bd9c0c05 --- /dev/null +++ b/gcc/testsuite/sarif/invalid-sarif-no-version.sarif @@ -0,0 +1,6 @@ +{ } // 
{ dg-error "expected sarifLog object to have a 'version' property \\\[SARIF v2.1.0 §3.13.2\\\]" } + +/* { dg-begin-multiline-output "" } + 1 | { } + | ^~~ + { dg-end-multiline-output "" } */ diff --git a/gcc/testsuite/sarif/invalid-sarif-non-object-in-runs.sarif b/gcc/testsuite/sarif/invalid-sarif-non-object-in-runs.sarif new file mode 100644 index 00000000000..4eeaaaa7b24 --- /dev/null +++ b/gcc/testsuite/sarif/invalid-sarif-non-object-in-runs.sarif @@ -0,0 +1,7 @@ +{ "version": "2.1.0", + "runs" : [42] } // { dg-error "expected element of sarifLog.runs array to be an object \\\[SARIF v2.1.0 §3.13.4\\\]" } + +/* { dg-begin-multiline-output "" } + 2 | "runs" : [42] } + | ^~ + { dg-end-multiline-output "" } */ diff --git a/gcc/testsuite/sarif/invalid-sarif-not-an-object.sarif b/gcc/testsuite/sarif/invalid-sarif-not-an-object.sarif new file mode 100644 index 00000000000..4743bad3ba3 --- /dev/null +++ b/gcc/testsuite/sarif/invalid-sarif-not-an-object.sarif @@ -0,0 +1,6 @@ +[ null ] // { dg-error "expected a sarifLog object as the top-level value \\\[SARIF v2.1.0 §3.1\\\]" } + +/* { dg-begin-multiline-output "" } + 1 | [ null ] + | ^~~~~~~~ + { dg-end-multiline-output "" } */ diff --git a/gcc/testsuite/sarif/invalid-sarif-not-enough-arguments-for-placeholders.sarif b/gcc/testsuite/sarif/invalid-sarif-not-enough-arguments-for-placeholders.sarif new file mode 100644 index 00000000000..e3eb5341110 --- /dev/null +++ b/gcc/testsuite/sarif/invalid-sarif-not-enough-arguments-for-placeholders.sarif @@ -0,0 +1,14 @@ +{ + "version": "2.1.0", + "runs": [{ + "tool": { "driver": { "name": "example" } }, + "results": [ + { "message": { "text" : "the {0} {1} fox jumps over the {2} dog", "arguments": ["quick", "brown"] } } /* { dg-error "not enough strings in 'arguments' array for placeholder '\\{2\\}' \\\[SARIF v2.1.0 §3.11.11\\\]" } */ + ] + }] +} + +/* { dg-begin-multiline-output "" } + 6 | { "message": { "text" : "the {0} {1} fox jumps over the {2} dog", "arguments": ["quick", 
"brown"] } } + | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + { dg-end-multiline-output "" } */ diff --git a/gcc/testsuite/sarif/invalid-sarif-version-not-a-string.sarif b/gcc/testsuite/sarif/invalid-sarif-version-not-a-string.sarif new file mode 100644 index 00000000000..0ffeb13626e --- /dev/null +++ b/gcc/testsuite/sarif/invalid-sarif-version-not-a-string.sarif @@ -0,0 +1,6 @@ +{ "version" : 42 } // { dg-error "expected sarifLog.version to be a string \\\[SARIF v2.1.0 §3.13.2\\\]" } + +/* { dg-begin-multiline-output "" } + 1 | { "version" : 42 } + | ^~ + { dg-end-multiline-output "" } */ diff --git a/gcc/testsuite/sarif/malformed-placeholder.sarif b/gcc/testsuite/sarif/malformed-placeholder.sarif new file mode 100644 index 00000000000..72da185de0d --- /dev/null +++ b/gcc/testsuite/sarif/malformed-placeholder.sarif @@ -0,0 +1,15 @@ +{ + "version": "2.1.0", + "runs": [{ + "tool": { "driver": { "name": "example" } }, + "results": [ + { "message": { "text" : "before {} after" }, /* { dg-error "unescaped '\\\{' within message string \\\[SARIF v2.1.0 §3.11.11\\\]" } */ + "locations": [] } + ] + }] +} + +/* { dg-begin-multiline-output "" } + 6 | { "message": { "text" : "before {} after" }, + | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + { dg-end-multiline-output "" } */ diff --git a/gcc/testsuite/sarif/null-runs.sarif b/gcc/testsuite/sarif/null-runs.sarif new file mode 100644 index 00000000000..5fc630eecb3 --- /dev/null +++ b/gcc/testsuite/sarif/null-runs.sarif @@ -0,0 +1,2 @@ +{ "version": "2.1.0", + "runs": null } diff --git a/gcc/testsuite/sarif/roundtrip-signal-1.c.sarif b/gcc/testsuite/sarif/roundtrip-signal-1.c.sarif new file mode 100644 index 00000000000..86f85383b89 --- /dev/null +++ b/gcc/testsuite/sarif/roundtrip-signal-1.c.sarif @@ -0,0 +1,398 @@ +/* { dg-options "-fdiagnostics-format=sarif-file -fallow-comments" } */ + +{ + "$schema": 
"https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json", + "runs": [ + { + "results": [ + { + "level": "warning", + "ruleId": "-Wanalyzer-unsafe-call-within-signal-handler", + "locations": [ + { + "logicalLocations": [ + { + "decoratedName": "custom_logger", + "kind": "function", + "name": "custom_logger", + "fullyQualifiedName": "custom_logger" + } + ], + "physicalLocation": { + "contextRegion": { + "startLine": 13, + "snippet": { + "text": " fprintf(stderr, \"LOG: %s\", msg);" + } + }, + "artifactLocation": { + "uri": "../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "uriBaseId": "PWD" + }, + "region": { + "startLine": 13, + "endColumn": 34, + "startColumn": 3 + } + } + } + ], + "message": { + "text": "call to \u2018fprintf\u2019 from within signal handler" + }, + "taxa": [ + { + "id": "479", + "toolComponent": { + "name": "cwe" + } + } + ], + "codeFlows": [ + { + "threadFlows": [ + { + "locations": [ + { + "nestingLevel": 1, + "location": { + "logicalLocations": [ + { + "decoratedName": "main", + "kind": "function", + "name": "main", + "fullyQualifiedName": "main" + } + ], + "message": { + "text": "entry to \u2018main\u2019" + }, + "physicalLocation": { + "contextRegion": { + "startLine": 21, + "snippet": { + "text": "int main(int argc, const char *argv)\n" + } + }, + "artifactLocation": { + "uri": "../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "uriBaseId": "PWD" + }, + "region": { + "startLine": 21, + "endColumn": 9, + "startColumn": 5 + } + } + }, + "kinds": [ + "enter", + "function" + ] + }, + { + "nestingLevel": 1, + "location": { + "logicalLocations": [ + { + "decoratedName": "main", + "kind": "function", + "name": "main", + "fullyQualifiedName": "main" + } + ], + "message": { + "text": "registering \u2018handler\u2019 as signal handler" + }, + "physicalLocation": { + "contextRegion": { + "startLine": 25, + "snippet": { + "text": " signal(SIGINT, handler);\n" + } + }, + "artifactLocation": { + "uri": 
"../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "uriBaseId": "PWD" + }, + "region": { + "startLine": 25, + "endColumn": 26, + "startColumn": 3 + } + } + } + }, + { + "nestingLevel": 0, + "location": { + "message": { + "text": "later on, when the signal is delivered to the process" + } + } + }, + { + "nestingLevel": 1, + "location": { + "logicalLocations": [ + { + "decoratedName": "handler", + "kind": "function", + "name": "handler", + "fullyQualifiedName": "handler" + } + ], + "message": { + "text": "entry to \u2018handler\u2019" + }, + "physicalLocation": { + "contextRegion": { + "startLine": 16, + "snippet": { + "text": "static void handler(int signum)\n" + } + }, + "artifactLocation": { + "uri": "../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "uriBaseId": "PWD" + }, + "region": { + "startLine": 16, + "endColumn": 20, + "startColumn": 13 + } + } + }, + "kinds": [ + "enter", + "function" + ] + }, + { + "nestingLevel": 1, + "location": { + "logicalLocations": [ + { + "decoratedName": "handler", + "kind": "function", + "name": "handler", + "fullyQualifiedName": "handler" + } + ], + "message": { + "text": "calling \u2018custom_logger\u2019 from \u2018handler\u2019" + }, + "physicalLocation": { + "contextRegion": { + "startLine": 18, + "snippet": { + "text": " custom_logger(\"got signal\");\n" + } + }, + "artifactLocation": { + "uri": "../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "uriBaseId": "PWD" + }, + "region": { + "startLine": 18, + "endColumn": 30, + "startColumn": 3 + } + } + }, + "kinds": [ + "call", + "function" + ] + }, + { + "nestingLevel": 2, + "location": { + "logicalLocations": [ + { + "decoratedName": "custom_logger", + "kind": "function", + "name": "custom_logger", + "fullyQualifiedName": "custom_logger" + } + ], + "message": { + "text": "entry to \u2018custom_logger\u2019" + }, + "physicalLocation": { + "contextRegion": { + "startLine": 11, + "snippet": { + "text": "void custom_logger(const char *msg)\n" + } + }, + 
"artifactLocation": { + "uri": "../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "uriBaseId": "PWD" + }, + "region": { + "startLine": 11, + "endColumn": 19, + "startColumn": 6 + } + } + }, + "kinds": [ + "enter", + "function" + ] + }, + { + "nestingLevel": 2, + "location": { + "logicalLocations": [ + { + "decoratedName": "custom_logger", + "kind": "function", + "name": "custom_logger", + "fullyQualifiedName": "custom_logger" + } + ], + "message": { + "text": "call to \u2018fprintf\u2019 from within signal handler" + }, + "physicalLocation": { + "contextRegion": { + "startLine": 13, + "snippet": { + "text": " fprintf(stderr, \"LOG: %s\", msg);\n" + } + }, + "artifactLocation": { + "uri": "../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "uriBaseId": "PWD" + }, + "region": { + "startLine": 13, + "endColumn": 34, + "startColumn": 3 + } + } + }, + "kinds": [ + "danger" + ] + } + ] + } + ] + } + ] + } + ], + "artifacts": [ + { + "location": { + "uri": "../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "uriBaseId": "PWD" + }, + "sourceLanguage": "c", + "contents": { + "text": "/* Example of a bad call within a signal handler.\n 'handler' calls 'custom_logger' which calls 'fprintf', and 'fprintf' is\n not allowed from a signal handler. 
*/\n\n\n#include <stdio.h>\n#include <signal.h>\n\nextern void body_of_program(void);\n\nvoid custom_logger(const char *msg)\n{\n fprintf(stderr, \"LOG: %s\", msg);\n}\n\nstatic void handler(int signum)\n{\n custom_logger(\"got signal\");\n}\n\nint main(int argc, const char *argv)\n{\n custom_logger(\"started\");\n\n signal(SIGINT, handler);\n\n body_of_program();\n\n custom_logger(\"stopped\");\n\n return 0;\n}\n" + } + } + ], + "tool": { + "driver": { + "fullName": "placeholder value for driver.fullName", + "name": "GNU C17", + "rules": [ + { + "id": "-Wanalyzer-unsafe-call-within-signal-handler", + "helpUri": "https://gcc.gnu.org/onlinedocs/gcc/Static-Analyzer-Options.html#index-Wanalyzer-unsafe-call-within-signal-handler" + } + ], + "informationUri": "https://gcc.gnu.org/gcc-13/", + "version": "placeholder value for driver.version" + } + }, + "originalUriBaseIds": { + "PWD": { + "uri": "file:///home/david/coding/gcc-newgit-serialization/build/gcc/" + } + }, + "taxonomies": [ + { + "organization": "MITRE", + "name": "CWE", + "version": "4.7", + "shortDescription": { + "text": "The MITRE Common Weakness Enumeration" + }, + "taxa": [ + { + "id": "479", + "helpUri": "https://cwe.mitre.org/data/definitions/479.html" + } + ] + } + ] + } + ], + "version": "2.1.0" +} + +/* Verify that some JSON was written to a file with the expected name; + ideally we want as much of the data as possible to survive the round trip. + + The indentation here reflects the expected hierarchy, though these tests + don't check for that, merely the string fragments we expect.
+ + { dg-final { scan-sarif-file "\"version\": \"2.1.0\"" } } + { dg-final { scan-sarif-file "\"runs\": \\\[" } } + { dg-final { scan-sarif-file "\"artifacts\": \\\[" } } + { dg-final { scan-sarif-file "\"location\": " } } + { dg-final { scan-sarif-file "\"uri\": " } } + + { dg-final { scan-sarif-file "\"sourceLanguage\": \"c\"" } } + + { dg-final { scan-sarif-file "\"contents\": " { xfail *-*-* } } } + { dg-final { scan-sarif-file "\"text\": " } } + { dg-final { scan-sarif-file "\"tool\": " } } + { dg-final { scan-sarif-file "\"driver\": " } } + { dg-final { scan-sarif-file "\"name\": \"GNU C17\"" } } + { dg-final { scan-sarif-file "\"fullName\": \"placeholder value for driver.fullName\"" } } + { dg-final { scan-sarif-file "\"informationUri\": \"https://gcc.gnu.org/gcc-13/\"" } } + { dg-final { scan-sarif-file "\"version\": \"placeholder value for driver.version\"" } } + { dg-final { scan-sarif-file "\"results\": \\\[" } } + { dg-final { scan-sarif-file "\"level\": \"warning\"" } } + { dg-final { scan-sarif-file "\"ruleId\": \"-Wanalyzer-unsafe-call-within-signal-handler\"" } } + { dg-final { scan-sarif-file "\"locations\": \\\[" } } + { dg-final { scan-sarif-file "\"physicalLocation\": " } } + { dg-final { scan-sarif-file "\"contextRegion\": " } } + { dg-final { scan-sarif-file "\"artifactLocation\": " } } + { dg-final { scan-sarif-file "\"region\": " } } + { dg-final { scan-sarif-file "\"startLine\": 13" } } + { dg-final { scan-sarif-file "\"startColumn\": 3" } } + { dg-final { scan-sarif-file "\"endColumn\": 34" } } + + { dg-final { scan-sarif-file "\"logicalLocations\": " } } + { dg-final { scan-sarif-file "\"decoratedName\": \"custom_logger\"" } } + { dg-final { scan-sarif-file "\"kind\": \"function\"" } } + { dg-final { scan-sarif-file "\"name\": \"custom_logger\"" } } + { dg-final { scan-sarif-file "\"fullyQualifiedName\": \"custom_logger\"" } } + + { dg-final { scan-sarif-file "\"message\": " } } + { dg-final { scan-sarif-file "\"text\": \"call to 
\\u2018fprintf\\u2019 from within signal handler\"" } } + + { dg-final { scan-sarif-file "\"codeFlows\": \\\[" } } + { dg-final { scan-sarif-file "\"threadFlows\": \\\[" } } + { dg-final { scan-sarif-file "\"nestingLevel\": 1" } } + { dg-final { scan-sarif-file "\"kinds\": \\\[\"enter\", \"function\"\\\]" { xfail *-*-* } } } + + { dg-final { scan-sarif-file "\"nestingLevel\": 2" } } + { dg-final { scan-sarif-file "\"kinds\": \\\[\"danger\"\\\]" { xfail *-*-* } } } + +*/ + +// TODO: fix the xfails +// TODO: verify logical locations within the path +// TODO: verify physical locations within the path +// TODO: verify taxa +// TODO: verify message text +// etc diff --git a/gcc/testsuite/sarif/sarif.exp b/gcc/testsuite/sarif/sarif.exp new file mode 100644 index 00000000000..dcb1eb2bd58 --- /dev/null +++ b/gcc/testsuite/sarif/sarif.exp @@ -0,0 +1,50 @@ +# Copyright (C) 2004-2022 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# <http://www.gnu.org/licenses/>. + +# GCC testsuite that uses the `dg.exp' driver. + +# Load support procs.
+load_lib sarif-dg.exp + +#load_lib dg.exp +#load_lib prune.exp +#load_lib target-supports.exp +#load_lib gcc-defs.exp +#load_lib timeout.exp +#load_lib target-libpath.exp +#load_lib gcc.exp +#load_lib g++.exp +#load_lib dejagnu.exp +#load_lib prune.exp +#load_lib gcc-defs.exp +#load_lib timeout.exp +#load_lib target-libpath.exp +#load_lib target-supports.exp +#load_lib gcc-dg.exp + +# If a testcase doesn't have special options, use these. +global DEFAULT_SARIF_FLAGS +if ![info exists DEFAULT_SARIF_FLAGS] then { + set DEFAULT_SARIF_FLAGS " -fallow-comments" +} +# Initialize `dg'. +dg-init + +dg-runtest [lsort \ + [glob -nocomplain $srcdir/$subdir/*.sarif ] ] "" $DEFAULT_SARIF_FLAGS + +# All done. +dg-finish diff --git a/gcc/testsuite/sarif/signal-1.c.sarif b/gcc/testsuite/sarif/signal-1.c.sarif new file mode 100644 index 00000000000..a87ebb261d9 --- /dev/null +++ b/gcc/testsuite/sarif/signal-1.c.sarif @@ -0,0 +1,362 @@ +{ + "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json", + "runs": [ + { + "results": [ + { + "level": "warning", + "ruleId": "-Wanalyzer-unsafe-call-within-signal-handler", + "locations": [ + { + "logicalLocations": [ + { + "decoratedName": "custom_logger", + "kind": "function", + "name": "custom_logger", + "fullyQualifiedName": "custom_logger" + } + ], + "physicalLocation": { + "contextRegion": { + "startLine": 13, + "snippet": { + "text": " fprintf(stderr, \"LOG: %s\", msg);" + } + }, + "artifactLocation": { + "uri": "../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "uriBaseId": "PWD" + }, + "region": { + "startLine": 13, + "endColumn": 34, + "startColumn": 3 + } + } + } + ], + "message": { + "text": "call to \u2018fprintf\u2019 from within signal handler" + }, + "taxa": [ + { + "id": "479", + "toolComponent": { + "name": "cwe" + } + } + ], + "codeFlows": [ + { + "threadFlows": [ + { + "locations": [ + { + "nestingLevel": 1, + "location": { + "logicalLocations": [ + { + 
"decoratedName": "main", + "kind": "function", + "name": "main", + "fullyQualifiedName": "main" + } + ], + "message": { + "text": "entry to \u2018main\u2019" + }, + "physicalLocation": { + "contextRegion": { + "startLine": 21, + "snippet": { + "text": "int main(int argc, const char *argv)\n" + } + }, + "artifactLocation": { + "uri": "../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "uriBaseId": "PWD" + }, + "region": { + "startLine": 21, + "endColumn": 9, + "startColumn": 5 + } + } + }, + "kinds": [ + "enter", + "function" + ] + }, + { + "nestingLevel": 1, + "location": { + "logicalLocations": [ + { + "decoratedName": "main", + "kind": "function", + "name": "main", + "fullyQualifiedName": "main" + } + ], + "message": { + "text": "registering \u2018handler\u2019 as signal handler" + }, + "physicalLocation": { + "contextRegion": { + "startLine": 25, + "snippet": { + "text": " signal(SIGINT, handler);\n" + } + }, + "artifactLocation": { + "uri": "../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "uriBaseId": "PWD" + }, + "region": { + "startLine": 25, + "endColumn": 26, + "startColumn": 3 + } + } + } + }, + { + "nestingLevel": 0, + "location": { + "message": { + "text": "later on, when the signal is delivered to the process" + } + } + }, + { + "nestingLevel": 1, + "location": { + "logicalLocations": [ + { + "decoratedName": "handler", + "kind": "function", + "name": "handler", + "fullyQualifiedName": "handler" + } + ], + "message": { + "text": "entry to \u2018handler\u2019" + }, + "physicalLocation": { + "contextRegion": { + "startLine": 16, + "snippet": { + "text": "static void handler(int signum)\n" + } + }, + "artifactLocation": { + "uri": "../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "uriBaseId": "PWD" + }, + "region": { + "startLine": 16, + "endColumn": 20, + "startColumn": 13 + } + } + }, + "kinds": [ + "enter", + "function" + ] + }, + { + "nestingLevel": 1, + "location": { + "logicalLocations": [ + { + "decoratedName": "handler", + "kind": 
"function", + "name": "handler", + "fullyQualifiedName": "handler" + } + ], + "message": { + "text": "calling \u2018custom_logger\u2019 from \u2018handler\u2019" + }, + "physicalLocation": { + "contextRegion": { + "startLine": 18, + "snippet": { + "text": " custom_logger(\"got signal\");\n" + } + }, + "artifactLocation": { + "uri": "../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "uriBaseId": "PWD" + }, + "region": { + "startLine": 18, + "endColumn": 30, + "startColumn": 3 + } + } + }, + "kinds": [ + "call", + "function" + ] + }, + { + "nestingLevel": 2, + "location": { + "logicalLocations": [ + { + "decoratedName": "custom_logger", + "kind": "function", + "name": "custom_logger", + "fullyQualifiedName": "custom_logger" + } + ], + "message": { + "text": "entry to \u2018custom_logger\u2019" + }, + "physicalLocation": { + "contextRegion": { + "startLine": 11, + "snippet": { + "text": "void custom_logger(const char *msg)\n" + } + }, + "artifactLocation": { + "uri": "../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "uriBaseId": "PWD" + }, + "region": { + "startLine": 11, + "endColumn": 19, + "startColumn": 6 + } + } + }, + "kinds": [ + "enter", + "function" + ] + }, + { + "nestingLevel": 2, + "location": { + "logicalLocations": [ + { + "decoratedName": "custom_logger", + "kind": "function", + "name": "custom_logger", + "fullyQualifiedName": "custom_logger" + } + ], + "message": { + "text": "call to \u2018fprintf\u2019 from within signal handler" + }, + "physicalLocation": { + "contextRegion": { + "startLine": 13, + "snippet": { + "text": " fprintf(stderr, \"LOG: %s\", msg);\n" + } + }, + "artifactLocation": { + "uri": "../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "uriBaseId": "PWD" + }, + "region": { + "startLine": 13, + "endColumn": 34, + "startColumn": 3 + } + } + }, + "kinds": [ + "danger" + ] + } + ] + } + ] + } + ] + } + ], + "artifacts": [ + { + "location": { + "uri": "../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c", + "uriBaseId": 
"PWD" + }, + "sourceLanguage": "c", + "contents": { + "text": "/* Example of a bad call within a signal handler.\n 'handler' calls 'custom_logger' which calls 'fprintf', and 'fprintf' is\n not allowed from a signal handler. */\n\n\n#include \n#include \n\nextern void body_of_program(void);\n\nvoid custom_logger(const char *msg)\n{\n fprintf(stderr, \"LOG: %s\", msg);\n}\n\nstatic void handler(int signum)\n{\n custom_logger(\"got signal\");\n}\n\nint main(int argc, const char *argv)\n{\n custom_logger(\"started\");\n\n signal(SIGINT, handler);\n\n body_of_program();\n\n custom_logger(\"stopped\");\n\n return 0;\n}\n" + } + } + ], + "tool": { + "driver": { + "fullName": "some full name goes here", + "name": "GNU C17", + "rules": [ + { + "id": "-Wanalyzer-unsafe-call-within-signal-handler", + "helpUri": "https://gcc.gnu.org/onlinedocs/gcc/Static-Analyzer-Options.html#index-Wanalyzer-unsafe-call-within-signal-handler" + } + ], + "informationUri": "https://gcc.gnu.org/gcc-13/", + "version": "13.0.0 20220601" + } + }, + "originalUriBaseIds": { + "PWD": { + "uri": "file:///home/david/coding/gcc-newgit-serialization/build/gcc/" + } + }, + "taxonomies": [ + { + "organization": "MITRE", + "name": "CWE", + "version": "4.7", + "shortDescription": { + "text": "The MITRE Common Weakness Enumeration" + }, + "taxa": [ + { + "id": "479", + "helpUri": "https://cwe.mitre.org/data/definitions/479.html" + } + ] + } + ] + } + ], + "version": "2.1.0" +} + +/* { dg-begin-multiline-output "" } +../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c: In function 'custom_logger': +../../src/gcc/testsuite/gcc.dg/analyzer/signal-1.c:13:3: warning: call to ‘fprintf’ from within signal handler [-Wanalyzer-unsafe-call-within-signal-handler] + 'main': events 1-2 + | + |...... 
+ | + event 3 + | + |sarif-replay: + | (3): later on, when the signal is delivered to the process + | + +--> 'handler': events 4-5 + | + | + +--> 'custom_logger': events 6-7 + | + | + { dg-end-multiline-output "" } */ + +// TODO: fixup the src location +// TODO: quote the source code +// TODO: CWE +// TODO: event messages +// TODO: etc diff --git a/gcc/testsuite/sarif/spec-example-1.sarif b/gcc/testsuite/sarif/spec-example-1.sarif new file mode 100644 index 00000000000..97f409f4aa4 --- /dev/null +++ b/gcc/testsuite/sarif/spec-example-1.sarif @@ -0,0 +1,15 @@ +// Taken from SARIF v2.1.0, Appendix K.1: "Minimal valid SARIF log file" +{ + "version": "2.1.0", + "runs": [ + { + "tool": { + "driver": { + "name": "CodeScanner" + } + }, + "results": [ + ] + } + ] +} diff --git a/gcc/testsuite/sarif/spec-example-2.sarif b/gcc/testsuite/sarif/spec-example-2.sarif new file mode 100644 index 00000000000..352622dd5a5 --- /dev/null +++ b/gcc/testsuite/sarif/spec-example-2.sarif @@ -0,0 +1,74 @@ +/* Taken from SARIF v2.1.0, Appendix K.2: "Minimal recommended SARIF log + file with source information". */ + +{ + "version": "2.1.0", + "runs": [ + { + "tool": { + "driver": { + "name": "CodeScanner", + "rules": [ + { + "id": "C2001", + "fullDescription": { + "text": "A variable was used without being initialized. This can result in runtime errors such as null reference exceptions." + }, + "messageStrings": { + "default": { + "text": "Variable \"{0}\" was used without being initialized." 
+ } + } + } + ] + } + }, + "artifacts": [ + { + "location": { + "uri": "src/collections/list.cpp", + "uriBaseId": "SRCROOT" + }, + "sourceLanguage": "c" + } + ], + "results": [ + { + "ruleId": "C2001", + "ruleIndex": 0, + "message": { + "id": "default", + "arguments": [ + "count" + ] + }, + "locations": [ + { + "physicalLocation": { + "artifactLocation": { + "uri": "src/collections/list.cpp", + "uriBaseId": "SRCROOT", + "index": 0 + }, + "region": { + "startLine": 15 + } + }, + "logicalLocations": [ + { + "fullyQualifiedName": "collections::list::add" + } + ] + } + ] + } + ] + } + ] +} + +/* { dg-begin-multiline-output "" } +src/collections/list.cpp:15:1: warning: Variable "count" was used without being initialized. [C2001] + { dg-end-multiline-output "" } */ + +// TODO: logical location diff --git a/gcc/testsuite/sarif/spec-example-3.sarif b/gcc/testsuite/sarif/spec-example-3.sarif new file mode 100644 index 00000000000..0a0018928e1 --- /dev/null +++ b/gcc/testsuite/sarif/spec-example-3.sarif @@ -0,0 +1,67 @@ +/* Taken from SARIF v2.1.0, Appendix K.3: "Minimal recommended SARIF log + file without source information". */ + +{ + "version": "2.1.0", + "runs": [ + { + "tool": { + "driver": { + "name": "BinaryScanner" + } + }, + "artifact": [ + { + "location": { + "uri": "bin/example", + "uriBaseId": "BINROOT" + } + } + ], + "logicalLocations": [ + { + "name": "Example", + "kind": "namespace" + }, + { + "name": "Worker", + "fullyQualifiedName": "Example.Worker", + "kind": "type", + "parentIndex": 0 + }, + { + "name": "DoWork", + "fullyQualifiedName": "Example.Worker.DoWork", + "kind": "function", + "parentIndex": 1 + } + ], + "results": [ + { + "ruleId": "B6412", + "message": { + "text": "The insecure method \"Crypto.Sha1.Encrypt\" should not be used." 
+ }, + "level": "warning", + "locations": [ + { + "logicalLocations": [ + { + "fullyQualifiedName": "Example.Worker.DoWork", + "index": 2 + } + ] + } + ] + } + ] + } + ] +} + +/* { dg-begin-multiline-output "" } +In function 'Example.Worker.DoWork': +sarif-replay: warning: The insecure method "Crypto.Sha1.Encrypt" should not be used. [B6412] + { dg-end-multiline-output "" } */ + +// TODO: the "sarif-replay: " prefix is unhelpful diff --git a/gcc/testsuite/sarif/spec-example-4.sarif b/gcc/testsuite/sarif/spec-example-4.sarif new file mode 100644 index 00000000000..6680d270905 --- /dev/null +++ b/gcc/testsuite/sarif/spec-example-4.sarif @@ -0,0 +1,758 @@ +/* Taken from SARIF v2.1.0, Appendix K.4: "Comprehensive SARIF file". */ + +{ + "version": "2.1.0", + "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json", + "runs": [ + { + "automationId": { + "guid": "BC650830-A9FE-44CB-8818-AD6C387279A0", + "id": "Nightly code scan/2018-10-08" + }, + "baselineGuid": "0A106451-C9B1-4309-A7EE-06988B95F723", + "runAggregates": [ + { + "id": "Build/14.0.1.2/Release/20160716-13:22:18", + "correlationGuid": "26F138B6-6014-4D3D-B174-6E1ACE9439F3" + } + ], + "tool": { + "driver": { + "name": "CodeScanner", + "fullName": "CodeScanner 1.1 for Microsoft Windows (R) (en-US)", + "version": "2.1", + "semanticVersion": "2.1.0", + "dottedQuadFileVersion": "2.1.0.0", + "releaseDateUtc": "2019-03-17", + "organization": "Example Corporation", + "product": "Code Scanner", + "productSuite": "Code Quality Tools", + "shortDescription": { + "text": "A scanner for code." + }, + "fullDescription": { + "text": "A really great scanner for all your code." + }, + "properties": { + "copyright": "Copyright (c) 2017 by Example Corporation." + }, + "globalMessageStrings": { + "variableDeclared": { + "text": "Variable \"{0}\" was declared here.", + "markdown": " Variable `{0}` was declared here." 
+ } + }, + "rules": [ + { + "id": "C2001", + "deprecatedIds": [ + "CA2000" + ], + "defaultConfiguration": { + "level": "error", + "rank": 95 + }, + "shortDescription": { + "text": "A variable was used without being initialized." + }, + "fullDescription": { + "text": "A variable was used without being initialized. This can result in runtime errors such as null reference exceptions." + }, + "messageStrings": { + "default": { + "text": "Variable \"{0}\" was used without being initialized. It was declared [here]({1}).", + "markdown": "Variable `{0}` was used without being initialized. It was declared [here]({1})." + } + } + } + ], + "notifications": [ + { + "id": "start", + "shortDescription": { + "text": "The run started." + }, + "messageStrings": { + "default": { + "text": "Run started." + } + } + }, + { + "id": "end", + "shortDescription": { + "text": "The run ended." + }, + "messageStrings": { + "default": { + "text": "Run ended." + } + } + } + ], + "language": "en-US" + }, + "extensions": [ + { + "name": "CodeScanner Security Rules", + "version": "3.1", + "rules": [ + { + "id": "S0001", + "defaultConfiguration": { + "level": "error" + }, + "shortDescription": { + "text": "Do not use weak cryptographic algorithms." + }, + "messageStrings": { + "default": { + "text": "The cryptographic algorithm '{0}' should not be used." 
+ } + } + } + ] + } + ] + }, + "language": "en-US", + "versionControlProvenance": [ + { + "repositoryUri": "https://github.com/example-corp/browser", + "revisionId": "5da53fbb2a0aaa12d648b73984acc9aac2e11c2a", + "mappedTo": { + "uriBaseId": "PROJECTROOT" + } + } + ], + "originalUriBaseIds": { + "PROJECTROOT": { + "uri": "file://build.example.com/work/" + }, + "SRCROOT": { + "uri": " src/", + "uriBaseId": "PROJECTROOT" + }, + "BINROOT": { + "uri": " bin/", + "uriBaseId": "PROJECTROOT" + } + }, + "invocations": [ + { + "commandLine": "CodeScanner @build/collections.rsp", + "responseFiles": [ + { + "uri": "build/collections.rsp", + "uriBaseId": "SRCROOT", + "index": 0 + } + ], + "startTimeUtc": "2016-07-16T14:18:25Z", + "endTimeUtc": "2016-07-16T14:19:01Z", + "machine": "BLD01", + "account": "buildAgent", + "processId": 1218, + "fileName": "/bin/tools/CodeScanner", + "workingDirectory": { + "uri": "file:///home/buildAgent/src" + }, + "environmentVariables": { + "PATH": "/usr/local/bin:/bin:/bin/tools:/home/buildAgent/bin", + "HOME": "/home/buildAgent", + "TZ": "EST" + }, + "toolConfigurationNotifications": [ + { + "descriptor": { + "id": "UnknownRule" + }, + "associatedRule": { + "ruleId": "ABC0001" + }, + "level": "warning", + "message": { + "text": "Could not disable rule \"ABC0001\" because there is no rule with that id." + } + } + ], + "toolExecutionNotifications": [ + { + "descriptor": { + "id": "CTN0001" + }, + "level": "note", + "message": { + "text": "Run started." + } + }, + { + "descriptor": { + "id": "CTN9999" + }, + "associatedRule": { + "id": "C2001", + "index": 0 + }, + "level": "error", + "message": { + "text": "Exception evaluating rule \"C2001\". Rule disabled; run continues." 
+ }, + "locations": [ + { + "physicalLocation": { + "artifactLocation": { + "uri": "crypto/hash.cpp", + "uriBaseId": "SRCROOT", + "index": 4 + } + } + } + ], + "threadId": 52, + "timeUtc": "2016-07-16T14:18:43.119Z", + "exception": { + "kind": "ExecutionEngine.RuleFailureException", + "message": "Unhandled exception during rule evaluation.", + "stack": { + "frames": [ + { + "location": { + "message": { + "text": "Exception thrown" + }, + "logicalLocations": [ + { + "fullyQualifiedName": + "Rules.SecureHashAlgorithmRule.Evaluate" + } + ], + "physicalLocation": { + "address": { + "offset": 4244988 + } + } + }, + "module": "RuleLibrary", + "threadId": 52 + }, + { + "location": { + "logicalLocations": [ + { + "fullyQualifiedName": + "ExecutionEngine.Engine.EvaluateRule" + } + ], + "physicalLocation": { + "address": { + "offset": 4245514 + } + } + }, + "module": "ExecutionEngine", + "threadId": 52 + } + ] + }, + "innerExceptions": [ + { + "kind": "System.ArgumentException", + "message": "length is < 0" + } + ] + } + }, + { + "descriptor": { + "id": "CTN0002" + }, + "level": "note", + "message": { + "text": "Run ended." 
+ } + } + ], + "exitCode": 0, + "executionSuccessful": true + } + ], + "artifacts": [ + { + "location": { + "uri": "build/collections.rsp", + "uriBaseId": "SRCROOT" + }, + "mimeType": "text/plain", + "length": 81, + "contents": { + "text": "-input src/collections/*.cpp -log out/collections.sarif -rules all -disable C9999" + } + }, + { + "location": { + "uri": "application/main.cpp", + "uriBaseId": "SRCROOT" + }, + "sourceLanguage": "cplusplus", + "length": 1742, + "hashes": { + "sha-256": "cc8e6a99f3eff00adc649fee132ba80fe333ea5a" + } + }, + { + "location": { + "uri": "collections/list.cpp", + "uriBaseId": "SRCROOT" + }, + "sourceLanguage": "cplusplus", + "length": 980, + "hashes": { + "sha-256": "b13ce2678a8807ba0765ab94a0ecd394f869bc81" + } + }, + { + "location": { + "uri": "collections/list.h", + "uriBaseId": "SRCROOT" + }, + "sourceLanguage": "cplusplus", + "length": 24656, + "hashes": { + "sha-256": "849be119aaba4e9f88921a99e3036fb6c2a8144a" + } + }, + { + "location": { + "uri": "crypto/hash.cpp", + "uriBaseId": "SRCROOT" + }, + "sourceLanguage": "cplusplus", + "length": 1424, + "hashes": { + "sha-256": "3ffe2b77dz255cdf95f97d986d7a6ad8f287eaed" + } + }, + { + "location": { + "uri": "app.zip", + "uriBaseId": "BINROOT" + }, + "mimeType": "application/zip", + "length": 310450, + "hashes": { + "sha-256": "df18a5e74b6b46ddaa23ad7271ee2b7c5731cbe1" + } + }, + { + "location": { + "uri": "/docs/intro.docx" + }, + "mimeType": + "application/vnd.openxmlformats-officedocument.wordprocessingml.document", + "parentIndex": 5, + "offset": 17522, + "length": 4050 + } + ], + "logicalLocations": [ + { + "name": "add", + "fullyQualifiedName": "collections::list::add", + "decoratedName": "?add@list@collections@@QAEXH@Z", + "kind": "function", + "parentIndex": 1 + }, + { + "name": "list", + "fullyQualifiedName": "collections::list", + "kind": "type", + "parentIndex": 2 + }, + { + "name": "collections", + "kind": "namespace" + }, + { + "name": "add_core", + "fullyQualfiedName": 
"collections::list::add_core", + "decoratedName": "?add_core@list@collections@@QAEXH@Z", + "kind": "function", + "parentIndex": 1 + }, + { + "fullyQualifiedName": "main", + "kind": "function" + } + ], + "results": [ + { + "ruleId": "C2001", + "ruleIndex": 0, + "kind": "fail", + "level": "error", + "message": { + "id": "default", + "arguments": [ + "ptr", + "0" + ] + }, + "suppressions": [ + { + "kind": "external", + "status": "accepted" + } + ], + "baselineState": "unchanged", + "rank": 95, + "analysisTarget": { + "uri": "collections/list.cpp", + "uriBaseId": "SRCROOT", + "index": 2 + }, + "locations": [ + { + "physicalLocation": { + "artifactLocation": { + "uri": "collections/list.h", + "uriBaseId": "SRCROOT", + "index": 3 + }, + "region": { + "startLine": 15, + "startColumn": 9, + "endLine": 15, + "endColumn": 10, + "charLength": 1, + "charOffset": 254, + "snippet": { + "text": "add_core(ptr, offset, val);\n return;" + } + } + }, + "logicalLocations": [ + { + "fullyQualifiedName": "collections::list::add", + "index": 0 + } + ] + } + ], + "relatedLocations": [ + { + "id": 0, + "message": { + "id": "variableDeclared", + "arguments": [ + "ptr" + ] + }, + "physicalLocation": { + "artifactLocation": { + "uri": "collections/list.h", + "uriBaseId": "SRCROOT", + "index": 3 + }, + "region": { + "startLine": 8, + "startColumn": 5 + } + }, + "logicalLocations": [ + { + "fullyQualifiedName": "collections::list::add", + "index": 0 + } + ] + } + ], + "codeFlows": [ + { + "message": { + "text": "Path from declaration to usage" + }, + + "threadFlows": [ + { + "id": "thread-52", + "locations": [ + { + "importance": "essential", + "location": { + "message": { + "text": "Variable \"ptr\" declared.", + "markdown": "Variable `ptr` declared." 
+ }, + "physicalLocation": { + "artifactLocation": { + "uri":"collections/list.h", + "uriBaseId": "SRCROOT", + "index": 3 + }, + "region": { + "startLine": 15, + "snippet": { + "text": "int *ptr;" + } + } + }, + "logicalLocations": [ + { + "fullyQualifiedName": "collections::list::add", + "index": 0 + } + ] + }, + "module": "platform" + }, + { + "state": { + "y": { + "text": "2" + }, + "z": { + "text": "4" + }, + "y + z": { + "text": "6" + }, + "q": { + "text": "7" + } + }, + "importance": "unimportant", + "location": { + "physicalLocation": { + "artifactLocation": { + "uri":"collections/list.h", + "uriBaseId": "SRCROOT", + "index": 3 + }, + "region": { + "startLine": 15, + "snippet": { + "text": "offset = (y + z) * q + 1;" + } + } + }, + "logicalLocations": [ + { + "fullyQualifiedName": "collections::list::add", + "index": 0 + } + ], + "annotations": [ + { + "startLine": 15, + "startColumn": 13, + "endColumn": 19, + "message": { + "text": "(y + z) = 42", + "markdown": "`(y + z) = 42`" + } + } + ] + }, + "module": "platform" + }, + { + "importance": "essential", + "location": { + "message": { + "text": "Uninitialized variable \"ptr\" passed to method \"add_core\".", + "markdown": "Uninitialized variable `ptr` passed to method `add_core`." + }, + "physicalLocation": { + "artifactLocation": { + "uri":"collections/list.h", + "uriBaseId": "SRCROOT", + "index": 3 + }, + "region": { + "startLine": 25, + "snippet": { + "text": "add_core(ptr, offset, val)" + } + } + }, + "logicalLocations": [ + { + "fullyQualifiedName": "collections::list::add", + "index": 0 + } + ] + }, + "module": "platform" + } + ] + } + ] + } + ], + "stacks": [ + { + "message": { + "text": "Call stack resulting from usage of uninitialized variable." + }, + "frames": [ + { + "location": { + "message": { + "text": "Exception thrown." 
+ }, + "physicalLocation": { + "artifactLocation": { + "uri": "collections/list.h", + "uriBaseId": "SRCROOT", + "index": 3 + }, + "region": { + "startLine": 110, + "startColumn": 15 + }, + "address": { + "offset": 4229178 + } + }, + "logicalLocations": [ + { + "fullyQualifiedName": "collections::list::add_core", + "index": 0 + } + ] + }, + "module": "platform", + "threadId": 52, + "parameters": [ "null", "0", "14" ] + }, + { + "location": { + "physicalLocation": { + "artifactLocation": { + "uri": "collections/list.h", + "uriBaseId": "SRCROOT", + "index": 3 + }, + "region": { + "startLine": 43, + "startColumn": 15 + }, + "address": { + "offset": 4229268 + } + }, + "logicalLocations": [ + { + "fullyQualifiedName": "collections::list::add", + "index": 0 + } + ] + }, + "module": "platform", + "threadId": 52, + "parameters": [ "14" ] + }, + { + "location": { + "physicalLocation": { + "artifactLocation": { + "uri": "application/main.cpp", + "uriBaseId": "SRCROOT", + "index": 1 + }, + "region": { + "startLine": 28, + "startColumn": 9 + }, + "address": { + "offset": 4229836 + } + }, + "logicalLocations": [ + { + "fullyQualifiedName": "main", + "index": 4 + } + ] + }, + "module": "application", + "threadId": 52 + } + ] + } + ], + "addresses": [ + { + "baseAddress": 4194304, + "fullyQualifiedName": "collections.dll", + "kind": "module", + "section": ".text" + }, + { + "offset": 100, + "fullyQualifiedName": "collections.dll!collections::list::add", + "kind": "function", + "parentIndex": 0 + }, + { + "offset": 22, + "fullyQualifiedName": "collections.dll!collections::list::add+0x16", + "parentIndex": 1 + } + ], + "fixes": [ + { + "description": { + "text": "Initialize the variable to null" + }, + "artifactChanges": [ + { + "artifactLocation": { + "uri": "collections/list.h", + "uriBaseId": "SRCROOT", + "index": 3 + }, + "replacements": [ + { + "deletedRegion": { + "startLine": 42 + }, + "insertedContent": { + "text": "A different line\n" + } + } + ] + } + ] + } + ], + 
"hostedViewerUri": + "https://www.example.com/viewer/3918d370-c636-40d8-bf23-8c176043a2df", + "workItemUris": [ + "https://github.com/example/project/issues/42", + "https://github.com/example/project/issues/54" + ], + "provenance": { + "firstDetectionTimeUtc": "2016-07-15T14:20:42Z", + "firstDetectionRunGuid": "8F62D8A0-C14F-4516-9959-1A663BA6FB99", + "lastDetectionTimeUtc": "2016-07-16T14:20:42Z", + "lastDetectionRunGuid": "BC650830-A9FE-44CB-8818-AD6C387279A0", + "invocationIndex": 0 + } + } + ] + } + ] +} + +/* { dg-begin-multiline-output "" } +collections/list.h: In function 'collections::list::add': +collections/list.h:15:9: error: Variable "ptr" was used without being initialized. It was declared [here](0). [C2001] + 'collections::list::add': events 1-3 + | + |...... + | + { dg-end-multiline-output "" } */ + +// TODO: what's up with the events? diff --git a/gcc/testsuite/sarif/tutorial-example-foo.sarif b/gcc/testsuite/sarif/tutorial-example-foo.sarif new file mode 100644 index 00000000000..8fa37ad4b42 --- /dev/null +++ b/gcc/testsuite/sarif/tutorial-example-foo.sarif @@ -0,0 +1,117 @@ +/* Adapted from https://github.com/microsoft/sarif-tutorials + samples/bad-eval-with-code-flow.sarif. + which is licensed under the Creative Commons Attribution 4.0 International Public License + and/or the MIT License. */ + +{ + "version": "2.1.0", + "runs": [ + { + "tool": { + "driver": { + "name": "PythonScanner" + } + }, + "results": [ + { + "ruleId": "PY2335", + "message": { + "text": "Use of tainted variable 'raw_input' in the insecure function 'eval'." + }, + "locations": [ + { + "physicalLocation": { + "artifactLocation": { + "uri": "bad-eval-with-code-flow.py" + }, + "region": { + "startLine": 8 + } + } + } + ], + "codeFlows": [ + { + "message": { + "text": "Tracing the path from user input to insecure usage." 
+ }, + "threadFlows": [ + { + "locations": [ + { + "location": { + "physicalLocation": { + "artifactLocation": { + "uri": "bad-eval-with-code-flow.py" + }, + "region": { + "startLine": 3 + } + } + }, + "state": { + "expr": { + "text": "undef" + } + }, + "nestingLevel": 0 + }, + { + "location": { + "physicalLocation": { + "artifactLocation": { + "uri": "bad-eval-with-code-flow.py" + }, + "region": { + "startLine": 4 + } + } + }, + "state": { + "expr": { + "text": "42" + } + }, + "nestingLevel": 0 + }, + { + "location": { + "physicalLocation": { + "artifactLocation": { + "uri": "bad-eval-with-code-flow.py" + }, + "region": { + "startLine": 38 + } + } + }, + "state": { + "raw_input": { + "text": "42" + } + }, + "nestingLevel": 1 + } + ] + } + ] + } + ] + } + ] + } + ] +} + +/* { dg-begin-multiline-output "" } +bad-eval-with-code-flow.py:8:1: warning: Use of tainted variable 'raw_input' in the insecure function 'eval'. [PY2335] + events 1-2 + | + | + +--> event 3 + | + | + { dg-end-multiline-output "" } */ + +// TODO: logical locations? 
+// TODO: fix showing the source code From patchwork Wed Jun 22 22:34:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Malcolm X-Patchwork-Id: 1646811 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=d6RthJyS; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4LSz0s6THsz9sGp for ; Thu, 23 Jun 2022 08:42:37 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id AD531383066C for ; Wed, 22 Jun 2022 22:42:35 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org AD531383066C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1655937755; bh=HNJfhi2w1vxZuF7ZHs3xUJ2tXDFwxOhZA2rIimAZXEE=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=d6RthJySvp5yEM+3ykUjQcJgB7JzyJJjjlJ05RcLmDdmi1iZEzslSdb3QdyR/h9Jv 3CdPzKH+37fVKPpSEqTeWpE+rpxH+sz7fZKCO57D7TSImR0FFKEiPnn8iHnz7dH6cC dJB1HBYgwHelmJ9v9303Z3cs/oeslhpQ+9avBwEA= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org 
(Postfix) with ESMTPS id 69FB138303C0 for ; Wed, 22 Jun 2022 22:34:52 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 69FB138303C0 Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-656-ZNflUyk2MI6y9LvcHdpgZg-1; Wed, 22 Jun 2022 18:34:51 -0400 X-MC-Unique: ZNflUyk2MI6y9LvcHdpgZg-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id C0E2018188A2 for ; Wed, 22 Jun 2022 22:34:50 +0000 (UTC) Received: from t14s.localdomain.com (unknown [10.2.17.61]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9E50E1121314; Wed, 22 Jun 2022 22:34:50 +0000 (UTC) To: gcc-patches@gcc.gnu.org Subject: [PATCH 11/12] Fixups to diagnostic-format-sarif.cc Date: Wed, 22 Jun 2022 18:34:46 -0400 Message-Id: <20220622223447.2462880-12-dmalcolm@redhat.com> In-Reply-To: <20220622223447.2462880-1-dmalcolm@redhat.com> References: <20220622223447.2462880-1-dmalcolm@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: David Malcolm via Gcc-patches From: David Malcolm Reply-To: David Malcolm Errors-To: 
gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" I believe these are needed by one of the rule-handling patches. gcc/ChangeLog: * diagnostic-format-sarif.cc (sarif_builder::sarif_builder): Defer population of m_driver_obj until... (sarif_builder::make_tool_object): ...here. (sarif_builder::make_driver_tool_component_object): Replace with... (sarif_builder::populate_driver_object): ...this. Signed-off-by: David Malcolm --- gcc/diagnostic-format-sarif.cc | 27 +++++++++++++-------------- 1 file changed, 13 insertions(+), 14 deletions(-) diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc index a409abf648b..9c304fd8e49 100644 --- a/gcc/diagnostic-format-sarif.cc +++ b/gcc/diagnostic-format-sarif.cc @@ -158,7 +158,7 @@ private: json::object *make_top_level_object (json::array *results); json::object *make_run_object (json::array *results); json::object *make_tool_object (); - sarif_tool_component *make_driver_tool_component_object () const; + void populate_driver_object () const; json::array *maybe_make_taxonomies_array () const; json::object *maybe_make_cwe_taxonomy_object () const; json::object *make_tool_component_reference_object_for_cwe () const; @@ -289,7 +289,7 @@ sarif_builder::sarif_builder (diagnostic_context *context) m_extensions_arr (NULL), m_tabstop (context->tabstop) { - m_driver_obj = make_driver_tool_component_object (); + m_driver_obj = new sarif_tool_component (); m_extensions_arr = new json::array (); } @@ -1256,6 +1256,7 @@ sarif_builder::make_tool_object () /* "driver" property (SARIF v2.1.0 section 3.18.2). */ tool_obj->set ("driver", m_driver_obj); + populate_driver_object (); /* Report plugins via the "extensions" property (SARIF v2.1.0 section 3.18.3). */ @@ -1288,42 +1289,40 @@ sarif_builder::make_tool_object () return tool_obj; } -/* Make a toolComponent object (SARIF v2.1.0 section 3.19) for what SARIF - calls the "driver" (see SARIF v2.1.0 section 3.18.1). 
*/ +/* Populate the toolComponent object (SARIF v2.1.0 section 3.19) for what SARIF + calls the "driver" (see SARIF v2.1.0 section 3.18.1). + We delay this to ensure that the m_client_data_hooks is set up (e.g. for + roundtripping from SARIF to SARIF). */ -sarif_tool_component * -sarif_builder::make_driver_tool_component_object () const +void +sarif_builder::populate_driver_object () const { - sarif_tool_component *driver_obj = new sarif_tool_component (); - if (m_context->m_client_data_hooks) if (const client_version_info *vinfo = m_context->m_client_data_hooks->get_any_version_info ()) { /* "name" property (SARIF v2.1.0 section 3.19.8). */ if (const char *name = vinfo->get_tool_name ()) - driver_obj->set ("name", new json::string (name)); + m_driver_obj->set ("name", new json::string (name)); /* "fullName" property (SARIF v2.1.0 section 3.19.9). */ if (char *full_name = vinfo->maybe_make_full_name ()) { - driver_obj->set ("fullName", new json::string (full_name)); + m_driver_obj->set ("fullName", new json::string (full_name)); free (full_name); } /* "version" property (SARIF v2.1.0 section 3.19.13). */ if (const char *version = vinfo->get_version_string ()) - driver_obj->set ("version", new json::string (version)); + m_driver_obj->set ("version", new json::string (version)); /* "informationUri" property (SARIF v2.1.0 section 3.19.17). 
*/ if (char *version_url = vinfo->maybe_make_version_url ()) { - driver_obj->set ("informationUri", new json::string (version_url)); + m_driver_obj->set ("informationUri", new json::string (version_url)); free (version_url); } } - - return driver_obj; } /* If we've seen any CWE IDs, make an array for the "taxonomies" property From patchwork Wed Jun 22 22:34:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Malcolm X-Patchwork-Id: 1646813 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=sDKWMJlD; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4LSz346K5rz9sGp for ; Thu, 23 Jun 2022 08:44:32 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id AC40B38845FF for ; Wed, 22 Jun 2022 22:44:30 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org AC40B38845FF DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1655937870; bh=0+b8DZ0ZPSZTN+HoxcvZ61Q//23qHXhFxMJE1r5pFNw=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; 
b=sDKWMJlDbtBD9fSWgxx66YTAA9DcMz4rL8VBuevBFpj2H1kw3EJv9ycgIgeI5boej CorBAB4Z9D1MnSr3JJ3Djay89xgh1WIKU4/G2LAdNF4f99A2477t6gS7xs2XR7oOoW y90dskH/FnKUioPc5k6u27fMmU4gfi2TKIsnhNFI= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id BC06E3830665 for ; Wed, 22 Jun 2022 22:34:52 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BC06E3830665 Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-643-LP0CHJxkOD6v3M6qohK8_g-1; Wed, 22 Jun 2022 18:34:51 -0400 X-MC-Unique: LP0CHJxkOD6v3M6qohK8_g-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id EFDF31C06903 for ; Wed, 22 Jun 2022 22:34:50 +0000 (UTC) Received: from t14s.localdomain.com (unknown [10.2.17.61]) by smtp.corp.redhat.com (Postfix) with ESMTP id CFBED1121314; Wed, 22 Jun 2022 22:34:50 +0000 (UTC) To: gcc-patches@gcc.gnu.org Subject: [PATCH 12/12] Work-in-progress of path remapping Date: Wed, 22 Jun 2022 18:34:47 -0400 Message-Id: <20220622223447.2462880-13-dmalcolm@redhat.com> In-Reply-To: <20220622223447.2462880-1-dmalcolm@redhat.com> References: <20220622223447.2462880-1-dmalcolm@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_FILL_THIS_FORM_SHORT, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: David Malcolm via Gcc-patches From: David Malcolm Reply-To: David Malcolm Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This work-in-progress hacks up the file_cache code in input.cc so that it can have an optional path remapper, which can map e.g. paths in a .sarif file to paths relative to that .sarif file, so that the sarif-replayer can locate and display sources. gcc/ChangeLog: * input.cc (file_cache::get_file_path): Replace with... (file_cache::get_original_file_path): ...this, and... (file_cache::get_remapped_file_path): ...this. (file_cache::create): Split "file_path" param into "original_file_path" and "remapped_file_path". (file_cache::m_file_path): Replace with... (file_cache::m_original_file_path): ...this, and... (file_cache::m_remapped_file_path): ...this. (total_lines_num): Rename file_path to original_file_path. (file_cache::lookup_file): Rename file_path to remapped_file_path; update cache lookup to use remapped file path. (file_cache::remap_file): New. (file_cache::set_path_remapper): New. (diagnostics_file_cache_set_path_remapper): New. (file_cache_slot::evict): Update for split into a pair of paths. (file_cache::evicted_cache_tab_entry): Likewise. (file_cache::add_file): Likewise. (file_cache_slot::create): Likewise. (file_cache::file_cache): Initialize m_remapper (file_cache::~file_cache): Delete m_remapper. (file_cache::lookup_or_add_file): Remap the file path. (file_cache_slot::file_cache_slot): Update for changes to fields. (file_cache_slot::~file_cache_slot): Free the m_remapped_file_path. * input.h (class path_remapper): New. (file_cache::set_path_remapper): New decl. 
(file_cache::lookup_file): Update decl. (file_cache::add_file): Update decl. (file_cache::remap_file): New decl. (file_cache::m_remapper): New field. (diagnostics_file_cache_set_path_remapper): New decl. * sarif/sarif-replay.cc (class sarif_path_remapper): New class. (sarif_replayer::emit_sarif_as_diagnostics): Use it. gcc/testsuite/ChangeLog: * sarif/tutorial-example-foo.sarif: Fix the line numbers. Update the expected multiline output to show the source code. Signed-off-by: David Malcolm --- gcc/input.cc | 107 +++++++++++++----- gcc/input.h | 18 ++- gcc/sarif/sarif-replay.cc | 39 +++++++ .../sarif/tutorial-example-foo.sarif | 25 +++- 4 files changed, 155 insertions(+), 34 deletions(-) diff --git a/gcc/input.cc b/gcc/input.cc index 2acbfdea4f8..a78b949faa5 100644 --- a/gcc/input.cc +++ b/gcc/input.cc @@ -55,7 +55,8 @@ public: char ** line, ssize_t *line_len); /* Accessors. */ - const char *get_file_path () const { return m_file_path; } + const char *get_original_file_path () const { return m_original_file_path; } + const char *get_remapped_file_path () const { return m_remapped_file_path; } unsigned get_use_count () const { return m_use_count; } bool missing_trailing_newline_p () const { @@ -65,7 +66,10 @@ public: void inc_use_count () { m_use_count++; } bool create (const file_cache::input_context &in_context, - const char *file_path, FILE *fp, unsigned highest_use_count); + const char *original_file_path, + char *remapped_file_path, + FILE *fp, + unsigned highest_use_count); void evict (); private: @@ -112,11 +116,14 @@ public: array. */ unsigned m_use_count; - /* The file_path is the key for identifying a particular file in + /* The m_original_file_path is the key for identifying a particular file in the cache. For libcpp-using code, the underlying buffer for this field is owned by the corresponding _cpp_file within the cpp_reader. 
*/ - const char *m_file_path; + const char *m_original_file_path; + + // FIXME: + char *m_remapped_file_path; FILE *m_fp; @@ -310,11 +317,11 @@ diagnostic_file_cache_fini (void) equals the actual number of lines of the file. */ static size_t -total_lines_num (const char *file_path) +total_lines_num (const char *original_file_path) { size_t r = 0; location_t l = 0; - if (linemap_get_file_highest_location (line_table, file_path, &l)) + if (linemap_get_file_highest_location (line_table, original_file_path, &l)) { gcc_assert (l >= RESERVED_LOCATION_COUNT); expanded_location xloc = expand_location (l); @@ -328,16 +335,17 @@ total_lines_num (const char *file_path) cached file was found. */ file_cache_slot * -file_cache::lookup_file (const char *file_path) +file_cache::lookup_file (const char *remapped_file_path) { - gcc_assert (file_path); + gcc_assert (remapped_file_path); /* This will contain the found cached file. */ file_cache_slot *r = NULL; for (unsigned i = 0; i < num_file_slots; ++i) { file_cache_slot *c = &m_file_slots[i]; - if (c->get_file_path () && !strcmp (c->get_file_path (), file_path)) + if (c->get_remapped_file_path () + && !strcmp (c->get_remapped_file_path (), remapped_file_path)) { c->inc_use_count (); r = c; @@ -350,6 +358,27 @@ file_cache::lookup_file (const char *file_path) return r; } +// FIXME + +char * +file_cache::remap_file (const char *file_path) const +{ + if (m_remapper) + return m_remapper->remap_file (file_path); + else + return xstrdup (file_path); + // FIXME: this is probably being called too much +} + +// FIXME: + +void +file_cache::set_path_remapper (path_remapper *remapper) +{ + delete m_remapper; + m_remapper = remapper; +} + /* Purge any mention of FILENAME from the cache of files used for printing source code. For use in selftests when working with tempfiles. 
*/ @@ -365,6 +394,15 @@ diagnostics_file_cache_forcibly_evict_file (const char *file_path) global_dc->m_file_cache->forcibly_evict_file (file_path); } +// FIXME: + +void +diagnostics_file_cache_set_path_remapper (path_remapper *remapper) +{ + diagnostic_file_cache_init (); + global_dc->m_file_cache->set_path_remapper (remapper); +} + void file_cache::forcibly_evict_file (const char *file_path) { @@ -381,7 +419,9 @@ file_cache::forcibly_evict_file (const char *file_path) void file_cache_slot::evict () { - m_file_path = NULL; + m_original_file_path = NULL; + free (m_remapped_file_path); + m_remapped_file_path = NULL; if (m_fp) fclose (m_fp); m_fp = NULL; @@ -409,10 +449,10 @@ file_cache::evicted_cache_tab_entry (unsigned *highest_use_count) for (unsigned i = 1; i < num_file_slots; ++i) { file_cache_slot *c = &m_file_slots[i]; - bool c_is_empty = (c->get_file_path () == NULL); + bool c_is_empty = (c->get_original_file_path () == NULL); if (c->get_use_count () < to_evict->get_use_count () - || (to_evict->get_file_path () && c_is_empty)) + || (to_evict->get_original_file_path () && c_is_empty)) /* We evict C because it's either an entry with a lower use count or one that is empty. */ to_evict = c; @@ -432,6 +472,7 @@ file_cache::evicted_cache_tab_entry (unsigned *highest_use_count) return to_evict; } +// FIXME: /* Create the cache used for the content of a given file to be accessed by caret diagnostic. This cache is added to an array of cache and can be retrieved by lookup_file_in_cache_tab. This @@ -439,29 +480,35 @@ file_cache::evicted_cache_tab_entry (unsigned *highest_use_count) num_file_slots files are cached. 
*/ file_cache_slot* -file_cache::add_file (const char *file_path) +file_cache::add_file (const char *original_file_path, + char *remapped_file_path) { - - FILE *fp = fopen (file_path, "r"); + FILE *fp = fopen (remapped_file_path, "r"); if (fp == NULL) return NULL; unsigned highest_use_count = 0; file_cache_slot *r = evicted_cache_tab_entry (&highest_use_count); - if (!r->create (in_context, file_path, fp, highest_use_count)) + if (!r->create (in_context, original_file_path, remapped_file_path, + fp, highest_use_count)) return NULL; return r; } +// FIXME: /* Populate this slot for use on FILE_PATH and FP, dropping any existing cached content within it. */ +// FIXME: take ownership of REMAPPED_FILE_PATH. bool file_cache_slot::create (const file_cache::input_context &in_context, - const char *file_path, FILE *fp, + const char *original_file_path, + char *remapped_file_path, + FILE *fp, unsigned highest_use_count) { - m_file_path = file_path; + m_original_file_path = original_file_path; + m_remapped_file_path = remapped_file_path; if (m_fp) fclose (m_fp); m_fp = fp; @@ -474,19 +521,20 @@ file_cache_slot::create (const file_cache::input_context &in_context, /* Ensure that this cache entry doesn't get evicted next time add_file_to_cache_tab is called. */ m_use_count = ++highest_use_count; - m_total_lines = total_lines_num (file_path); + m_total_lines = total_lines_num (original_file_path); m_missing_trailing_newline = true; + // FIXME: which file_path should we be using below? /* Check the input configuration to determine if we need to do any transformations, such as charset conversion or BOM skipping. */ - if (const char *input_charset = in_context.ccb (file_path)) + if (const char *input_charset = in_context.ccb (original_file_path)) { /* Need a full-blown conversion of the input charset. 
*/ fclose (m_fp); m_fp = NULL; const cpp_converted_source cs - = cpp_get_converted_source (file_path, input_charset); + = cpp_get_converted_source (original_file_path, input_charset); if (!cs.data) return false; if (m_data) @@ -511,7 +559,8 @@ file_cache_slot::create (const file_cache::input_context &in_context, /* file_cache's ctor. */ file_cache::file_cache () -: m_file_slots (new file_cache_slot[num_file_slots]) +: m_file_slots (new file_cache_slot[num_file_slots]), + m_remapper (NULL) { initialize_input_context (nullptr, false); } @@ -521,6 +570,7 @@ file_cache::file_cache () file_cache::~file_cache () { delete[] m_file_slots; + delete m_remapper; } /* Lookup the cache used for the content of a given file accessed by @@ -529,11 +579,14 @@ file_cache::~file_cache () it. */ file_cache_slot* -file_cache::lookup_or_add_file (const char *file_path) +file_cache::lookup_or_add_file (const char *original_file_path) { - file_cache_slot *r = lookup_file (file_path); + char *remapped_file_path = remap_file (original_file_path); + file_cache_slot *r = lookup_file (remapped_file_path); if (r == NULL) - r = add_file (file_path); + r = add_file (original_file_path, remapped_file_path); + else + free (remapped_file_path); return r; } @@ -541,7 +594,8 @@ file_cache::lookup_or_add_file (const char *file_path) diagnostic. 
*/ file_cache_slot::file_cache_slot () -: m_use_count (0), m_file_path (NULL), m_fp (NULL), m_data (0), +: m_use_count (0), m_original_file_path (NULL), m_remapped_file_path (NULL), + m_fp (NULL), m_data (0), m_alloc_offset (0), m_size (0), m_nb_read (0), m_line_start_idx (0), m_line_num (0), m_total_lines (0), m_missing_trailing_newline (true) { @@ -552,6 +606,7 @@ file_cache_slot::file_cache_slot () file_cache_slot::~file_cache_slot () { + free (m_remapped_file_path); if (m_fp) { fclose (m_fp); diff --git a/gcc/input.h b/gcc/input.h index f1ae3aec95c..8539317c513 100644 --- a/gcc/input.h +++ b/gcc/input.h @@ -118,6 +118,13 @@ extern bool location_missing_trailing_newline (const char *file_path); need to be in this header. */ class file_cache_slot; +class path_remapper +{ +public: + virtual ~path_remapper () {} + virtual char *remap_file (const char *file_path) const = 0; +}; + /* A cache of source files for use when emitting diagnostics (and in a few places in the C/C++ frontends). @@ -145,15 +152,20 @@ class file_cache void initialize_input_context (diagnostic_input_charset_callback ccb, bool should_skip_bom); + void set_path_remapper (path_remapper *remapper); + private: file_cache_slot *evicted_cache_tab_entry (unsigned *highest_use_count); - file_cache_slot *add_file (const char *file_path); - file_cache_slot *lookup_file (const char *file_path); + file_cache_slot *add_file (const char *original_file_path, + char *remapped_file_path); + file_cache_slot *lookup_file (const char *remapped_file_path); + char *remap_file (const char *file_path) const; private: static const size_t num_file_slots = 16; file_cache_slot *m_file_slots; input_context in_context; + path_remapper *m_remapper; }; extern expanded_location @@ -246,6 +258,8 @@ void diagnostics_file_cache_fini (void); void diagnostics_file_cache_forcibly_evict_file (const char *file_path); +void diagnostics_file_cache_set_path_remapper (path_remapper *remapper); + class GTY(()) string_concat { public: diff 
--git a/gcc/sarif/sarif-replay.cc b/gcc/sarif/sarif-replay.cc index 2d5c58ead1e..b8f9c152879 100644 --- a/gcc/sarif/sarif-replay.cc +++ b/gcc/sarif/sarif-replay.cc @@ -597,6 +597,37 @@ sarif_replayer::~sarif_replayer () delete iter.second; } +// FIXME: + +class sarif_path_remapper : public path_remapper +{ +public: + sarif_path_remapper (const char *sarif_filename) + { + /* Get the directory containing SARIF_FILENAME. */ + size_t overall_len = strlen (sarif_filename); + const char *last_comp = basename (sarif_filename); + gcc_assert (last_comp); + size_t dir_len = last_comp - sarif_filename; + m_dir = (char *)xmalloc (dir_len + 1); + memcpy (m_dir, sarif_filename, dir_len); + m_dir[dir_len] = '\0'; + } + ~sarif_path_remapper () + { + free (m_dir); + } + + char *remap_file (const char *file_path) const final override + { + // TODO: what about absolute FILE_PATH ? + return concat (m_dir, file_path, NULL); + } + +private: + char *m_dir; +}; + /* Perform one pass of replay of the output file. Pass 0 captures the source locations of interest, so that we can generate line_maps. @@ -607,6 +638,14 @@ sarif_replayer::emit_sarif_as_diagnostics (json::value *jv, int pass) { m_pass = pass; + /* Remap paths on 2nd pass: that way, errors in the SARIF file get + reported directly, whereas replayed diagnostics get remapped. */ + if (pass == 1) + { + diagnostics_file_cache_set_path_remapper + (new sarif_path_remapper (m_filename)); + } + /* We expect a sarifLog object as the top-level value (SARIF v2.1.0 section 3.13). 
*/ json::object *toplev_obj = dyn_cast (jv); diff --git a/gcc/testsuite/sarif/tutorial-example-foo.sarif b/gcc/testsuite/sarif/tutorial-example-foo.sarif index 8fa37ad4b42..d73bbd3b93e 100644 --- a/gcc/testsuite/sarif/tutorial-example-foo.sarif +++ b/gcc/testsuite/sarif/tutorial-example-foo.sarif @@ -25,7 +25,7 @@ "uri": "bad-eval-with-code-flow.py" }, "region": { - "startLine": 8 + "startLine": 10 } } } @@ -45,7 +45,7 @@ "uri": "bad-eval-with-code-flow.py" }, "region": { - "startLine": 3 + "startLine": 5 } } }, @@ -63,7 +63,7 @@ "uri": "bad-eval-with-code-flow.py" }, "region": { - "startLine": 4 + "startLine": 6 } } }, @@ -81,7 +81,7 @@ "uri": "bad-eval-with-code-flow.py" }, "region": { - "startLine": 38 + "startLine": 10 } } }, @@ -104,14 +104,27 @@ } /* { dg-begin-multiline-output "" } -bad-eval-with-code-flow.py:8:1: warning: Use of tainted variable 'raw_input' in the insecure function 'eval'. [PY2335] +bad-eval-with-code-flow.py:10:1: warning: Use of tainted variable 'raw_input' in the insecure function 'eval'. [PY2335] + 10 | print(eval(raw_input)) + | ^ events 1-2 | + | 5 | print("Hello, world!") + | | ^ + | | | + | | (1) + | 6 | expr = input("Expression> ") + | | ~ + | | | + | | (2) | +--> event 3 | + | 10 | print(eval(raw_input)) + | | ^ + | | | + | | (3) | { dg-end-multiline-output "" } */ // TODO: logical locations? -// TODO: fix showing the source code
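
For readers following the path-remapping patch above, the idea behind sarif_path_remapper can be sketched in isolation: a relative path taken from a .sarif file is resolved against the directory that contains that .sarif file. The sketch below is not the patch's code. The class name dir_relative_remapper is hypothetical, the interface returns std::string rather than a malloc'd char *, only '/' is treated as a directory separator, and passing absolute paths through unchanged is an assumption (the patch leaves that case as a TODO).

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the patch's path_remapper interface; it
// returns std::string instead of a malloc'd char * to keep this short.
class path_remapper
{
public:
  virtual ~path_remapper () {}
  virtual std::string remap_file (const std::string &file_path) const = 0;
};

// Illustrative remapper (name is an assumption, not from the patch):
// resolves paths relative to the directory holding the .sarif file.
class dir_relative_remapper : public path_remapper
{
public:
  explicit dir_relative_remapper (const std::string &sarif_filename)
  {
    // Keep everything up to and including the last '/'; leave m_dir
    // empty if SARIF_FILENAME has no directory component.
    std::string::size_type slash = sarif_filename.find_last_of ('/');
    if (slash != std::string::npos)
      m_dir = sarif_filename.substr (0, slash + 1);
  }

  std::string remap_file (const std::string &file_path) const final override
  {
    // Assumption: absolute paths are returned unchanged.
    if (!file_path.empty () && file_path[0] == '/')
      return file_path;
    return m_dir + file_path;
  }

private:
  std::string m_dir;
};
```

Under these assumptions, a remapper constructed from "gcc/testsuite/sarif/tutorial-example-foo.sarif" would map "bad-eval-with-code-flow.py" to a path inside gcc/testsuite/sarif/, which is the location the replayer needs in order to open the file and print the quoted source lines.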