From aff7f677120ec394adcedd0dd5cc3afa3b5be102 Mon Sep 17 00:00:00 2001 From: David Malcolm Date: Thu, 29 Aug 2024 18:48:32 -0400 Subject: [PATCH] =?UTF-8?q?SARIF=20output:=20implement=20embedded=20URLs?= =?UTF-8?q?=20in=20messages=20(=C2=A73.11.6;=20PR=20other/116419)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit GCC diagnostic messages can contain URLs, such as to our documentation when we suggest an option name to correct a misspelling. SARIF message strings can contain embedded URLs in the plain text messages (see SARIF v2.1.0 §3.11.6), but previously we were simply dropping any URLs from the diagnostic messages. This patch adds support for encoding URLs into messages in our SARIF output, using the pp_token machinery added in the previous patch. As well as supporting URLs, the patch also adjusts how we report event IDs in SARIF message, so that rather than e.g. "text": "second 'free' here; first 'free' was at (1)" we now report: "text": "second 'free' here; first 'free' was at [(1)](sarif:/runs/0/results/0/codeFlows/0/threadFlows/0/locations/0)" i.e. the text "(1)" now has a embedded link referring within the sarif log to the threadFlowLocation object for the other event, via JSON pointer (see §3.10.3 "URIs that use the sarif scheme"). Doing so requires the arious objects to know their index within their containing array, requiring some reworking of how they are constructed. gcc/ChangeLog: PR other/116419 * diagnostic-event-id.h (diagnostic_event_id_t::zero_based): New. * diagnostic-format-sarif.cc: Include "pretty-print-format-impl.h" and "pretty-print-urlifier.h". (sarif_result::sarif_result): Add param "idx_within_parent". (sarif_result::get_index_within_parent): New accessor. (sarif_result::m_idx_within_parent): New field. (sarif_code_flow::sarif_code_flow): New ctor. (sarif_code_flow::get_parent): New accessor. (sarif_code_flow::get_index_within_parent): New accessor. (sarif_code_flow::m_parent): New field. (sarif_code_flow::m_thread_id_map): New field. (sarif_code_flow::m_thread_flows_arr): New field. (sarif_code_flow::m_all_tfl_objs): New field. (sarif_thread_flow::sarif_thread_flow): Add "parent" and "idx_within_parent" params. (sarif_thread_flow::get_parent): New accessor. (sarif_thread_flow::get_index_within_parent): New accessor. (sarif_thread_flow::m_parent): New field. (sarif_thread_flow::m_idx_within_parent): New field. (sarif_thread_flow_location::sarif_thread_flow_location): New ctor. (sarif_thread_flow_location::get_parent): New accessor. (sarif_thread_flow_location::get_index_within_parent): New accessor. (sarif_thread_flow_location::m_parent): New field. (sarif_thread_flow_location::m_idx_within_parent): New field. (sarif_builder::get_code_flow_for_event_ids): New accessor. (class sarif_builder::sarif_token_printer): New. (sarif_builder::m_token_printer): New member. (sarif_builder::m_next_result_idx): New field. (sarif_builder::m_current_code_flow): New field. (sarif_code_flow::get_or_append_thread_flow): New. (sarif_code_flow::get_thread_flow): New. (sarif_code_flow::add_location): New. (sarif_code_flow::get_thread_flow_loc_obj): New. (sarif_thread_flow::add_location): Create the new sarif_thread_flow_location internally, rather than passing it in as a parm so that we can keep track of its index in the array. Return a reference to it. (sarif_builder::sarif_builder): Initialize m_token_printer, m_next_result_idx, and m_current_code_flow. (sarif_builder::on_report_diagnostic): Pass index to make_result_object. (sarif_builder::make_result_object): Add "idx_within_parent" param and pass to sarif_result ctor. Pass code flow index to call to make_code_flow_object. (make_sarif_url_for_event): New. (sarif_builder::make_code_flow_object): Add "idx_within_parent" param and pass it to sarif_code_flow ctor. Reimplement walking of events so that we first create threadFlow objects for each thread, then populate them with threadFlowLocation objects, so that the IDs work. Set m_current_code_flow whilst creating the latter, so that we can create correct URIs for "%@". (sarif_builder::make_thread_flow_location_object): Replace with... (sarif_builder::populate_thread_flow_location_object): ...this. (sarif_output_format::get_builder): New accessor. (sarif_begin_embedded_link): New. (sarif_end_embedded_link): New. (sarif_builder::sarif_token_printer::print_tokens): New. (diagnostic_output_format_init_sarif): Add "fmt" param; use it to set the token printer and output format for the context. (diagnostic_output_format_init_sarif_stderr): Move responsibility for setting the context's output format to within diagnostic_output_format_init_sarif. (diagnostic_output_format_init_sarif_file): Likewise. (diagnostic_output_format_init_sarif_stream): Likewise. (test_sarif_diagnostic_context::test_sarif_diagnostic_context): Likewise. (selftest::test_make_location_object): Provide an idx for the result. (selftest::get_result_from_log): New. (selftest::get_message_from_log): New. (selftest::test_message_with_embedded_link): New test. (selftest::diagnostic_format_sarif_cc_tests): Call it. * pretty-print-format-impl.h: Include "diagnostic-event-id.h". (pp_token::kind): Add "event_id". (struct pp_token_event_id): New. (is_a_helper ::test): New. (is_a_helper ::test): New. * pretty-print.cc (pp_token::dump): Handle kind::event_id. (pretty_printer::format): Update handling of "%@" in phase 2 so that we add a pp_token_event_id, rather that the text "(N)". (default_token_printer): Handle pp_token::kind::event_id by printing the text "(N)". gcc/testsuite/ChangeLog: PR other/116419 * gcc.dg/sarif-output/bad-pragma.c: New test. * gcc.dg/sarif-output/test-bad-pragma.py: New test. * gcc.dg/sarif-output/test-include-chain-2.py (test_location_relationships): Update expected text of event to include an intra-sarif URI to the other event. Signed-off-by: David Malcolm --- gcc/diagnostic-event-id.h | 6 + gcc/diagnostic-format-sarif.cc | 607 +++++++++++++++--- gcc/pretty-print-format-impl.h | 32 + gcc/pretty-print.cc | 29 +- .../gcc.dg/sarif-output/bad-pragma.c | 16 + .../gcc.dg/sarif-output/test-bad-pragma.py | 38 ++ .../sarif-output/test-include-chain-2.py | 6 +- 7 files changed, 637 insertions(+), 97 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/sarif-output/bad-pragma.c create mode 100644 gcc/testsuite/gcc.dg/sarif-output/test-bad-pragma.py diff --git a/gcc/diagnostic-event-id.h b/gcc/diagnostic-event-id.h index 78c2ccbbc99..8237ba34df3 100644 --- a/gcc/diagnostic-event-id.h +++ b/gcc/diagnostic-event-id.h @@ -41,6 +41,12 @@ class diagnostic_event_id_t bool known_p () const { return m_index != UNKNOWN_EVENT_IDX; } + int zero_based () const + { + gcc_assert (known_p ()); + return m_index; + } + int one_based () const { gcc_assert (known_p ()); diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc index 59d9cd72183..9d9e7ae6073 100644 --- a/gcc/diagnostic-format-sarif.cc +++ b/gcc/diagnostic-format-sarif.cc @@ -45,6 +45,8 @@ along with GCC; see the file COPYING3. If not see #include "selftest-diagnostic-show-locus.h" #include "selftest-json.h" #include "text-range-label.h" +#include "pretty-print-format-impl.h" +#include "pretty-print-urlifier.h" /* Forward decls. */ class sarif_builder; @@ -385,7 +387,12 @@ private: class sarif_result : public sarif_location_manager { public: - sarif_result () : m_related_locations_arr (nullptr) {} + sarif_result (unsigned idx_within_parent) + : m_related_locations_arr (nullptr), + m_idx_within_parent (idx_within_parent) + {} + + unsigned get_index_within_parent () const { return m_idx_within_parent; } void on_nested_diagnostic (diagnostic_context &context, @@ -402,6 +409,7 @@ public: private: json::array *m_related_locations_arr; // borrowed + const unsigned m_idx_within_parent; }; /* Subclass of sarif_object for SARIF "location" objects @@ -460,7 +468,39 @@ private: /* Subclass of sarif_object for SARIF "codeFlow" objects (SARIF v2.1.0 section 3.36). */ -class sarif_code_flow : public sarif_object {}; +class sarif_code_flow : public sarif_object +{ +public: + sarif_code_flow (sarif_result &parent, + unsigned idx_within_parent); + + sarif_result &get_parent () const { return m_parent; } + unsigned get_index_within_parent () const { return m_idx_within_parent; } + + sarif_thread_flow & + get_or_append_thread_flow (const diagnostic_thread &thread, + diagnostic_thread_id_t thread_id); + + sarif_thread_flow & + get_thread_flow (diagnostic_thread_id_t thread_id); + + void add_location (sarif_thread_flow_location &); + + sarif_thread_flow_location & + get_thread_flow_loc_obj (diagnostic_event_id_t event_id) const; + +private: + sarif_result &m_parent; + const unsigned m_idx_within_parent; + + hash_map, + sarif_thread_flow *> m_thread_id_map; // borrowed ptr + json::array *m_thread_flows_arr; // borrowed + + /* Vec of borrowed ptr, allowing for going easily from + an event_id to the corresponding threadFlowLocation object. */ + std::vector m_all_tfl_objs; +}; /* Subclass of sarif_object for SARIF "threadFlow" objects (SARIF v2.1.0 section 3.37). */ @@ -468,20 +508,41 @@ class sarif_code_flow : public sarif_object {}; class sarif_thread_flow : public sarif_object { public: - sarif_thread_flow (const diagnostic_thread &thread); + sarif_thread_flow (sarif_code_flow &parent, + const diagnostic_thread &thread, + unsigned idx_within_parent); - void - add_location - (std::unique_ptr thread_flow_loc_obj); + sarif_code_flow &get_parent () const { return m_parent; } + unsigned get_index_within_parent () const { return m_idx_within_parent; } + + sarif_thread_flow_location &add_location (); private: + sarif_code_flow &m_parent; json::array *m_locations_arr; // borrowed + const unsigned m_idx_within_parent; }; /* Subclass of sarif_object for SARIF "threadFlowLocation" objects (SARIF v2.1.0 section 3.38). */ -class sarif_thread_flow_location : public sarif_object {}; +class sarif_thread_flow_location : public sarif_object +{ +public: + sarif_thread_flow_location (sarif_thread_flow &parent, + unsigned idx_within_parent) + : m_parent (parent), + m_idx_within_parent (idx_within_parent) + { + } + + sarif_thread_flow &get_parent () const { return m_parent; } + unsigned get_index_within_parent () const { return m_idx_within_parent; } + +private: + sarif_thread_flow &m_parent; + const unsigned m_idx_within_parent; +}; /* Subclass of sarif_object for SARIF "reportingDescriptor" objects (SARIF v2.1.0 section 3.49). */ @@ -632,11 +693,33 @@ public: std::unique_ptr make_artifact_location_object (const char *filename); + const sarif_code_flow * + get_code_flow_for_event_ids () const + { + return m_current_code_flow; + } + + token_printer &get_token_printer () { return m_token_printer; } + private: + class sarif_token_printer : public token_printer + { + public: + sarif_token_printer (sarif_builder &builder) + : m_builder (builder) + { + } + void print_tokens (pretty_printer *pp, + const pp_token_list &tokens) final override; + private: + sarif_builder &m_builder; + }; + std::unique_ptr make_result_object (diagnostic_context &context, const diagnostic_info &diagnostic, - diagnostic_t orig_diag_kind); + diagnostic_t orig_diag_kind, + unsigned idx_within_parent); void add_any_include_chain (sarif_location_manager &loc_mgr, sarif_location &location_obj, @@ -650,11 +733,13 @@ private: enum diagnostic_artifact_role role); std::unique_ptr make_code_flow_object (sarif_result &result, + unsigned idx_within_parent, const diagnostic_path &path); - std::unique_ptr - make_thread_flow_location_object (sarif_result &result, - const diagnostic_event &event, - int path_event_idx); + void + populate_thread_flow_location_object (sarif_result &result, + sarif_thread_flow_location &thread_flow_loc_obj, + const diagnostic_event &event, + int event_execution_idx); std::unique_ptr maybe_make_kinds_array (diagnostic_event::meaning m) const; std::unique_ptr @@ -725,6 +810,7 @@ private: diagnostic_context &m_context; const line_maps *m_line_maps; + sarif_token_printer m_token_printer; /* The JSON object for the invocation object. */ std::unique_ptr m_invocation_obj; @@ -751,6 +837,9 @@ private: int m_tabstop; bool m_formatted; + + unsigned m_next_result_idx; + sarif_code_flow *m_current_code_flow; }; /* class sarif_object : public json::object. */ @@ -1294,9 +1383,67 @@ lazily_add_kind (enum location_relationship_kind kind) kinds_arr->append_string (kind_str); } +/* class sarif_code_flow : public sarif_object. */ + +sarif_code_flow::sarif_code_flow (sarif_result &parent, + unsigned idx_within_parent) +: m_parent (parent), + m_idx_within_parent (idx_within_parent) +{ + /* "threadFlows" property (SARIF v2.1.0 section 3.36.3). */ + auto thread_flows_arr = ::make_unique (); + m_thread_flows_arr = thread_flows_arr.get (); // borrowed + set ("threadFlows", std::move (thread_flows_arr)); +} + +sarif_thread_flow & +sarif_code_flow::get_or_append_thread_flow (const diagnostic_thread &thread, + diagnostic_thread_id_t thread_id) +{ + sarif_thread_flow **slot = m_thread_id_map.get (thread_id); + if (slot) + return **slot; + + unsigned next_thread_flow_idx = m_thread_flows_arr->size (); + auto thread_flow_obj + = ::make_unique (*this, thread, next_thread_flow_idx); + m_thread_id_map.put (thread_id, thread_flow_obj.get ()); // borrowed + sarif_thread_flow *result = thread_flow_obj.get (); + m_thread_flows_arr->append (std::move (thread_flow_obj)); + return *result; +} + +sarif_thread_flow & +sarif_code_flow::get_thread_flow (diagnostic_thread_id_t thread_id) +{ + sarif_thread_flow **slot = m_thread_id_map.get (thread_id); + gcc_assert (slot); // it must already have one + return **slot; +} + +void +sarif_code_flow::add_location (sarif_thread_flow_location &tfl_obj) +{ + m_all_tfl_objs.push_back (&tfl_obj); +} + +sarif_thread_flow_location & +sarif_code_flow::get_thread_flow_loc_obj (diagnostic_event_id_t event_id) const +{ + gcc_assert (event_id.known_p ()); + gcc_assert ((size_t)event_id.zero_based () < m_all_tfl_objs.size ()); + sarif_thread_flow_location *tfl_obj = m_all_tfl_objs[event_id.zero_based ()]; + gcc_assert (tfl_obj); + return *tfl_obj; +} + /* class sarif_thread_flow : public sarif_object. */ -sarif_thread_flow::sarif_thread_flow (const diagnostic_thread &thread) +sarif_thread_flow::sarif_thread_flow (sarif_code_flow &parent, + const diagnostic_thread &thread, + unsigned idx_within_parent) +: m_parent (parent), + m_idx_within_parent (idx_within_parent) { /* "id" property (SARIF v2.1.0 section 3.37.2). */ label_text name (thread.get_name (false)); @@ -1310,11 +1457,18 @@ sarif_thread_flow::sarif_thread_flow (const diagnostic_thread &thread) set ("locations", m_locations_arr); } -void -sarif_thread_flow:: -add_location (std::unique_ptr thread_flow_loc_obj) +/* Add a sarif_thread_flow_location to this threadFlow object, but + don't populate it yet. */ + +sarif_thread_flow_location & +sarif_thread_flow::add_location () { - m_locations_arr->append (std::move (thread_flow_loc_obj)); + const unsigned thread_flow_location_idx = m_locations_arr->size (); + sarif_thread_flow_location *thread_flow_loc_obj + = new sarif_thread_flow_location (*this, thread_flow_location_idx); + m_locations_arr->append (thread_flow_loc_obj); + m_parent.add_location (*thread_flow_loc_obj); + return *thread_flow_loc_obj; } /* class sarif_builder. */ @@ -1327,6 +1481,7 @@ sarif_builder::sarif_builder (diagnostic_context &context, bool formatted) : m_context (context), m_line_maps (line_maps), + m_token_printer (*this), m_invocation_obj (::make_unique (*this, context.get_original_argv ())), @@ -1336,7 +1491,9 @@ sarif_builder::sarif_builder (diagnostic_context &context, m_rule_id_set (), m_rules_arr (new json::array ()), m_tabstop (context.m_tabstop), - m_formatted (formatted) + m_formatted (formatted), + m_next_result_idx (0), + m_current_code_flow (nullptr) { gcc_assert (m_line_maps); @@ -1376,7 +1533,8 @@ sarif_builder::on_report_diagnostic (diagnostic_context &context, { /* Top-level diagnostic. */ m_cur_group_result - = make_result_object (context, diagnostic, orig_diag_kind); + = make_result_object (context, diagnostic, orig_diag_kind, + m_next_result_idx++); } } @@ -1476,9 +1634,10 @@ make_rule_id_for_diagnostic_kind (diagnostic_t diag_kind) std::unique_ptr sarif_builder::make_result_object (diagnostic_context &context, const diagnostic_info &diagnostic, - diagnostic_t orig_diag_kind) + diagnostic_t orig_diag_kind, + unsigned idx_within_parent) { - auto result_obj = ::make_unique (); + auto result_obj = ::make_unique (idx_within_parent); /* "ruleId" property (SARIF v2.1.0 section 3.27.5). */ /* Ideally we'd have an option_name for these. */ @@ -1552,8 +1711,10 @@ sarif_builder::make_result_object (diagnostic_context &context, if (const diagnostic_path *path = diagnostic.richloc->get_path ()) { auto code_flows_arr = ::make_unique (); + const unsigned code_flow_index = 0; code_flows_arr->append (make_code_flow_object (*result_obj.get (), + code_flow_index, *path)); result_obj->set ("codeFlows", std::move (code_flows_arr)); } @@ -2272,83 +2433,115 @@ make_sarif_logical_location_object (const logical_location &logical_loc) return logical_loc_obj; } +label_text +make_sarif_url_for_event (const sarif_code_flow *code_flow, + diagnostic_event_id_t event_id) +{ + gcc_assert (event_id.known_p ()); + + if (!code_flow) + return label_text (); + + const sarif_thread_flow_location &tfl_obj + = code_flow->get_thread_flow_loc_obj (event_id); + const int location_idx = tfl_obj.get_index_within_parent (); + + const sarif_thread_flow &thread_flow_obj = tfl_obj.get_parent (); + const int thread_flow_idx = thread_flow_obj.get_index_within_parent (); + + const sarif_code_flow &code_flow_obj = thread_flow_obj.get_parent (); + const int code_flow_idx = code_flow_obj.get_index_within_parent (); + + const sarif_result &result_obj = code_flow_obj.get_parent (); + const int result_idx = result_obj.get_index_within_parent (); + + /* We only support a single run object in the log. */ + const int run_idx = 0; + + char *buf = xasprintf + ("sarif:/runs/%i/results/%i/codeFlows/%i/threadFlows/%i/locations/%i", + run_idx, result_idx, code_flow_idx, thread_flow_idx, location_idx); + return label_text::take (buf); +} + /* Make a "codeFlow" object (SARIF v2.1.0 section 3.36) for PATH. */ std::unique_ptr sarif_builder::make_code_flow_object (sarif_result &result, + unsigned idx_within_parent, const diagnostic_path &path) { - auto code_flow_obj = ::make_unique (); + auto code_flow_obj + = ::make_unique (result, idx_within_parent); - /* "threadFlows" property (SARIF v2.1.0 section 3.36.3). */ - auto thread_flows_arr = ::make_unique (); - - /* Walk the events, consolidating into per-thread threadFlow objects, - using the index with PATH as the overall executionOrder. */ - hash_map, - sarif_thread_flow *> thread_id_map; // borrowed + /* First pass: + Create threadFlows and threadFlowLocation objects within them, + effectively recording a mapping from event_id to threadFlowLocation + so that we can later go from an event_id to a URI within the + SARIF file. */ for (unsigned i = 0; i < path.num_events (); i++) { const diagnostic_event &event = path.get_event (i); const diagnostic_thread_id_t thread_id = event.get_thread_id (); - sarif_thread_flow *thread_flow_obj; - if (sarif_thread_flow **slot = thread_id_map.get (thread_id)) - thread_flow_obj = *slot; - else - { - const diagnostic_thread &thread = path.get_thread (thread_id); - thread_flow_obj = new sarif_thread_flow (thread); - thread_id_map.put (thread_id, thread_flow_obj); // borrowed - thread_flows_arr->append (thread_flow_obj); - } - - /* Add event to thread's threadFlow object. */ - std::unique_ptr thread_flow_loc_obj - = make_thread_flow_location_object (result, event, i); - thread_flow_obj->add_location (std::move (thread_flow_loc_obj)); + sarif_thread_flow &thread_flow_obj + = code_flow_obj->get_or_append_thread_flow (path.get_thread (thread_id), + thread_id); + thread_flow_obj.add_location (); } - code_flow_obj->set ("threadFlows", std::move (thread_flows_arr)); + + /* Second pass: walk the events, populating the tfl objs. */ + m_current_code_flow = code_flow_obj.get (); + for (unsigned i = 0; i < path.num_events (); i++) + { + const diagnostic_event &event = path.get_event (i); + sarif_thread_flow_location &thread_flow_loc_obj + = code_flow_obj->get_thread_flow_loc_obj (i); + populate_thread_flow_location_object (result, + thread_flow_loc_obj, + event, + i); + } + m_current_code_flow = nullptr; return code_flow_obj; } -/* Make a "threadFlowLocation" object (SARIF v2.1.0 section 3.38) for EVENT. */ +/* Populate TFL_OBJ, a "threadFlowLocation" object (SARIF v2.1.0 section 3.38) + based on EVENT. */ -std::unique_ptr -sarif_builder::make_thread_flow_location_object (sarif_result &result, - const diagnostic_event &ev, - int path_event_idx) +void +sarif_builder:: +populate_thread_flow_location_object (sarif_result &result, + sarif_thread_flow_location &tfl_obj, + const diagnostic_event &ev, + int event_execution_idx) { - auto thread_flow_loc_obj = ::make_unique (); - /* Give diagnostic_event subclasses a chance to add custom properties via a property bag. */ - ev.maybe_add_sarif_properties (*thread_flow_loc_obj); + ev.maybe_add_sarif_properties (tfl_obj); /* "location" property (SARIF v2.1.0 section 3.38.3). */ - thread_flow_loc_obj->set + tfl_obj.set ("location", make_location_object (result, ev, diagnostic_artifact_role::traced_file)); /* "kinds" property (SARIF v2.1.0 section 3.38.8). */ diagnostic_event::meaning m = ev.get_meaning (); if (auto kinds_arr = maybe_make_kinds_array (m)) - thread_flow_loc_obj->set ("kinds", std::move (kinds_arr)); + tfl_obj.set ("kinds", std::move (kinds_arr)); /* "nestingLevel" property (SARIF v2.1.0 section 3.38.10). */ - thread_flow_loc_obj->set_integer ("nestingLevel", ev.get_stack_depth ()); + tfl_obj.set_integer ("nestingLevel", ev.get_stack_depth ()); /* "executionOrder" property (SARIF v2.1.0 3.38.11). Offset by 1 to match the human-readable values emitted by %@. */ - thread_flow_loc_obj->set_integer ("executionOrder", path_event_idx + 1); + tfl_obj.set_integer ("executionOrder", event_execution_idx + 1); /* It might be nice to eventually implement the following for -fanalyzer: - the "stack" property (SARIF v2.1.0 section 3.38.5) - the "state" property (SARIF v2.1.0 section 3.38.9) - the "importance" property (SARIF v2.1.0 section 3.38.13). */ - - return thread_flow_loc_obj; } /* If M has any known meaning, make a json array suitable for the "kinds" @@ -2933,6 +3126,8 @@ public: m_builder.emit_diagram (m_context, diagram); } + sarif_builder &get_builder () { return m_builder; } + protected: sarif_output_format (diagnostic_context &context, const line_maps *line_maps, @@ -3008,11 +3203,128 @@ private: char *m_base_file_name; }; +/* Print the start of an embedded link to PP, as per 3.11.6. */ + +static void +sarif_begin_embedded_link (pretty_printer *pp) +{ + pp_character (pp, '['); +} + +/* Print the end of an embedded link to PP, as per 3.11.6. */ + +static void +sarif_end_embedded_link (pretty_printer *pp, + const char *url) +{ + pp_string (pp, "]("); + /* TODO: does the URI need escaping? + See https://github.com/oasis-tcs/sarif-spec/issues/657 */ + pp_string (pp, url); + pp_character (pp, ')'); +} + +/* class sarif_token_printer : public token_printer. */ + +/* Implementation of pretty_printer::token_printer for SARIF output. + Emit URLs as per 3.11.6 ("Messages with embedded links"). */ + +void +sarif_builder::sarif_token_printer::print_tokens (pretty_printer *pp, + const pp_token_list &tokens) +{ + /* Convert to text, possibly with colorization, URLs, etc. */ + label_text current_url; + for (auto iter = tokens.m_first; iter; iter = iter->m_next) + switch (iter->m_kind) + { + default: + gcc_unreachable (); + + case pp_token::kind::text: + { + const pp_token_text *sub = as_a (iter); + const char * const str = sub->m_value.get (); + if (current_url.get ()) + { + /* Write iter->m_value, but escaping any + escaped link characters as per 3.11.6. */ + for (const char *ptr = str; *ptr; ptr++) + { + const char ch = *ptr; + switch (ch) + { + default: + pp_character (pp, ch); + break; + case '\\': + case '[': + case ']': + pp_character (pp, '\\'); + pp_character (pp, ch); + break; + } + } + } + else + /* TODO: is other escaping needed? (e.g. of '[') + See https://github.com/oasis-tcs/sarif-spec/issues/658 */ + pp_string (pp, str); + } + break; + + case pp_token::kind::begin_color: + case pp_token::kind::end_color: + /* These are no-ops. */ + break; + + case pp_token::kind::begin_quote: + pp_begin_quote (pp, pp_show_color (pp)); + break; + case pp_token::kind::end_quote: + pp_end_quote (pp, pp_show_color (pp)); + break; + + /* Emit URLs as per 3.11.6 ("Messages with embedded links"). */ + case pp_token::kind::begin_url: + { + pp_token_begin_url *sub = as_a (iter); + sarif_begin_embedded_link (pp); + current_url = std::move (sub->m_value); + } + break; + case pp_token::kind::end_url: + gcc_assert (current_url.get ()); + sarif_end_embedded_link (pp, current_url.get ()); + current_url = label_text::borrow (nullptr); + break; + + case pp_token::kind::event_id: + { + pp_token_event_id *sub = as_a (iter); + gcc_assert (sub->m_event_id.known_p ()); + const sarif_code_flow *code_flow + = m_builder.get_code_flow_for_event_ids (); + label_text url = make_sarif_url_for_event (code_flow, + sub->m_event_id); + if (url.get ()) + sarif_begin_embedded_link (pp); + pp_character (pp, '('); + pp_decimal_int (pp, sub->m_event_id.one_based ()); + pp_character (pp, ')'); + if (url.get ()) + sarif_end_embedded_link (pp, url.get ()); + } + break; + } +} + /* Populate CONTEXT in preparation for SARIF output (either to stderr, or to a file). */ static void -diagnostic_output_format_init_sarif (diagnostic_context &context) +diagnostic_output_format_init_sarif (diagnostic_context &context, + std::unique_ptr fmt) { /* Suppress normal textual path output. */ context.set_path_format (DPF_NONE); @@ -3023,6 +3335,10 @@ diagnostic_output_format_init_sarif (diagnostic_context &context) /* Don't colorize the text. */ pp_show_color (context.printer) = false; context.set_show_highlight_colors (false); + + context.printer->set_token_printer + (&fmt->get_builder ().get_token_printer ()); + context.set_output_format (fmt.release ()); } /* Populate CONTEXT in preparation for SARIF output to stderr. */ @@ -3034,13 +3350,13 @@ diagnostic_output_format_init_sarif_stderr (diagnostic_context &context, bool formatted) { gcc_assert (line_maps); - diagnostic_output_format_init_sarif (context); - context.set_output_format - (new sarif_stream_output_format (context, - line_maps, - main_input_filename_, - formatted, - stderr)); + diagnostic_output_format_init_sarif + (context, + ::make_unique (context, + line_maps, + main_input_filename_, + formatted, + stderr)); } /* Populate CONTEXT in preparation for SARIF output to a file named @@ -3054,13 +3370,13 @@ diagnostic_output_format_init_sarif_file (diagnostic_context &context, const char *base_file_name) { gcc_assert (line_maps); - diagnostic_output_format_init_sarif (context); - context.set_output_format - (new sarif_file_output_format (context, - line_maps, - main_input_filename_, - formatted, - base_file_name)); + diagnostic_output_format_init_sarif + (context, + ::make_unique (context, + line_maps, + main_input_filename_, + formatted, + base_file_name)); } /* Populate CONTEXT in preparation for SARIF output to STREAM. */ @@ -3073,13 +3389,13 @@ diagnostic_output_format_init_sarif_stream (diagnostic_context &context, FILE *stream) { gcc_assert (line_maps); - diagnostic_output_format_init_sarif (context); - context.set_output_format - (new sarif_stream_output_format (context, - line_maps, - main_input_filename_, - formatted, - stream)); + diagnostic_output_format_init_sarif + (context, + ::make_unique (context, + line_maps, + main_input_filename_, + formatted, + stream)); } #if CHECKING_P @@ -3095,13 +3411,12 @@ class test_sarif_diagnostic_context : public test_diagnostic_context public: test_sarif_diagnostic_context (const char *main_input_filename) { - diagnostic_output_format_init_sarif (*this); - - m_format = new buffered_output_format (*this, - line_table, - main_input_filename, - true); - set_output_format (m_format); // give ownership; + auto format = ::make_unique (*this, + line_table, + main_input_filename, + true); + m_format = format.get (); // borrowed + diagnostic_output_format_init_sarif (*this, std::move (format)); } std::unique_ptr flush_to_object () @@ -3175,7 +3490,7 @@ test_make_location_object (const line_table_case &case_) richloc.add_range (field, SHOW_RANGE_WITHOUT_CARET, &label2); richloc.set_escape_on_output (true); - sarif_result result; + sarif_result result (0); std::unique_ptr location_obj = builder.make_location_object @@ -3471,6 +3786,119 @@ test_simple_log_2 (const line_table_case &case_) } } +/* Assuming that a single diagnostic has been emitted within + LOG, get a json::object for the result object. */ + +static const json::object * +get_result_from_log (const sarif_log *log) +{ + auto runs = EXPECT_JSON_OBJECT_WITH_ARRAY_PROPERTY (log, "runs"); // 3.13.4 + ASSERT_EQ (runs->size (), 1); + + // 3.14 "run" object: + auto run = (*runs)[0]; + + // 3.14.23: + auto results = EXPECT_JSON_OBJECT_WITH_ARRAY_PROPERTY (run, "results"); + ASSERT_EQ (results->size (), 1); + + // 3.27 "result" object: + auto result = (*results)[0]; + return expect_json_object (SELFTEST_LOCATION, result); +} + +/* Assuming that a single diagnostic has been emitted to + DC, get a json::object for the messsage object within + the result. */ + +static const json::object * +get_message_from_log (const sarif_log *log) +{ + auto result_obj = get_result_from_log (log); + + // 3.27.11: + auto message_obj + = EXPECT_JSON_OBJECT_WITH_OBJECT_PROPERTY (result_obj, "message"); + return message_obj; +} + +/* Tests of messages with embedded links; see SARIF v2.1.0 3.11.6. */ + +static void +test_message_with_embedded_link () +{ + auto_fix_quotes fix_quotes; + { + test_sarif_diagnostic_context dc ("test.c"); + rich_location richloc (line_table, UNKNOWN_LOCATION); + dc.report (DK_ERROR, richloc, nullptr, 0, + "before %{text%} after", + "http://example.com"); + std::unique_ptr log = dc.flush_to_object (); + + auto message_obj = get_message_from_log (log.get ()); + ASSERT_JSON_STRING_PROPERTY_EQ + (message_obj, "text", + "before [text](http://example.com) after"); + } + + /* Escaping in message text. + This is "EXAMPLE 1" from 3.11.6. */ + { + test_sarif_diagnostic_context dc ("test.c"); + rich_location richloc (line_table, UNKNOWN_LOCATION); + + /* Disable "unquoted sequence of 2 consecutive punctuation + characters `]\' in format" warning. */ +#if __GNUC__ >= 10 +# pragma GCC diagnostic push +# pragma GCC diagnostic ignored "-Wformat-diag" +#endif + dc.report (DK_ERROR, richloc, nullptr, 0, + "Prohibited term used in %{para[0]\\spans[2]%}.", + "1"); +#if __GNUC__ >= 10 +# pragma GCC diagnostic pop +#endif + + std::unique_ptr log = dc.flush_to_object (); + + auto message_obj = get_message_from_log (log.get ()); + ASSERT_JSON_STRING_PROPERTY_EQ + (message_obj, "text", + "Prohibited term used in [para\\[0\\]\\\\spans\\[2\\]](1)."); + /* This isn't exactly what EXAMPLE 1 of the spec has; reported as + https://github.com/oasis-tcs/sarif-spec/issues/656 */ + } + + /* Urlifier. */ + { + class test_urlifier : public urlifier + { + public: + char * + get_url_for_quoted_text (const char *p, size_t sz) const final override + { + if (!strncmp (p, "-foption", sz)) + return xstrdup ("http://example.com"); + return nullptr; + } + }; + + test_sarif_diagnostic_context dc ("test.c"); + dc.set_urlifier (new test_urlifier ()); + rich_location richloc (line_table, UNKNOWN_LOCATION); + dc.report (DK_ERROR, richloc, nullptr, 0, + "foo %<-foption%> % bar"); + std::unique_ptr log = dc.flush_to_object (); + + auto message_obj = get_message_from_log (log.get ()); + ASSERT_JSON_STRING_PROPERTY_EQ + (message_obj, "text", + "foo `[-foption](http://example.com)' `unrecognized' bar"); + } +} + /* Run all of the selftests within this file. */ void @@ -3479,6 +3907,7 @@ diagnostic_format_sarif_cc_tests () for_each_line_table_case (test_make_location_object); test_simple_log (); for_each_line_table_case (test_simple_log_2); + test_message_with_embedded_link (); } } // namespace selftest diff --git a/gcc/pretty-print-format-impl.h b/gcc/pretty-print-format-impl.h index cffdd461a33..f1996284f4a 100644 --- a/gcc/pretty-print-format-impl.h +++ b/gcc/pretty-print-format-impl.h @@ -22,6 +22,7 @@ along with GCC; see the file COPYING3. If not see #define GCC_PRETTY_PRINT_FORMAT_IMPL_H #include "pretty-print.h" +#include "diagnostic-event-id.h" /* A struct representing a pending item to be printed within pp_format. @@ -31,6 +32,7 @@ along with GCC; see the file COPYING3. If not see - begin/end named color - open/close quote - begin/end URL + - event IDs - custom data (for the formatter, for the pretty_printer, or the output format) @@ -72,6 +74,8 @@ public: begin_url, end_url, + event_id, + custom_data, NUM_KINDS @@ -218,6 +222,34 @@ struct pp_token_end_url : public pp_token } }; +struct pp_token_event_id : public pp_token +{ + pp_token_event_id (diagnostic_event_id_t event_id) + : pp_token (kind::event_id), + m_event_id (event_id) + { + gcc_assert (event_id.known_p ()); + } + + diagnostic_event_id_t m_event_id; +}; + +template <> +template <> +inline bool +is_a_helper ::test (pp_token *tok) +{ + return tok->m_kind == pp_token::kind::event_id; +} + +template <> +template <> +inline bool +is_a_helper ::test (const pp_token *tok) +{ + return tok->m_kind == pp_token::kind::event_id; +} + struct pp_token_custom_data : public pp_token { class value diff --git a/gcc/pretty-print.cc b/gcc/pretty-print.cc index d2c0a197680..fe6b6090f32 100644 --- a/gcc/pretty-print.cc +++ b/gcc/pretty-print.cc @@ -1120,6 +1120,16 @@ pp_token::dump (FILE *out) const case kind::end_url: fprintf (out, "END_URL"); break; + + case kind::event_id: + { + const pp_token_event_id *sub + = as_a (this); + gcc_assert (sub->m_event_id.known_p ()); + fprintf (out, "EVENT((%i))", sub->m_event_id.one_based ()); + } + break; + case kind::custom_data: { const pp_token_custom_data *sub @@ -1984,12 +1994,7 @@ pretty_printer::format (text_info *text) diagnostic_event_id_ptr event_id = va_arg (*text->m_args_ptr, diagnostic_event_id_ptr); gcc_assert (event_id->known_p ()); - - pp_string (this, colorize_start (m_show_color, "path")); - pp_character (this, '('); - pp_decimal_int (this, event_id->one_based ()); - pp_character (this, ')'); - pp_string (this, colorize_stop (m_show_color)); + formatted_tok_list->push_back (*event_id); } break; @@ -2183,6 +2188,18 @@ default_token_printer (pretty_printer *pp, pp_end_url (pp); break; + case pp_token::kind::event_id: + { + pp_token_event_id *sub = as_a (iter); + gcc_assert (sub->m_event_id.known_p ()); + pp_string (pp, colorize_start (pp_show_color (pp), "path")); + pp_character (pp, '('); + pp_decimal_int (pp, sub->m_event_id.one_based ()); + pp_character (pp, ')'); + pp_string (pp, colorize_stop (pp_show_color (pp))); + } + break; + case pp_token::kind::custom_data: /* These should have been eliminated by replace_custom_tokens. */ gcc_unreachable (); diff --git a/gcc/testsuite/gcc.dg/sarif-output/bad-pragma.c b/gcc/testsuite/gcc.dg/sarif-output/bad-pragma.c new file mode 100644 index 00000000000..db274de4b97 --- /dev/null +++ b/gcc/testsuite/gcc.dg/sarif-output/bad-pragma.c @@ -0,0 +1,16 @@ +/* Verify that SARIF output can capture URLs in diagnostics + related to a bad pragma. */ + +/* { dg-do compile } */ +/* { dg-options "-fdiagnostics-format=sarif-file -Wpragmas" } */ + +#pragma GCC diagnostic ignored "-Wmisleading-indenttion" + +int nonempty; + +/* Verify that some JSON was written to a file with the expected name: + { dg-final { verify-sarif-file } } */ + +/* Use a Python script to verify various properties about the generated + .sarif file: + { dg-final { run-sarif-pytest bad-pragma.c "test-bad-pragma.py" } } */ diff --git a/gcc/testsuite/gcc.dg/sarif-output/test-bad-pragma.py b/gcc/testsuite/gcc.dg/sarif-output/test-bad-pragma.py new file mode 100644 index 00000000000..140bb338198 --- /dev/null +++ b/gcc/testsuite/gcc.dg/sarif-output/test-bad-pragma.py @@ -0,0 +1,38 @@ +from sarif import * + +import pytest + +@pytest.fixture(scope='function', autouse=True) +def sarif(): + return sarif_from_env() + +def test_messages_have_embedded_urls(sarif): + runs = sarif['runs'] + run = runs[0] + results = run['results'] + + # We expect a single warning with a secondary location. + # + # The textual form of the diagnostic would look like this: + # . PATH/bad-pragma.c:7:32: warning: unknown option after '#pragma GCC diagnostic' kind [-Wpragmas] + # . 7 | #pragma GCC diagnostic ignored "-Wmisleading-indenttion" + # . | ^~~~~~~~~~~~~~~~~~~~~~~~~ + # . PATH/bad-pragma.c:7:32: note: did you mean '-Wmisleading-indentation'? + assert len(results) == 1 + + result = results[0] + assert result['ruleId'] == '-Wpragmas' + assert result['level'] == 'warning' + assert result['message']['text'] \ + == "unknown option after '[#pragma GCC diagnostic](https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Pragmas.html)' kind" + # Note how we expect an embedded link in the above for the docs for #pragma GCC diagnostic + + # We expect one related location, for the note. + relatedLocations = result['relatedLocations'] + assert len(relatedLocations) == 1 + + rel_loc = relatedLocations[0] + assert rel_loc['message']['text'] \ + == "did you mean '[-Wmisleading-indentation](https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wmisleading-indentation)'?" + # Again, we expect an embedded link in the above, this time to the + # docs for the suggested option diff --git a/gcc/testsuite/gcc.dg/sarif-output/test-include-chain-2.py b/gcc/testsuite/gcc.dg/sarif-output/test-include-chain-2.py index 761fe1b59a9..843f89a9485 100644 --- a/gcc/testsuite/gcc.dg/sarif-output/test-include-chain-2.py +++ b/gcc/testsuite/gcc.dg/sarif-output/test-include-chain-2.py @@ -96,9 +96,11 @@ def test_location_relationships(sarif): == " __builtin_free (ptr); // 1st\n" assert threadFlow['locations'][0]['kinds'] == ['release', 'memory'] assert threadFlow['locations'][0]['executionOrder'] == 1 - + + # We should have an embedded link in this event's message to the + # other event's location within the SARIF file: assert threadFlow['locations'][1]['location']['message']['text'] \ - == "second 'free' here; first 'free' was at (1)" + == "second 'free' here; first 'free' was at [(1)](sarif:/runs/0/results/0/codeFlows/0/threadFlows/0/locations/0)" assert threadFlow['locations'][1]['location']['physicalLocation']['contextRegion']['snippet']['text'] \ == " __builtin_free (ptr); // 2nd\n" assert threadFlow['locations'][1]['kinds'] == ['danger']