SARIF output: implement embedded URLs in messages (§3.11.6; PR other/116419)

GCC diagnostic messages can contain URLs, such as to our documentation
when we suggest an option name to correct a misspelling.

SARIF message strings can contain embedded URLs in the plain text
messages (see SARIF v2.1.0 §3.11.6), but previously we were
simply dropping any URLs from the diagnostic messages.

This patch adds support for encoding URLs into messages in our SARIF
output, using the pp_token machinery added in the previous patch.

As well as supporting URLs, the patch also adjusts how we report
event IDs in SARIF message, so that rather than e.g.
  "text": "second 'free' here; first 'free' was at (1)"
we now report:
  "text": "second 'free' here; first 'free' was at [(1)](sarif:/runs/0/results/0/codeFlows/0/threadFlows/0/locations/0)"

i.e. the text "(1)" now has a embedded link referring within the sarif
log to the threadFlowLocation object for the other event, via JSON
pointer (see §3.10.3 "URIs that use the sarif scheme").  Doing so
requires the arious objects to know their index within their containing
array, requiring some reworking of how they are constructed.

gcc/ChangeLog:
	PR other/116419
	* diagnostic-event-id.h (diagnostic_event_id_t::zero_based): New.
	* diagnostic-format-sarif.cc: Include "pretty-print-format-impl.h"
	and "pretty-print-urlifier.h".
	(sarif_result::sarif_result): Add param "idx_within_parent".
	(sarif_result::get_index_within_parent): New accessor.
	(sarif_result::m_idx_within_parent): New field.
	(sarif_code_flow::sarif_code_flow): New ctor.
	(sarif_code_flow::get_parent): New accessor.
	(sarif_code_flow::get_index_within_parent): New accessor.
	(sarif_code_flow::m_parent): New field.
	(sarif_code_flow::m_thread_id_map): New field.
	(sarif_code_flow::m_thread_flows_arr): New field.
	(sarif_code_flow::m_all_tfl_objs): New field.
	(sarif_thread_flow::sarif_thread_flow): Add "parent" and
	"idx_within_parent" params.
	(sarif_thread_flow::get_parent): New accessor.
	(sarif_thread_flow::get_index_within_parent): New accessor.
	(sarif_thread_flow::m_parent): New field.
	(sarif_thread_flow::m_idx_within_parent): New field.
	(sarif_thread_flow_location::sarif_thread_flow_location): New
	ctor.
	(sarif_thread_flow_location::get_parent): New accessor.
	(sarif_thread_flow_location::get_index_within_parent): New
	accessor.
	(sarif_thread_flow_location::m_parent): New field.
	(sarif_thread_flow_location::m_idx_within_parent): New field.
	(sarif_builder::get_code_flow_for_event_ids): New accessor.
	(class sarif_builder::sarif_token_printer): New.
	(sarif_builder::m_token_printer): New member.
	(sarif_builder::m_next_result_idx): New field.
	(sarif_builder::m_current_code_flow): New field.
	(sarif_code_flow::get_or_append_thread_flow): New.
	(sarif_code_flow::get_thread_flow): New.
	(sarif_code_flow::add_location): New.
	(sarif_code_flow::get_thread_flow_loc_obj): New.
	(sarif_thread_flow::add_location): Create the new
	sarif_thread_flow_location internally, rather than passing
	it in as a parm so that we can keep track of its index in
	the array.  Return a reference to it.
	(sarif_builder::sarif_builder): Initialize m_token_printer,
	m_next_result_idx, and m_current_code_flow.
	(sarif_builder::on_report_diagnostic): Pass index to
	make_result_object.
	(sarif_builder::make_result_object): Add "idx_within_parent" param
	and pass to sarif_result ctor.  Pass code flow index to call to
	make_code_flow_object.
	(make_sarif_url_for_event): New.
	(sarif_builder::make_code_flow_object): Add "idx_within_parent"
	param and pass it to sarif_code_flow ctor.  Reimplement walking
	of events so that we first create threadFlow objects for each
	thread, then populate them with threadFlowLocation objects, so
	that the IDs work.  Set m_current_code_flow whilst creating the
	latter, so that we can create correct URIs for "%@".
	(sarif_builder::make_thread_flow_location_object): Replace with...
	(sarif_builder::populate_thread_flow_location_object): ...this.
	(sarif_output_format::get_builder): New accessor.
	(sarif_begin_embedded_link): New.
	(sarif_end_embedded_link): New.
	(sarif_builder::sarif_token_printer::print_tokens): New.
	(diagnostic_output_format_init_sarif): Add "fmt" param; use it to
	set the token printer and output format for the context.
	(diagnostic_output_format_init_sarif_stderr): Move responsibility
	for setting the context's output format to within
	diagnostic_output_format_init_sarif.
	(diagnostic_output_format_init_sarif_file): Likewise.
	(diagnostic_output_format_init_sarif_stream): Likewise.
	(test_sarif_diagnostic_context::test_sarif_diagnostic_context):
	Likewise.
	(selftest::test_make_location_object): Provide an idx for the
	result.
	(selftest::get_result_from_log): New.
	(selftest::get_message_from_log): New.
	(selftest::test_message_with_embedded_link): New test.
	(selftest::diagnostic_format_sarif_cc_tests): Call it.
	* pretty-print-format-impl.h: Include "diagnostic-event-id.h".
	(pp_token::kind): Add "event_id".
	(struct pp_token_event_id): New.
	(is_a_helper <pp_token_event_id *>::test): New.
	(is_a_helper <const pp_token_event_id *>::test): New.
	* pretty-print.cc (pp_token::dump): Handle kind::event_id.
	(pretty_printer::format): Update handling of "%@" in phase 2
	so that we add a pp_token_event_id, rather that the text "(N)".
	(default_token_printer): Handle pp_token::kind::event_id by
	printing the text "(N)".

gcc/testsuite/ChangeLog:
	PR other/116419
	* gcc.dg/sarif-output/bad-pragma.c: New test.
	* gcc.dg/sarif-output/test-bad-pragma.py: New test.
	* gcc.dg/sarif-output/test-include-chain-2.py
	(test_location_relationships): Update expected text of event to
	include an intra-sarif URI to the other event.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
This commit is contained in:
David Malcolm 2024-08-29 18:48:32 -04:00
parent e31b617699
commit aff7f67712
7 changed files with 637 additions and 97 deletions

View file

@ -41,6 +41,12 @@ class diagnostic_event_id_t
bool known_p () const { return m_index != UNKNOWN_EVENT_IDX; }
int zero_based () const
{
gcc_assert (known_p ());
return m_index;
}
int one_based () const
{
gcc_assert (known_p ());

View file

@ -45,6 +45,8 @@ along with GCC; see the file COPYING3. If not see
#include "selftest-diagnostic-show-locus.h"
#include "selftest-json.h"
#include "text-range-label.h"
#include "pretty-print-format-impl.h"
#include "pretty-print-urlifier.h"
/* Forward decls. */
class sarif_builder;
@ -385,7 +387,12 @@ private:
class sarif_result : public sarif_location_manager
{
public:
sarif_result () : m_related_locations_arr (nullptr) {}
sarif_result (unsigned idx_within_parent)
: m_related_locations_arr (nullptr),
m_idx_within_parent (idx_within_parent)
{}
unsigned get_index_within_parent () const { return m_idx_within_parent; }
void
on_nested_diagnostic (diagnostic_context &context,
@ -402,6 +409,7 @@ public:
private:
json::array *m_related_locations_arr; // borrowed
const unsigned m_idx_within_parent;
};
/* Subclass of sarif_object for SARIF "location" objects
@ -460,7 +468,39 @@ private:
/* Subclass of sarif_object for SARIF "codeFlow" objects
(SARIF v2.1.0 section 3.36). */
class sarif_code_flow : public sarif_object {};
class sarif_code_flow : public sarif_object
{
public:
sarif_code_flow (sarif_result &parent,
unsigned idx_within_parent);
sarif_result &get_parent () const { return m_parent; }
unsigned get_index_within_parent () const { return m_idx_within_parent; }
sarif_thread_flow &
get_or_append_thread_flow (const diagnostic_thread &thread,
diagnostic_thread_id_t thread_id);
sarif_thread_flow &
get_thread_flow (diagnostic_thread_id_t thread_id);
void add_location (sarif_thread_flow_location &);
sarif_thread_flow_location &
get_thread_flow_loc_obj (diagnostic_event_id_t event_id) const;
private:
sarif_result &m_parent;
const unsigned m_idx_within_parent;
hash_map<int_hash<diagnostic_thread_id_t, -1, -2>,
sarif_thread_flow *> m_thread_id_map; // borrowed ptr
json::array *m_thread_flows_arr; // borrowed
/* Vec of borrowed ptr, allowing for going easily from
an event_id to the corresponding threadFlowLocation object. */
std::vector<sarif_thread_flow_location *> m_all_tfl_objs;
};
/* Subclass of sarif_object for SARIF "threadFlow" objects
(SARIF v2.1.0 section 3.37). */
@ -468,20 +508,41 @@ class sarif_code_flow : public sarif_object {};
class sarif_thread_flow : public sarif_object
{
public:
sarif_thread_flow (const diagnostic_thread &thread);
sarif_thread_flow (sarif_code_flow &parent,
const diagnostic_thread &thread,
unsigned idx_within_parent);
void
add_location
(std::unique_ptr<sarif_thread_flow_location> thread_flow_loc_obj);
sarif_code_flow &get_parent () const { return m_parent; }
unsigned get_index_within_parent () const { return m_idx_within_parent; }
sarif_thread_flow_location &add_location ();
private:
sarif_code_flow &m_parent;
json::array *m_locations_arr; // borrowed
const unsigned m_idx_within_parent;
};
/* Subclass of sarif_object for SARIF "threadFlowLocation" objects
(SARIF v2.1.0 section 3.38). */
class sarif_thread_flow_location : public sarif_object {};
class sarif_thread_flow_location : public sarif_object
{
public:
sarif_thread_flow_location (sarif_thread_flow &parent,
unsigned idx_within_parent)
: m_parent (parent),
m_idx_within_parent (idx_within_parent)
{
}
sarif_thread_flow &get_parent () const { return m_parent; }
unsigned get_index_within_parent () const { return m_idx_within_parent; }
private:
sarif_thread_flow &m_parent;
const unsigned m_idx_within_parent;
};
/* Subclass of sarif_object for SARIF "reportingDescriptor" objects
(SARIF v2.1.0 section 3.49). */
@ -632,11 +693,33 @@ public:
std::unique_ptr<sarif_artifact_location>
make_artifact_location_object (const char *filename);
const sarif_code_flow *
get_code_flow_for_event_ids () const
{
return m_current_code_flow;
}
token_printer &get_token_printer () { return m_token_printer; }
private:
class sarif_token_printer : public token_printer
{
public:
sarif_token_printer (sarif_builder &builder)
: m_builder (builder)
{
}
void print_tokens (pretty_printer *pp,
const pp_token_list &tokens) final override;
private:
sarif_builder &m_builder;
};
std::unique_ptr<sarif_result>
make_result_object (diagnostic_context &context,
const diagnostic_info &diagnostic,
diagnostic_t orig_diag_kind);
diagnostic_t orig_diag_kind,
unsigned idx_within_parent);
void
add_any_include_chain (sarif_location_manager &loc_mgr,
sarif_location &location_obj,
@ -650,11 +733,13 @@ private:
enum diagnostic_artifact_role role);
std::unique_ptr<sarif_code_flow>
make_code_flow_object (sarif_result &result,
unsigned idx_within_parent,
const diagnostic_path &path);
std::unique_ptr<sarif_thread_flow_location>
make_thread_flow_location_object (sarif_result &result,
const diagnostic_event &event,
int path_event_idx);
void
populate_thread_flow_location_object (sarif_result &result,
sarif_thread_flow_location &thread_flow_loc_obj,
const diagnostic_event &event,
int event_execution_idx);
std::unique_ptr<json::array>
maybe_make_kinds_array (diagnostic_event::meaning m) const;
std::unique_ptr<sarif_physical_location>
@ -725,6 +810,7 @@ private:
diagnostic_context &m_context;
const line_maps *m_line_maps;
sarif_token_printer m_token_printer;
/* The JSON object for the invocation object. */
std::unique_ptr<sarif_invocation> m_invocation_obj;
@ -751,6 +837,9 @@ private:
int m_tabstop;
bool m_formatted;
unsigned m_next_result_idx;
sarif_code_flow *m_current_code_flow;
};
/* class sarif_object : public json::object. */
@ -1294,9 +1383,67 @@ lazily_add_kind (enum location_relationship_kind kind)
kinds_arr->append_string (kind_str);
}
/* class sarif_code_flow : public sarif_object. */
sarif_code_flow::sarif_code_flow (sarif_result &parent,
unsigned idx_within_parent)
: m_parent (parent),
m_idx_within_parent (idx_within_parent)
{
/* "threadFlows" property (SARIF v2.1.0 section 3.36.3). */
auto thread_flows_arr = ::make_unique<json::array> ();
m_thread_flows_arr = thread_flows_arr.get (); // borrowed
set<json::array> ("threadFlows", std::move (thread_flows_arr));
}
sarif_thread_flow &
sarif_code_flow::get_or_append_thread_flow (const diagnostic_thread &thread,
diagnostic_thread_id_t thread_id)
{
sarif_thread_flow **slot = m_thread_id_map.get (thread_id);
if (slot)
return **slot;
unsigned next_thread_flow_idx = m_thread_flows_arr->size ();
auto thread_flow_obj
= ::make_unique<sarif_thread_flow> (*this, thread, next_thread_flow_idx);
m_thread_id_map.put (thread_id, thread_flow_obj.get ()); // borrowed
sarif_thread_flow *result = thread_flow_obj.get ();
m_thread_flows_arr->append<sarif_thread_flow> (std::move (thread_flow_obj));
return *result;
}
sarif_thread_flow &
sarif_code_flow::get_thread_flow (diagnostic_thread_id_t thread_id)
{
sarif_thread_flow **slot = m_thread_id_map.get (thread_id);
gcc_assert (slot); // it must already have one
return **slot;
}
void
sarif_code_flow::add_location (sarif_thread_flow_location &tfl_obj)
{
m_all_tfl_objs.push_back (&tfl_obj);
}
sarif_thread_flow_location &
sarif_code_flow::get_thread_flow_loc_obj (diagnostic_event_id_t event_id) const
{
gcc_assert (event_id.known_p ());
gcc_assert ((size_t)event_id.zero_based () < m_all_tfl_objs.size ());
sarif_thread_flow_location *tfl_obj = m_all_tfl_objs[event_id.zero_based ()];
gcc_assert (tfl_obj);
return *tfl_obj;
}
/* class sarif_thread_flow : public sarif_object. */
sarif_thread_flow::sarif_thread_flow (const diagnostic_thread &thread)
sarif_thread_flow::sarif_thread_flow (sarif_code_flow &parent,
const diagnostic_thread &thread,
unsigned idx_within_parent)
: m_parent (parent),
m_idx_within_parent (idx_within_parent)
{
/* "id" property (SARIF v2.1.0 section 3.37.2). */
label_text name (thread.get_name (false));
@ -1310,11 +1457,18 @@ sarif_thread_flow::sarif_thread_flow (const diagnostic_thread &thread)
set ("locations", m_locations_arr);
}
void
sarif_thread_flow::
add_location (std::unique_ptr<sarif_thread_flow_location> thread_flow_loc_obj)
/* Add a sarif_thread_flow_location to this threadFlow object, but
don't populate it yet. */
sarif_thread_flow_location &
sarif_thread_flow::add_location ()
{
m_locations_arr->append (std::move (thread_flow_loc_obj));
const unsigned thread_flow_location_idx = m_locations_arr->size ();
sarif_thread_flow_location *thread_flow_loc_obj
= new sarif_thread_flow_location (*this, thread_flow_location_idx);
m_locations_arr->append (thread_flow_loc_obj);
m_parent.add_location (*thread_flow_loc_obj);
return *thread_flow_loc_obj;
}
/* class sarif_builder. */
@ -1327,6 +1481,7 @@ sarif_builder::sarif_builder (diagnostic_context &context,
bool formatted)
: m_context (context),
m_line_maps (line_maps),
m_token_printer (*this),
m_invocation_obj
(::make_unique<sarif_invocation> (*this,
context.get_original_argv ())),
@ -1336,7 +1491,9 @@ sarif_builder::sarif_builder (diagnostic_context &context,
m_rule_id_set (),
m_rules_arr (new json::array ()),
m_tabstop (context.m_tabstop),
m_formatted (formatted)
m_formatted (formatted),
m_next_result_idx (0),
m_current_code_flow (nullptr)
{
gcc_assert (m_line_maps);
@ -1376,7 +1533,8 @@ sarif_builder::on_report_diagnostic (diagnostic_context &context,
{
/* Top-level diagnostic. */
m_cur_group_result
= make_result_object (context, diagnostic, orig_diag_kind);
= make_result_object (context, diagnostic, orig_diag_kind,
m_next_result_idx++);
}
}
@ -1476,9 +1634,10 @@ make_rule_id_for_diagnostic_kind (diagnostic_t diag_kind)
std::unique_ptr<sarif_result>
sarif_builder::make_result_object (diagnostic_context &context,
const diagnostic_info &diagnostic,
diagnostic_t orig_diag_kind)
diagnostic_t orig_diag_kind,
unsigned idx_within_parent)
{
auto result_obj = ::make_unique<sarif_result> ();
auto result_obj = ::make_unique<sarif_result> (idx_within_parent);
/* "ruleId" property (SARIF v2.1.0 section 3.27.5). */
/* Ideally we'd have an option_name for these. */
@ -1552,8 +1711,10 @@ sarif_builder::make_result_object (diagnostic_context &context,
if (const diagnostic_path *path = diagnostic.richloc->get_path ())
{
auto code_flows_arr = ::make_unique<json::array> ();
const unsigned code_flow_index = 0;
code_flows_arr->append<sarif_code_flow>
(make_code_flow_object (*result_obj.get (),
code_flow_index,
*path));
result_obj->set<json::array> ("codeFlows", std::move (code_flows_arr));
}
@ -2272,83 +2433,115 @@ make_sarif_logical_location_object (const logical_location &logical_loc)
return logical_loc_obj;
}
label_text
make_sarif_url_for_event (const sarif_code_flow *code_flow,
diagnostic_event_id_t event_id)
{
gcc_assert (event_id.known_p ());
if (!code_flow)
return label_text ();
const sarif_thread_flow_location &tfl_obj
= code_flow->get_thread_flow_loc_obj (event_id);
const int location_idx = tfl_obj.get_index_within_parent ();
const sarif_thread_flow &thread_flow_obj = tfl_obj.get_parent ();
const int thread_flow_idx = thread_flow_obj.get_index_within_parent ();
const sarif_code_flow &code_flow_obj = thread_flow_obj.get_parent ();
const int code_flow_idx = code_flow_obj.get_index_within_parent ();
const sarif_result &result_obj = code_flow_obj.get_parent ();
const int result_idx = result_obj.get_index_within_parent ();
/* We only support a single run object in the log. */
const int run_idx = 0;
char *buf = xasprintf
("sarif:/runs/%i/results/%i/codeFlows/%i/threadFlows/%i/locations/%i",
run_idx, result_idx, code_flow_idx, thread_flow_idx, location_idx);
return label_text::take (buf);
}
/* Make a "codeFlow" object (SARIF v2.1.0 section 3.36) for PATH. */
std::unique_ptr<sarif_code_flow>
sarif_builder::make_code_flow_object (sarif_result &result,
unsigned idx_within_parent,
const diagnostic_path &path)
{
auto code_flow_obj = ::make_unique <sarif_code_flow> ();
auto code_flow_obj
= ::make_unique <sarif_code_flow> (result, idx_within_parent);
/* "threadFlows" property (SARIF v2.1.0 section 3.36.3). */
auto thread_flows_arr = ::make_unique<json::array> ();
/* Walk the events, consolidating into per-thread threadFlow objects,
using the index with PATH as the overall executionOrder. */
hash_map<int_hash<diagnostic_thread_id_t, -1, -2>,
sarif_thread_flow *> thread_id_map; // borrowed
/* First pass:
Create threadFlows and threadFlowLocation objects within them,
effectively recording a mapping from event_id to threadFlowLocation
so that we can later go from an event_id to a URI within the
SARIF file. */
for (unsigned i = 0; i < path.num_events (); i++)
{
const diagnostic_event &event = path.get_event (i);
const diagnostic_thread_id_t thread_id = event.get_thread_id ();
sarif_thread_flow *thread_flow_obj;
if (sarif_thread_flow **slot = thread_id_map.get (thread_id))
thread_flow_obj = *slot;
else
{
const diagnostic_thread &thread = path.get_thread (thread_id);
thread_flow_obj = new sarif_thread_flow (thread);
thread_id_map.put (thread_id, thread_flow_obj); // borrowed
thread_flows_arr->append (thread_flow_obj);
}
/* Add event to thread's threadFlow object. */
std::unique_ptr<sarif_thread_flow_location> thread_flow_loc_obj
= make_thread_flow_location_object (result, event, i);
thread_flow_obj->add_location (std::move (thread_flow_loc_obj));
sarif_thread_flow &thread_flow_obj
= code_flow_obj->get_or_append_thread_flow (path.get_thread (thread_id),
thread_id);
thread_flow_obj.add_location ();
}
code_flow_obj->set<json::array> ("threadFlows", std::move (thread_flows_arr));
/* Second pass: walk the events, populating the tfl objs. */
m_current_code_flow = code_flow_obj.get ();
for (unsigned i = 0; i < path.num_events (); i++)
{
const diagnostic_event &event = path.get_event (i);
sarif_thread_flow_location &thread_flow_loc_obj
= code_flow_obj->get_thread_flow_loc_obj (i);
populate_thread_flow_location_object (result,
thread_flow_loc_obj,
event,
i);
}
m_current_code_flow = nullptr;
return code_flow_obj;
}
/* Make a "threadFlowLocation" object (SARIF v2.1.0 section 3.38) for EVENT. */
/* Populate TFL_OBJ, a "threadFlowLocation" object (SARIF v2.1.0 section 3.38)
based on EVENT. */
std::unique_ptr<sarif_thread_flow_location>
sarif_builder::make_thread_flow_location_object (sarif_result &result,
const diagnostic_event &ev,
int path_event_idx)
void
sarif_builder::
populate_thread_flow_location_object (sarif_result &result,
sarif_thread_flow_location &tfl_obj,
const diagnostic_event &ev,
int event_execution_idx)
{
auto thread_flow_loc_obj = ::make_unique<sarif_thread_flow_location> ();
/* Give diagnostic_event subclasses a chance to add custom properties
via a property bag. */
ev.maybe_add_sarif_properties (*thread_flow_loc_obj);
ev.maybe_add_sarif_properties (tfl_obj);
/* "location" property (SARIF v2.1.0 section 3.38.3). */
thread_flow_loc_obj->set<sarif_location>
tfl_obj.set<sarif_location>
("location",
make_location_object (result, ev, diagnostic_artifact_role::traced_file));
/* "kinds" property (SARIF v2.1.0 section 3.38.8). */
diagnostic_event::meaning m = ev.get_meaning ();
if (auto kinds_arr = maybe_make_kinds_array (m))
thread_flow_loc_obj->set<json::array> ("kinds", std::move (kinds_arr));
tfl_obj.set<json::array> ("kinds", std::move (kinds_arr));
/* "nestingLevel" property (SARIF v2.1.0 section 3.38.10). */
thread_flow_loc_obj->set_integer ("nestingLevel", ev.get_stack_depth ());
tfl_obj.set_integer ("nestingLevel", ev.get_stack_depth ());
/* "executionOrder" property (SARIF v2.1.0 3.38.11).
Offset by 1 to match the human-readable values emitted by %@. */
thread_flow_loc_obj->set_integer ("executionOrder", path_event_idx + 1);
tfl_obj.set_integer ("executionOrder", event_execution_idx + 1);
/* It might be nice to eventually implement the following for -fanalyzer:
- the "stack" property (SARIF v2.1.0 section 3.38.5)
- the "state" property (SARIF v2.1.0 section 3.38.9)
- the "importance" property (SARIF v2.1.0 section 3.38.13). */
return thread_flow_loc_obj;
}
/* If M has any known meaning, make a json array suitable for the "kinds"
@ -2933,6 +3126,8 @@ public:
m_builder.emit_diagram (m_context, diagram);
}
sarif_builder &get_builder () { return m_builder; }
protected:
sarif_output_format (diagnostic_context &context,
const line_maps *line_maps,
@ -3008,11 +3203,128 @@ private:
char *m_base_file_name;
};
/* Print the start of an embedded link to PP, as per 3.11.6. */
static void
sarif_begin_embedded_link (pretty_printer *pp)
{
pp_character (pp, '[');
}
/* Print the end of an embedded link to PP, as per 3.11.6. */
static void
sarif_end_embedded_link (pretty_printer *pp,
const char *url)
{
pp_string (pp, "](");
/* TODO: does the URI need escaping?
See https://github.com/oasis-tcs/sarif-spec/issues/657 */
pp_string (pp, url);
pp_character (pp, ')');
}
/* class sarif_token_printer : public token_printer. */
/* Implementation of pretty_printer::token_printer for SARIF output.
Emit URLs as per 3.11.6 ("Messages with embedded links"). */
void
sarif_builder::sarif_token_printer::print_tokens (pretty_printer *pp,
const pp_token_list &tokens)
{
/* Convert to text, possibly with colorization, URLs, etc. */
label_text current_url;
for (auto iter = tokens.m_first; iter; iter = iter->m_next)
switch (iter->m_kind)
{
default:
gcc_unreachable ();
case pp_token::kind::text:
{
const pp_token_text *sub = as_a <const pp_token_text *> (iter);
const char * const str = sub->m_value.get ();
if (current_url.get ())
{
/* Write iter->m_value, but escaping any
escaped link characters as per 3.11.6. */
for (const char *ptr = str; *ptr; ptr++)
{
const char ch = *ptr;
switch (ch)
{
default:
pp_character (pp, ch);
break;
case '\\':
case '[':
case ']':
pp_character (pp, '\\');
pp_character (pp, ch);
break;
}
}
}
else
/* TODO: is other escaping needed? (e.g. of '[')
See https://github.com/oasis-tcs/sarif-spec/issues/658 */
pp_string (pp, str);
}
break;
case pp_token::kind::begin_color:
case pp_token::kind::end_color:
/* These are no-ops. */
break;
case pp_token::kind::begin_quote:
pp_begin_quote (pp, pp_show_color (pp));
break;
case pp_token::kind::end_quote:
pp_end_quote (pp, pp_show_color (pp));
break;
/* Emit URLs as per 3.11.6 ("Messages with embedded links"). */
case pp_token::kind::begin_url:
{
pp_token_begin_url *sub = as_a <pp_token_begin_url *> (iter);
sarif_begin_embedded_link (pp);
current_url = std::move (sub->m_value);
}
break;
case pp_token::kind::end_url:
gcc_assert (current_url.get ());
sarif_end_embedded_link (pp, current_url.get ());
current_url = label_text::borrow (nullptr);
break;
case pp_token::kind::event_id:
{
pp_token_event_id *sub = as_a <pp_token_event_id *> (iter);
gcc_assert (sub->m_event_id.known_p ());
const sarif_code_flow *code_flow
= m_builder.get_code_flow_for_event_ids ();
label_text url = make_sarif_url_for_event (code_flow,
sub->m_event_id);
if (url.get ())
sarif_begin_embedded_link (pp);
pp_character (pp, '(');
pp_decimal_int (pp, sub->m_event_id.one_based ());
pp_character (pp, ')');
if (url.get ())
sarif_end_embedded_link (pp, url.get ());
}
break;
}
}
/* Populate CONTEXT in preparation for SARIF output (either to stderr, or
to a file). */
static void
diagnostic_output_format_init_sarif (diagnostic_context &context)
diagnostic_output_format_init_sarif (diagnostic_context &context,
std::unique_ptr<sarif_output_format> fmt)
{
/* Suppress normal textual path output. */
context.set_path_format (DPF_NONE);
@ -3023,6 +3335,10 @@ diagnostic_output_format_init_sarif (diagnostic_context &context)
/* Don't colorize the text. */
pp_show_color (context.printer) = false;
context.set_show_highlight_colors (false);
context.printer->set_token_printer
(&fmt->get_builder ().get_token_printer ());
context.set_output_format (fmt.release ());
}
/* Populate CONTEXT in preparation for SARIF output to stderr. */
@ -3034,13 +3350,13 @@ diagnostic_output_format_init_sarif_stderr (diagnostic_context &context,
bool formatted)
{
gcc_assert (line_maps);
diagnostic_output_format_init_sarif (context);
context.set_output_format
(new sarif_stream_output_format (context,
line_maps,
main_input_filename_,
formatted,
stderr));
diagnostic_output_format_init_sarif
(context,
::make_unique<sarif_stream_output_format> (context,
line_maps,
main_input_filename_,
formatted,
stderr));
}
/* Populate CONTEXT in preparation for SARIF output to a file named
@ -3054,13 +3370,13 @@ diagnostic_output_format_init_sarif_file (diagnostic_context &context,
const char *base_file_name)
{
gcc_assert (line_maps);
diagnostic_output_format_init_sarif (context);
context.set_output_format
(new sarif_file_output_format (context,
line_maps,
main_input_filename_,
formatted,
base_file_name));
diagnostic_output_format_init_sarif
(context,
::make_unique<sarif_file_output_format> (context,
line_maps,
main_input_filename_,
formatted,
base_file_name));
}
/* Populate CONTEXT in preparation for SARIF output to STREAM. */
@ -3073,13 +3389,13 @@ diagnostic_output_format_init_sarif_stream (diagnostic_context &context,
FILE *stream)
{
gcc_assert (line_maps);
diagnostic_output_format_init_sarif (context);
context.set_output_format
(new sarif_stream_output_format (context,
line_maps,
main_input_filename_,
formatted,
stream));
diagnostic_output_format_init_sarif
(context,
::make_unique<sarif_stream_output_format> (context,
line_maps,
main_input_filename_,
formatted,
stream));
}
#if CHECKING_P
@ -3095,13 +3411,12 @@ class test_sarif_diagnostic_context : public test_diagnostic_context
public:
test_sarif_diagnostic_context (const char *main_input_filename)
{
diagnostic_output_format_init_sarif (*this);
m_format = new buffered_output_format (*this,
line_table,
main_input_filename,
true);
set_output_format (m_format); // give ownership;
auto format = ::make_unique<buffered_output_format> (*this,
line_table,
main_input_filename,
true);
m_format = format.get (); // borrowed
diagnostic_output_format_init_sarif (*this, std::move (format));
}
std::unique_ptr<sarif_log> flush_to_object ()
@ -3175,7 +3490,7 @@ test_make_location_object (const line_table_case &case_)
richloc.add_range (field, SHOW_RANGE_WITHOUT_CARET, &label2);
richloc.set_escape_on_output (true);
sarif_result result;
sarif_result result (0);
std::unique_ptr<sarif_location> location_obj
= builder.make_location_object
@ -3471,6 +3786,119 @@ test_simple_log_2 (const line_table_case &case_)
}
}
/* Assuming that a single diagnostic has been emitted within
LOG, get a json::object for the result object. */
static const json::object *
get_result_from_log (const sarif_log *log)
{
auto runs = EXPECT_JSON_OBJECT_WITH_ARRAY_PROPERTY (log, "runs"); // 3.13.4
ASSERT_EQ (runs->size (), 1);
// 3.14 "run" object:
auto run = (*runs)[0];
// 3.14.23:
auto results = EXPECT_JSON_OBJECT_WITH_ARRAY_PROPERTY (run, "results");
ASSERT_EQ (results->size (), 1);
// 3.27 "result" object:
auto result = (*results)[0];
return expect_json_object (SELFTEST_LOCATION, result);
}
/* Assuming that a single diagnostic has been emitted to
DC, get a json::object for the messsage object within
the result. */
static const json::object *
get_message_from_log (const sarif_log *log)
{
auto result_obj = get_result_from_log (log);
// 3.27.11:
auto message_obj
= EXPECT_JSON_OBJECT_WITH_OBJECT_PROPERTY (result_obj, "message");
return message_obj;
}
/* Tests of messages with embedded links; see SARIF v2.1.0 3.11.6. */
static void
test_message_with_embedded_link ()
{
auto_fix_quotes fix_quotes;
{
test_sarif_diagnostic_context dc ("test.c");
rich_location richloc (line_table, UNKNOWN_LOCATION);
dc.report (DK_ERROR, richloc, nullptr, 0,
"before %{text%} after",
"http://example.com");
std::unique_ptr<sarif_log> log = dc.flush_to_object ();
auto message_obj = get_message_from_log (log.get ());
ASSERT_JSON_STRING_PROPERTY_EQ
(message_obj, "text",
"before [text](http://example.com) after");
}
/* Escaping in message text.
This is "EXAMPLE 1" from 3.11.6. */
{
test_sarif_diagnostic_context dc ("test.c");
rich_location richloc (line_table, UNKNOWN_LOCATION);
/* Disable "unquoted sequence of 2 consecutive punctuation
characters `]\' in format" warning. */
#if __GNUC__ >= 10
# pragma GCC diagnostic push
# pragma GCC diagnostic ignored "-Wformat-diag"
#endif
dc.report (DK_ERROR, richloc, nullptr, 0,
"Prohibited term used in %{para[0]\\spans[2]%}.",
"1");
#if __GNUC__ >= 10
# pragma GCC diagnostic pop
#endif
std::unique_ptr<sarif_log> log = dc.flush_to_object ();
auto message_obj = get_message_from_log (log.get ());
ASSERT_JSON_STRING_PROPERTY_EQ
(message_obj, "text",
"Prohibited term used in [para\\[0\\]\\\\spans\\[2\\]](1).");
/* This isn't exactly what EXAMPLE 1 of the spec has; reported as
https://github.com/oasis-tcs/sarif-spec/issues/656 */
}
/* Urlifier. */
{
class test_urlifier : public urlifier
{
public:
char *
get_url_for_quoted_text (const char *p, size_t sz) const final override
{
if (!strncmp (p, "-foption", sz))
return xstrdup ("http://example.com");
return nullptr;
}
};
test_sarif_diagnostic_context dc ("test.c");
dc.set_urlifier (new test_urlifier ());
rich_location richloc (line_table, UNKNOWN_LOCATION);
dc.report (DK_ERROR, richloc, nullptr, 0,
"foo %<-foption%> %<unrecognized%> bar");
std::unique_ptr<sarif_log> log = dc.flush_to_object ();
auto message_obj = get_message_from_log (log.get ());
ASSERT_JSON_STRING_PROPERTY_EQ
(message_obj, "text",
"foo `[-foption](http://example.com)' `unrecognized' bar");
}
}
/* Run all of the selftests within this file. */
void
@ -3479,6 +3907,7 @@ diagnostic_format_sarif_cc_tests ()
for_each_line_table_case (test_make_location_object);
test_simple_log ();
for_each_line_table_case (test_simple_log_2);
test_message_with_embedded_link ();
}
} // namespace selftest

View file

@ -22,6 +22,7 @@ along with GCC; see the file COPYING3. If not see
#define GCC_PRETTY_PRINT_FORMAT_IMPL_H
#include "pretty-print.h"
#include "diagnostic-event-id.h"
/* A struct representing a pending item to be printed within
pp_format.
@ -31,6 +32,7 @@ along with GCC; see the file COPYING3. If not see
- begin/end named color
- open/close quote
- begin/end URL
- event IDs
- custom data (for the formatter, for the pretty_printer,
or the output format)
@ -72,6 +74,8 @@ public:
begin_url,
end_url,
event_id,
custom_data,
NUM_KINDS
@ -218,6 +222,34 @@ struct pp_token_end_url : public pp_token
}
};
struct pp_token_event_id : public pp_token
{
pp_token_event_id (diagnostic_event_id_t event_id)
: pp_token (kind::event_id),
m_event_id (event_id)
{
gcc_assert (event_id.known_p ());
}
diagnostic_event_id_t m_event_id;
};
template <>
template <>
inline bool
is_a_helper <pp_token_event_id *>::test (pp_token *tok)
{
return tok->m_kind == pp_token::kind::event_id;
}
template <>
template <>
inline bool
is_a_helper <const pp_token_event_id *>::test (const pp_token *tok)
{
return tok->m_kind == pp_token::kind::event_id;
}
struct pp_token_custom_data : public pp_token
{
class value

View file

@ -1120,6 +1120,16 @@ pp_token::dump (FILE *out) const
case kind::end_url:
fprintf (out, "END_URL");
break;
case kind::event_id:
{
const pp_token_event_id *sub
= as_a <const pp_token_event_id *> (this);
gcc_assert (sub->m_event_id.known_p ());
fprintf (out, "EVENT((%i))", sub->m_event_id.one_based ());
}
break;
case kind::custom_data:
{
const pp_token_custom_data *sub
@ -1984,12 +1994,7 @@ pretty_printer::format (text_info *text)
diagnostic_event_id_ptr event_id
= va_arg (*text->m_args_ptr, diagnostic_event_id_ptr);
gcc_assert (event_id->known_p ());
pp_string (this, colorize_start (m_show_color, "path"));
pp_character (this, '(');
pp_decimal_int (this, event_id->one_based ());
pp_character (this, ')');
pp_string (this, colorize_stop (m_show_color));
formatted_tok_list->push_back<pp_token_event_id> (*event_id);
}
break;
@ -2183,6 +2188,18 @@ default_token_printer (pretty_printer *pp,
pp_end_url (pp);
break;
case pp_token::kind::event_id:
{
pp_token_event_id *sub = as_a <pp_token_event_id *> (iter);
gcc_assert (sub->m_event_id.known_p ());
pp_string (pp, colorize_start (pp_show_color (pp), "path"));
pp_character (pp, '(');
pp_decimal_int (pp, sub->m_event_id.one_based ());
pp_character (pp, ')');
pp_string (pp, colorize_stop (pp_show_color (pp)));
}
break;
case pp_token::kind::custom_data:
/* These should have been eliminated by replace_custom_tokens. */
gcc_unreachable ();

View file

@ -0,0 +1,16 @@
/* Verify that SARIF output can capture URLs in diagnostics
related to a bad pragma. */
/* { dg-do compile } */
/* { dg-options "-fdiagnostics-format=sarif-file -Wpragmas" } */
#pragma GCC diagnostic ignored "-Wmisleading-indenttion"
int nonempty;
/* Verify that some JSON was written to a file with the expected name:
{ dg-final { verify-sarif-file } } */
/* Use a Python script to verify various properties about the generated
.sarif file:
{ dg-final { run-sarif-pytest bad-pragma.c "test-bad-pragma.py" } } */

View file

@ -0,0 +1,38 @@
from sarif import *
import pytest
@pytest.fixture(scope='function', autouse=True)
def sarif():
return sarif_from_env()
def test_messages_have_embedded_urls(sarif):
runs = sarif['runs']
run = runs[0]
results = run['results']
# We expect a single warning with a secondary location.
#
# The textual form of the diagnostic would look like this:
# . PATH/bad-pragma.c:7:32: warning: unknown option after '#pragma GCC diagnostic' kind [-Wpragmas]
# . 7 | #pragma GCC diagnostic ignored "-Wmisleading-indenttion"
# . | ^~~~~~~~~~~~~~~~~~~~~~~~~
# . PATH/bad-pragma.c:7:32: note: did you mean '-Wmisleading-indentation'?
assert len(results) == 1
result = results[0]
assert result['ruleId'] == '-Wpragmas'
assert result['level'] == 'warning'
assert result['message']['text'] \
== "unknown option after '[#pragma GCC diagnostic](https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Pragmas.html)' kind"
# Note how we expect an embedded link in the above for the docs for #pragma GCC diagnostic
# We expect one related location, for the note.
relatedLocations = result['relatedLocations']
assert len(relatedLocations) == 1
rel_loc = relatedLocations[0]
assert rel_loc['message']['text'] \
== "did you mean '[-Wmisleading-indentation](https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wmisleading-indentation)'?"
# Again, we expect an embedded link in the above, this time to the
# docs for the suggested option

View file

@ -96,9 +96,11 @@ def test_location_relationships(sarif):
== " __builtin_free (ptr); // 1st\n"
assert threadFlow['locations'][0]['kinds'] == ['release', 'memory']
assert threadFlow['locations'][0]['executionOrder'] == 1
# We should have an embedded link in this event's message to the
# other event's location within the SARIF file:
assert threadFlow['locations'][1]['location']['message']['text'] \
== "second 'free' here; first 'free' was at (1)"
== "second 'free' here; first 'free' was at [(1)](sarif:/runs/0/results/0/codeFlows/0/threadFlows/0/locations/0)"
assert threadFlow['locations'][1]['location']['physicalLocation']['contextRegion']['snippet']['text'] \
== " __builtin_free (ptr); // 2nd\n"
assert threadFlow['locations'][1]['kinds'] == ['danger']