[cfe-dev] [RFC] Emit SARIF Diagnostics via -fdiagnostics-format=sarif

Aaron Ballman via cfe-dev cfe-dev at lists.llvm.org
Thu Mar 11 10:22:45 PST 2021


On Thu, Mar 11, 2021 at 1:00 PM Vaibhav Yenamandra (BLOOMBERG/ 919 3RD
A) via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>
> Hello Everyone,
>
> Below is an RFC on extending clang's `-fdiagnostics-format` option to let
> clang emit machine-readable JSON diagnostics. Feedback is highly appreciated!
>
> # Why
> Machine-consumable diagnostics are important for writing generic static
> analysis wrappers and harnesses that want to interact with code bases through
> clang. There are two options to consider for the diagnostic format to use in
> clang:
>
> 1. Mimic `gcc-9 -fdiagnostics-format=json`, covered in the previous work section
> 2. Emit [SARIF][0] diagnostic information, a cross-language standardized format
> that is already supported in `clang/lib/StaticAnalyzer` (through `--analyzer-output=sarif`)
>
> We propose (2) as it is a standardized format, which should make it easier for tools to
> implement support for it.

I'd support option #2 -- SARIF has a lot of nice tooling support
that's forming in the industry (such as
https://docs.github.com/en/github/finding-security-vulnerabilities-and-errors-in-your-code/uploading-a-sarif-file-to-github).
I'm not super excited about #1 given the existence of #2.

> ## Previous Work
>
> ### `gcc-9 -fdiagnostics-format=json`
> GCC [recently][1] [implemented][2] serializing diagnostics to JSON. This option
> could be implemented as `-fdiagnostics-format=json-gcc` in clang to signal
> to users its intended interoperability with the corresponding gcc option.
> The schema for this format may be inferred from [current gcc code][3].
>
> While not a community standard, it can be expected to be reasonably stable, as the
> [original patch][2] states that the flag emits machine-readable diagnostics.
>
> ## SARIF diagnostics in LLVM
>
> [SARIF][0] (Static Analysis Results Interchange Format) is a standard format
> for the output of static analysis tools.
>
> Clang's StaticAnalyzer already implements a SARIF diagnostic consumer
> ([D53814][4]), which should allow us to add any extra fields the diagnostics
> output needs.
>
> ### Mapping clang diagnostics to SARIF
>
> This section assumes a typical compiler diagnostic like those shown on the
> [expressive diagnostics page][5].
>
> In SARIF, the attributes can be mapped to the [`results`][7] property as follows:
> 1. The name of the file where the diagnostic occurs can be stored in the
> [`physicalLocation`][8] property
> 2. The line/column of the caret marking the error can be stored in the [`region`][9]
> property, which can also encode the source range to which an error corresponds
> 3. The error message can be stored in the [`message`][10] property
> 4. Each of the locations can store the rendered caret & snippet from clang using the
> [`snippet`][12] property for that region
> 5. Nested diagnostics (typically `note` level items) can be represented using the
> [`locationRelationship`][14] object
> 6. Fix-it hints can be communicated through the [`fixes`][13] property

This looks sensible to me.
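
For illustration only, here is a minimal Python sketch of how items 1-4 of
that mapping could fit together in one SARIF 2.1.0 log. The input dictionary
layout, the function names, and the level table are hypothetical and not part
of clang or the RFC; notes (item 5) and fix-its (item 6) are left out to keep
it short.

```python
import json

# Assumed mapping from clang severity names to SARIF "level" values.
# SARIF only defines "none", "note", "warning" and "error".
CLANG_TO_SARIF_LEVEL = {
    "fatal error": "error",
    "error": "error",
    "warning": "warning",
    "note": "note",
    "remark": "note",
}


def diagnostic_to_result(diag):
    """Map one diagnostic (a hypothetical dict layout) to a SARIF result."""
    region = {"startLine": diag["line"], "startColumn": diag["column"]}
    if "snippet" in diag:
        # Item 4: keep the source text the caret points into.
        region["snippet"] = {"text": diag["snippet"]}
    return {
        "level": CLANG_TO_SARIF_LEVEL.get(diag["level"], "warning"),
        "message": {"text": diag["message"]},          # item 3
        "locations": [{
            "physicalLocation": {                       # item 1
                "artifactLocation": {"uri": diag["file"]},
                "region": region,                       # item 2
            }
        }],
    }


def build_sarif_log(diags):
    """Wrap the results in the minimal log/run/tool structure SARIF requires."""
    return {
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "clang"}},
            "results": [diagnostic_to_result(d) for d in diags],
        }],
    }


if __name__ == "__main__":
    example = {
        "file": "t.c", "line": 3, "column": 7,
        "level": "error", "message": "expected expression",
        "snippet": "int x = ;",
    }
    print(json.dumps(build_sarif_log([example]), indent=2))
```

A real implementation would live in clang's C++ diagnostic consumers; the
sketch is only meant to show the shape of the resulting JSON.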

~Aaron

> ## Interface Changes
>
> We propose the following interface changes:
>
> - Input: Extend the `-fdiagnostics-format` flag to recognize: `-fdiagnostics-format=sarif`
> - Output: Clang will emit SARIF formatted diagnostics when `-fdiagnostics-format=sarif` is provided.
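
As a rough idea of what this enables on the consuming side, the following
Python sketch reads a SARIF 2.1.0 log and prints one classic
`file:line:column: level: message` line per result location. It assumes only
the standard SARIF log layout; the function name is hypothetical, and how
clang would deliver the log (stderr, a file, or something else) is not
settled by the RFC, so the sketch simply takes the JSON text as a string.

```python
import json


def summarize_sarif(sarif_text):
    """Return one human-readable line per result location in a SARIF log."""
    log = json.loads(sarif_text)
    lines = []
    for run in log.get("runs", []):
        for result in run.get("results", []):
            message = result.get("message", {}).get("text", "")
            # Fall back to "warning" when the result carries no level.
            level = result.get("level", "warning")
            locations = result.get("locations", [])
            if not locations:
                # Results without a location are still reported.
                lines.append(f"{level}: {message}")
                continue
            for loc in locations:
                phys = loc.get("physicalLocation", {})
                uri = phys.get("artifactLocation", {}).get("uri", "<unknown>")
                region = phys.get("region", {})
                line = region.get("startLine", 0)
                col = region.get("startColumn", 0)
                lines.append(f"{uri}:{line}:{col}: {level}: {message}")
    return lines
```

A wrapper could feed the captured compiler output to `summarize_sarif` and
then filter or aggregate the resulting lines however it likes.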
>
> ## Diagnostic Examples
>
> Various examples of the proposed output are available in this GitHub gist (which also renders this message in markdown): https://gist.github.com/envp/3a5fdd33115b91c391c22e5e8a5210f4#diagnostic-examples
>
>
> [0]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/sarif-v2.1.0.html
> [1]: https://developers.redhat.com/blog/2019/03/08/usability-improvements-in-gcc-9
> [2]: https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=478dd60ddcf17773ebd1af367c9dcaee2401f797
> [3]: https://github.com/gcc-mirror/gcc/blob/master/gcc/diagnostic-format-json.cc
> [4]: https://reviews.llvm.org/D53814
> [5]: https://clang.llvm.org/diagnostics.html
> [6]: https://github.com/microsoft/sarif-tutorials/blob/main/docs/2-Basics.md#results
> [7]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/cs01/sarif-v2.1.0-cs01.html#_Toc16012463
> [8]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/cs01/sarif-v2.1.0-cs01.html#_Toc16012634
> [9]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/cs01/sarif-v2.1.0-cs01.html#_Toc16012641
> [10]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/cs01/sarif-v2.1.0-cs01.html#_Toc16012655
> [11]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/cs01/sarif-v2.1.0-cs01.html#_Toc16012632
> [12]: https://docs.oasis-open.org/sarif/sarif/v2.0/csprd02/sarif-v2.0-csprd02.html#_Toc10127889
> [13]: https://docs.oasis-open.org/sarif/sarif/v2.0/csprd02/sarif-v2.0-csprd02.html#_Toc10128072
> [14]: https://docs.oasis-open.org/sarif/sarif/v2.0/csprd02/sarif-v2.0-csprd02.html#_Toc10127919