[PATCH] D53814: Allow the analyzer to output to a SARIF file

Mon Oct 29 14:57:55 PDT 2018

george.karpenkov requested changes to this revision.
george.karpenkov added a comment.
This revision now requires changes to proceed.

I don't think a new PathGenerationScheme is needed, unless you plan changes to BugReporter.cpp.

The code is fine otherwise, but could we try to use `diff` for testing?

================
Comment at: Analysis/diagnostics/sarif-diagnostics-taint-test.c:18
+// Test the basics for sanity.
+// CHECK: sarifLog['version'].startswith("2.0.0")
+// CHECK: sarifLog['runs'][0]['tool']['fullName'] == "clang static analyzer"
----------------
aaron.ballman wrote:
> george.karpenkov wrote:
> > Would it make more sense to just use `diff` + json pretty-formatter to write a test?
> > With this test I can't even quite figure out how the output should look like.
> I'm not super comfortable with that approach, but perhaps I'm thinking of something different than what you're proposing. The reason I went with this approach is because diff would be fragile (depends heavily on field ordering, which the JSON support library doesn't give control over) and the physical layout of the file isn't what needs to be tested anyway. SARIF has a fair amount of optional data that can be provided as well, so using a purely textual diff worried me that exporting additional optional data in the future would require extensive unrelated changes to all SARIF diffs in the test directory.
> 
> The goal for this test was to demonstrate that we can validate that the interesting bits of information are present in the output without worrying about the details.
> 
> Also, the python approach allows us to express relationships between data items more easily than a textual diff tool would. I've not used that here, but I could imagine a test where we want to check that each code location has a corresponding file entry in another list.
Using a sample file + diff would have an advantage of being easier to read (just glance at the "Inputs/blah.serif" to see a sample output), and consistent with how we already do checking for plists.

> depends heavily on field ordering

Is it an issue in practice though? I would assume that JSON support library would not switch field ordering too often (and even if it does, we can have a python wrapper testing that)

> SARIF has a fair amount of optional data

Would diff `--ignore-matching-lines` be enough for those?

================
Comment at: StaticAnalyzer/Core/SarifDiagnostics.cpp:69
+    return std::string(&C, 1);
+  return "%" + llvm::toHex(StringRef(&C, 1));
+}
----------------
+1, I would use this in other consumers.

================
Comment at: clang/StaticAnalyzer/Core/BugReporter/PathDiagnostic.h:128
+    /// Used for SARIF output.
+    Sarif,
   };
----------------
Do you actually need a new generation scheme here?
I'm pretty sure that using "Minimal" would give you the same effect.

https://reviews.llvm.org/D53814